The PCG Lanczos eigensolver uses the Lanczos algorithm to compute eigenvalues and eigenvectors (frequencies and mode shapes) for modal analyses, but replaces matrix factorization and multiple solves with multiple iterative solves. In other words, it replaces the direct sparse solver with the iterative PCG solver while keeping the same Lanczos algorithm. An iterative solve is faster than a matrix factorization, but usually takes longer than a single block solve. The PCG Lanczos method pays off mainly for very large models, usually above a few million DOFs, where matrix factorization and solves become very expensive.
The PCG Lanczos method automatically chooses an appropriate default level of difficulty, but experienced users may improve solution time by specifying the level of difficulty manually via the PCGOPT command. Each successive increase in level of difficulty (the Lev_Diff value on the PCGOPT command) increases the cost per iteration, but also reduces the total number of iterations required.
The performance summary for PCG Lanczos is contained in the file Jobname.PCS, with additional information related to the Lanczos solver. The first part of the .PCS file contains information specific to the modal analysis, including the computed eigenvalues and frequencies. The second part of the .PCS file contains performance data similar to that found in a static or transient analysis. As highlighted in the next two examples, the important details in this file are the number of load cases (A), total iterations in PCG (B), level of difficulty (C), and the total elapsed time (D).
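The four items (A) through (D) can be pulled out of a .PCS file with a few regular expressions. The following is a minimal sketch, not a supported tool: the function name `parse_pcs_summary` and the dictionary keys are hypothetical, and the patterns assume the labels appear exactly as in the listing in this section (the sample lines are copied from Example 6.4). Other releases may word these lines differently.

```python
import re

# Hypothetical helper: patterns assume the label text shown in Example 6.4.
PATTERNS = {
    "load_cases": r"Number of Load Cases:\s*(\d+)",                 # (A)
    "total_iterations": r"Total Iterations In PCG:\s*(\d+)",        # (B)
    "level_of_difficulty": r"Level of Difficulty:\s*(\d+)",         # (C)
    "elapsed_seconds":
        r"TOTAL PCG SOLVER SOLUTION ELAPSED TIME\s*=\s*([\d.]+)",   # (D)
}


def parse_pcs_summary(text):
    """Return the headline metrics (A, B, C, D) found in PCS file text."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            summary[name] = None  # label not found; leave the gap visible
        elif name == "elapsed_seconds":
            summary[name] = float(match.group(1))
        else:
            summary[name] = int(match.group(1))
    return summary


# Sample lines copied from Example 6.4 in this section.
sample = """
 Number of Load Cases: 25
 *** Level of Difficulty: 3 ***
 Total Iterations In PCG: 2355
 TOTAL PCG SOLVER SOLUTION ELAPSED TIME = 3447.20 secs
"""
result = parse_pcs_summary(sample)
print(result)
```

To use it on a real run, read the contents of Jobname.PCS into a string and pass it to the function. A value of `None` signals that a label was not found and the pattern needs adjusting.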
The number of load cases corresponds to the number of Lanczos steps required to obtain the specified number of eigenvalues. It is usually 2 to 3 times the number of eigenvalues desired, unless the Lanczos algorithm has difficulty converging. PCG Lanczos becomes increasingly expensive relative to Block Lanczos as the number of desired eigenvalues increases. PCG Lanczos is best for obtaining a relatively small number of modes (up to 100) for large models (over a few million DOFs).
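The 2-to-3-times rule of thumb can be turned into a quick sanity check on a finished run. This is only an illustrative sketch: the function names are hypothetical, and the factors 2 and 3 are taken from the paragraph above as a typical range, not as a guarantee from the solver.

```python
def expected_lanczos_steps(num_modes, low_factor=2, high_factor=3):
    """Rough range of load cases (Lanczos steps) for the requested modes."""
    if num_modes <= 0:
        raise ValueError("num_modes must be positive")
    return num_modes * low_factor, num_modes * high_factor


def steps_look_typical(num_modes, load_cases):
    """True if the step count does not exceed the usual upper bound.

    A count above the upper bound may indicate that the Lanczos
    algorithm had difficulty converging.
    """
    _, high = expected_lanczos_steps(num_modes)
    return load_cases <= high


# The example in this section requests 10 modes and reports 25 load cases.
print(expected_lanczos_steps(10))   # (20, 30)
print(steps_look_typical(10, 25))   # True
```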
The next example shows a part of the PCS file that reports the performance statistics described above for a 2 million DOF modal analysis that computes 10 modes. In this example, the level of difficulty used (Lev_Diff on the PCGOPT command) is 3. The output in Example 6.4: PCS File for PCG Lanczos, Level of Difficulty = 3 shows that Lev_Diff = 3 (C), and the total iterations required for 25 Lanczos steps (A) is 2355 (B), or an average of 94.2 iterations per step (E).
This example shows a model that performed quite well with the PCG Lanczos eigensolver. Considering that it converged in under 100 iterations per load case, the Lev_Diff value of 3 is probably too high for this model (especially at higher core counts). In this case, it might be worthwhile to try Lev_Diff = 1 or 2 to see if it improves the solver performance. The run used a single core, so using more than one core would almost certainly also reduce the time to solution.
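The reasoning above can be sketched as a small check: compute the average iterations per load case from (A) and (B), then list lower Lev_Diff values worth trying when the average is low. This is a heuristic illustration only. The function names are hypothetical, the threshold of 100 iterations is taken from the discussion above and is not a solver constant, and the sketch assumes the lowest level is 1.

```python
def average_iterations(total_iterations, load_cases):
    """Average PCG iterations per load case (item E in the PCS file)."""
    if load_cases <= 0:
        raise ValueError("load_cases must be positive")
    return total_iterations / load_cases


def lev_diff_candidates(current_level, avg_iterations, threshold=100.0):
    """Suggest Lev_Diff values to try next.

    Assumption: if the solver already converges in fewer than `threshold`
    iterations per load case, a cheaper (lower) level may be faster overall.
    Otherwise the current level is kept.
    """
    if avg_iterations < threshold and current_level > 1:
        return list(range(1, current_level))
    return [current_level]


# Values from Example 6.4: 2355 total iterations over 25 load cases.
avg = average_iterations(2355, 25)
print(round(avg, 1))                # 94.2
print(lev_diff_candidates(3, avg))  # [1, 2]
```

Any suggested level still has to be confirmed by rerunning the analysis and comparing the total elapsed time (D), since a lower level trades cheaper iterations for more of them.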
Example 6.4: PCS File for PCG Lanczos, Level of Difficulty = 3
 Lanczos Solver Parameters
 -------------------------
    Lanczos Block Size: 1
    Eigenpairs computed: 10 lowest
    Extra Eigenpairs: 1
    Lumped Mass Flag: 0
    In Memory Flag: 0
    Extra Krylov Dimension: 1
    Mass Matrix Singular Flag: 1
    PCCG Stopping Criteria Selector: 4
    PCCG Stopping Threshold: 1.000000e-04
    Extreme Preconditioner Flag: 0
    Reortho Type: 1
    Number of Reorthogonalizations: 7
    nExtraWorkVecs for computing eigenvectors: 4
    Rel. Eigenvalue Tolerance: 1.000000e-08
    Rel. Eigenvalue Residual Tolerance: 1.000000e-11
    Restart Condition Number Threshold: 1.000000e+15
    Sturm Check Flag: 0

    Shifts Applied: 1.017608e-01

 Eigenpairs

 Number of Eigenpairs 10
 ---------------------------------------

 No.    Eigenvalue      Frequency(Hz)
 ---    ----------      -------------
   1    1.643988e+03    6.453115e+00
   2    3.715504e+04    3.067814e+01
   3    5.995562e+04    3.897042e+01
   4    9.327626e+04    4.860777e+01
   5    4.256303e+05    1.038332e+02
   6    7.906460e+05    1.415178e+02
   7    9.851501e+05    1.579688e+02
   8    1.346627e+06    1.846902e+02
   9    1.656628e+06    2.048484e+02
  10    2.050199e+06    2.278863e+02

 Number of cores used: 1
 Degrees of Freedom: 2067051
 DOF Constraints: 6171
 Elements: 156736
    Assembled: 156736
    Implicit: 0
 Nodes: 689017
 Number of Load Cases: 25                      <---A (Lanczos Steps)

 Nonzeros in Upper Triangular part of
    Global Stiffness Matrix : 170083104
 Nonzeros in Preconditioner: 201288750
    *** Precond Reorder: MLD ***
    Nonzeros in V: 12401085
    Nonzeros in factor: 184753563
    Equations in factor: 173336
 *** Level of Difficulty: 3 ***                <---C (Preconditioner)

 Total Operation Count: 3.56161e+12
 Total Iterations In PCG: 2355                 <---B (Convergence)
 Average Iterations Per Load Case: 94.2        <---E (Iterations per Step)
 Input PCG Error Tolerance: 0.0001
 Achieved PCG Error Tolerance: 9.98389e-05

 DETAILS OF PCG SOLVER SETUP TIME(secs)            Cpu        Wall
    Gather Finite Element Data                    0.40        0.40
    Element Matrix Assembly                      96.24       96.52

 DETAILS OF PCG SOLVER SOLUTION TIME(secs)         Cpu        Wall
    Preconditioner Construction                   1.74        1.74
    Preconditioner Factoring                     51.69       51.73
    Apply Boundary Conditions                     5.03        5.03
    Eigen Solve                                3379.36     3377.52
       Eigen Solve Overhead                     172.49      172.39
          Compute MQ                            154.00      154.11
          Reorthogonalization                   123.71      123.67
             Computation                        120.66      120.61
             I/O                                  3.05        3.06
          Block Tridiag Eigen                     0.00        0.00
          Compute Eigenpairs                      1.63        1.63
          Output Eigenpairs                       0.64        0.64
       Multiply With A                         1912.52     1911.41
          Multiply With A22                    1912.52     1911.41
       Solve With Precond                      1185.62     1184.85
          Solve With Bd                          89.84       89.89
          Multiply With V                       192.81      192.53
          Direct Solve                          880.71      880.38
******************************************************************************
 TOTAL PCG SOLVER SOLUTION CP TIME      =    3449.01 secs
 TOTAL PCG SOLVER SOLUTION ELAPSED TIME =    3447.20 secs   <---D (Total Time)
******************************************************************************
 Total Memory Usage at Lanczos     :    3719.16 MB
 PCG Memory Usage at Lanczos       :    2557.95 MB
 Memory Usage for Matrix           :       0.00 MB
******************************************************************************
 Multiply with A Memory Bandwidth  :      15.52 GB/s
 Multiply with A MFLOP Rate        :     833.13 MFlops
 Solve With Precond MFLOP Rate     :    1630.67 MFlops
 Precond Factoring MFLOP Rate      :       0.00 MFlops
******************************************************************************
 Total amount of I/O read          :    6917.76 MB
 Total amount of I/O written       :    6732.46 MB
******************************************************************************