6.6. PCG Lanczos Solver Performance Output

The PCG Lanczos eigensolver uses the Lanczos algorithm to compute eigenvalues and eigenvectors (frequencies and mode shapes) for modal analyses, but replaces matrix factorization and multiple solves with multiple iterative solves. In other words, it keeps the same Lanczos algorithm and swaps the direct sparse solver for the iterative PCG solver. An iterative solution is faster than a matrix factorization, but usually takes longer than a single block solve. The PCG Lanczos method therefore pays off mainly for very large models, usually above a few million DOFs, where matrix factorization and the associated solves become very expensive.

The PCG Lanczos method automatically chooses an appropriate default level of difficulty, but experienced users may improve solution time by specifying the level of difficulty manually with the PCGOPT command. Each successive increase in the level of difficulty (the Lev_Diff value on the PCGOPT command) increases the cost per iteration, but also reduces the total number of iterations required.

The performance summary for PCG Lanczos is contained in the file Jobname.PCS, along with additional information related to the Lanczos solver. The first part of the .PCS file contains information specific to the modal analysis, including the computed eigenvalues and frequencies. The second part of the .PCS file contains performance data similar to that found in a static or transient analysis. As highlighted in the next two examples, the important details in this file are the number of load cases (A), the total iterations in PCG (B), the level of difficulty (C), and the total elapsed time (D).
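As a rough illustration of how the four values (A) through (D) can be pulled out of a .PCS file, the following Python sketch searches the text with regular expressions. The names `PCS_PATTERNS` and `summarize_pcs` are hypothetical, and the patterns are written against the line formats shown in this section; other releases may print these lines differently, so treat this as a starting point rather than a supported parser.

```python
import re

# Hypothetical helper: patterns match the line formats shown in this section.
# Each entry maps a result key to (regular expression, type converter).
PCS_PATTERNS = {
    "load_cases": (r"Number of Load Cases:\s*(\d+)", int),                # (A)
    "pcg_iterations": (r"Total Iterations In PCG:\s*(\d+)", int),         # (B)
    "level_of_difficulty": (r"Level of Difficulty:\s*(\d+)", int),        # (C)
    "elapsed_seconds": (
        r"TOTAL PCG SOLVER SOLUTION ELAPSED TIME\s*=\s*([\d.]+)\s*secs",  # (D)
        float,
    ),
}


def summarize_pcs(text):
    """Return the key performance values found in the .PCS text.

    A value is None when its line is not present in the text.
    """
    summary = {}
    for key, (pattern, convert) in PCS_PATTERNS.items():
        match = re.search(pattern, text)
        summary[key] = convert(match.group(1)) if match else None
    return summary


# A few lines copied from a PCG Lanczos .PCS listing.
SAMPLE = """\
        Number of Load Cases: 25
        *** Level of Difficulty: 3 ***
        Total Iterations In PCG: 2355
             TOTAL PCG SOLVER SOLUTION ELAPSED TIME =    3447.20 secs
"""

if __name__ == "__main__":
    for key, value in summarize_pcs(SAMPLE).items():
        print(f"{key}: {value}")
```

In practice the text would come from reading Jobname.PCS, for example with `open("Jobname.PCS").read()`. The patterns ignore anything that follows the value on a line, so markers such as `<---A` do not affect the result.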

The number of load cases corresponds to the number of Lanczos steps required to obtain the specified number of eigenvalues. It is usually 2 to 3 times the number of eigenvalues desired, unless the Lanczos algorithm has difficulty converging. PCG Lanczos becomes increasingly expensive relative to Block Lanczos as the number of desired eigenvalues increases. PCG Lanczos is best for obtaining a relatively small number of modes (up to 100) for large models (over a few million DOFs).

The next example shows the part of the .PCS file that reports the performance statistics described above, for a 2 million DOF modal analysis that computes 10 modes. In this example, the level of difficulty used (Lev_Diff on the PCGOPT command) is 3. The output in Example 6.4: PCS File for PCG Lanczos, Level of Difficulty = 3 shows that Lev_Diff = 3 (C), and that the total number of iterations required for 25 Lanczos steps (A) is 2355 (B), an average of 94.2 iterations per step (E).
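The arithmetic behind these figures can be reproduced in a few lines of Python. This is only a sketch of the two ratios discussed in this section; the function names are hypothetical and the inputs are the values quoted from Example 6.4.

```python
def iterations_per_step(total_iterations, lanczos_steps):
    """Average PCG iterations per Lanczos step (load case)."""
    return total_iterations / lanczos_steps


def steps_per_mode(lanczos_steps, modes):
    """Lanczos steps needed per requested eigenvalue."""
    return lanczos_steps / modes


# Values quoted from Example 6.4: (B) = 2355, (A) = 25, 10 modes requested.
avg = iterations_per_step(2355, 25)
ratio = steps_per_mode(25, 10)

print(f"{avg:.1f} iterations per step, {ratio:.1f} steps per mode")
# prints "94.2 iterations per step, 2.5 steps per mode"
```

The ratio of 2.5 steps per mode falls inside the usual range of 2 to 3 noted earlier, which suggests the Lanczos algorithm did not have difficulty converging for this model.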

This example shows a model that performed quite well with the PCG Lanczos eigensolver. Because it converged in under 100 iterations per load case, the Lev_Diff value of 3 is probably too high for this model (especially at higher core counts). In this case, it might be worthwhile to try Lev_Diff = 1 or 2 to see whether solver performance improves. Running on more than one core would certainly also reduce the time to solution.

Example 6.4: PCS File for PCG Lanczos, Level of Difficulty = 3

Lanczos Solver Parameters
-------------------------
        Lanczos Block Size: 1
        Eigenpairs computed: 10 lowest
        Extra Eigenpairs: 1
        Lumped Mass Flag: 0
        In Memory Flag: 0
        Extra Krylov Dimension: 1
        Mass Matrix Singular Flag: 1
        PCCG Stopping Criteria Selector: 4
        PCCG Stopping Threshold: 1.000000e-04
        Extreme Preconditioner Flag: 0
        Reortho Type: 1
        Number of Reorthogonalizations: 7
        nExtraWorkVecs for computing eigenvectors: 4
        Rel. Eigenvalue Tolerance: 1.000000e-08
        Rel. Eigenvalue Residual Tolerance: 1.000000e-11
        Restart Condition Number Threshold: 1.000000e+15
        Sturm Check Flag: 0

        Shifts Applied:   1.017608e-01

Eigenpairs

Number of Eigenpairs 10

---------------------------------------

No.     Eigenvalue      Frequency(Hz)
---     ----------      -------------
1       1.643988e+03    6.453115e+00
2       3.715504e+04    3.067814e+01
3       5.995562e+04    3.897042e+01
4       9.327626e+04    4.860777e+01
5       4.256303e+05    1.038332e+02
6       7.906460e+05    1.415178e+02
7       9.851501e+05    1.579688e+02
8       1.346627e+06    1.846902e+02
9       1.656628e+06    2.048484e+02
10      2.050199e+06    2.278863e+02



        Number of cores used: 1
        Degrees of Freedom: 2067051
        DOF Constraints: 6171
        Elements: 156736
                Assembled: 156736
                Implicit: 0
        Nodes: 689017
        Number of Load Cases: 25 <---A (Lanczos Steps)

        Nonzeros in Upper Triangular part of
                     Global Stiffness Matrix : 170083104
        Nonzeros in Preconditioner: 201288750
                *** Precond Reorder: MLD ***
                Nonzeros in V: 12401085
                Nonzeros in factor: 184753563
                Equations in factor: 173336
        *** Level of Difficulty: 3 *** <---C (Preconditioner)

        Total Operation Count: 3.56161e+12
        Total Iterations In PCG: 2355 <---B (Convergence)
        Average Iterations Per Load Case:   94.2 <---E (Iterations per Step)
        Input PCG Error Tolerance: 0.0001
        Achieved PCG Error Tolerance: 9.98389e-05


        DETAILS OF PCG SOLVER SETUP TIME(secs)        Cpu        Wall
             Gather Finite Element Data              0.40        0.40
             Element Matrix Assembly                96.24       96.52

        DETAILS OF PCG SOLVER SOLUTION TIME(secs)     Cpu        Wall
             Preconditioner Construction             1.74        1.74
             Preconditioner Factoring               51.69       51.73
             Apply Boundary Conditions               5.03        5.03
             Eigen Solve                          3379.36     3377.52
                  Eigen Solve Overhead             172.49      172.39
                       Compute MQ                  154.00      154.11
                       Reorthogonalization         123.71      123.67
                            Computation            120.66      120.61
                            I/O                      3.05        3.06
                       Block Tridiag Eigen           0.00        0.00
                       Compute Eigenpairs            1.63        1.63
                       Output Eigenpairs             0.64        0.64
                  Multiply With A                 1912.52     1911.41
                       Multiply With A22          1912.52     1911.41
                  Solve With Precond              1185.62     1184.85
                       Solve With Bd                89.84       89.89
                       Multiply With V             192.81      192.53
                       Direct Solve                880.71      880.38
******************************************************************************
             TOTAL PCG SOLVER SOLUTION CP TIME      =    3449.01 secs
             TOTAL PCG SOLVER SOLUTION ELAPSED TIME =    3447.20 secs <---D (Total Time)
******************************************************************************
        Total Memory Usage at Lanczos    :    3719.16 MB
        PCG Memory Usage at Lanczos      :    2557.95 MB
        Memory Usage for Matrix          :       0.00 MB
******************************************************************************
        Multiply with A Memory Bandwidth :      15.52 GB/s
        Multiply with A MFLOP Rate       :     833.13 MFlops
        Solve With Precond MFLOP Rate    :    1630.67 MFlops
        Precond Factoring MFLOP Rate     :       0.00 MFlops
******************************************************************************
        Total amount of I/O read         :    6917.76 MB
        Total amount of I/O written      :    6732.46 MB
******************************************************************************
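Two quick checks can be made on the listing above with a short Python sketch. The first expresses selected wall-clock times as a share of the total elapsed time (D). The second recomputes a frequency from its eigenvalue, assuming the eigenvalue is the squared circular frequency in (rad/s)^2, which is consistent with the values printed in the eigenpair table. The names `share` and `frequency_hz` are hypothetical, and the times are copied from the listing.

```python
import math

ELAPSED = 3447.20  # TOTAL PCG SOLVER SOLUTION ELAPSED TIME, secs (D)

# Wall-clock times (secs) copied from the listing above.
WALL = {
    "Multiply With A": 1911.41,
    "Solve With Precond": 1184.85,
    "Eigen Solve Overhead": 172.39,
    "Preconditioner Factoring": 51.73,
}


def share(seconds, total=ELAPSED):
    """Percentage of the total elapsed time spent in one phase."""
    return 100.0 * seconds / total


def frequency_hz(eigenvalue):
    """Frequency in Hz, assuming eigenvalue = (2 * pi * f) ** 2."""
    return math.sqrt(eigenvalue) / (2.0 * math.pi)


if __name__ == "__main__":
    for name, secs in WALL.items():
        print(f"{name:<26s}{share(secs):5.1f} %")
    print(f"mode 1: {frequency_hz(1.643988e+03):.4f} Hz")
```

For this run, the matrix-vector products (Multiply With A) account for roughly 55% of the elapsed time and the preconditioner solves for roughly 34%, so about 90% of the time is spent inside the PCG iterations rather than in setup or Lanczos overhead.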