6.6. PCG Lanczos Solver Performance Output

The PCG Lanczos eigensolver uses the Lanczos algorithm to compute eigenvalues and eigenvectors (frequencies and mode shapes) for modal analyses, but replaces the matrix factorization and multiple block solves with multiple iterative solves. In other words, it keeps the same Lanczos algorithm but swaps the direct sparse solver for the iterative PCG solver. An iterative solve is faster than a matrix factorization, but usually takes longer than a single block solve. The PCG Lanczos method pays off for very large models, usually above a few million DOFs, where matrix factorization and solves become very expensive.

The PCG Lanczos method automatically chooses an appropriate default level of difficulty, but experienced users may improve solution time by specifying the level of difficulty manually via the PCGOPT command. Each successive increase in the level of difficulty (the Lev_Diff value on the PCGOPT command) increases the cost per iteration, but also reduces the total number of iterations required. For Lev_Diff = 5, a direct matrix factorization is used, so the total number of iterations equals the number of load cases. This option is best for smaller problems where the memory required to factor the matrix is available and the cost of factorization is not dominant.

The performance summary for PCG Lanczos is contained in the file Jobname.pcs, which also includes additional information related to the Lanczos solver. The first part of the .pcs file contains information specific to the modal analysis, including the computed eigenvalues and frequencies. The second part contains performance data similar to that found in a static or transient analysis. As highlighted in the next two examples, the important details in this file are the number of load cases (A), the total iterations in PCG (B), the level of difficulty (C), the total elapsed time (D), and the average iterations per load case (E).
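The items called out above can be pulled from a .pcs file with a few regular expressions. The following is a minimal sketch, not an Ansys utility: the function name `summarize_pcs` and the dictionary keys are hypothetical, and the patterns assume the line labels shown in the examples in this section, which may differ in other releases.

```python
import re

# (pattern, converter) pairs keyed by illustrative names. The labels matched
# here are the ones printed in the example listings of this section.
PCS_FIELDS = {
    "load_cases": (r"Number of Load Cases:\s*(\d+)", int),                    # (A)
    "total_iterations": (r"Total Iterations In PCG:\s*(\d+)", int),           # (B)
    "level_of_difficulty": (r"Level of Difficulty:\s*(\d+)", int),            # (C)
    "elapsed_seconds": (r"SOLUTION ELAPSED TIME\s*=\s*([\d.]+)\s*secs", float),  # (D)
}


def summarize_pcs(text):
    """Return the (A)-(D) values found in the text of a .pcs file.

    A field that cannot be found is reported as None rather than guessed.
    """
    summary = {}
    for name, (pattern, convert) in PCS_FIELDS.items():
        match = re.search(pattern, text)
        summary[name] = convert(match.group(1)) if match else None
    return summary


# A few lines copied from Example 6.4, used here in place of a real file.
sample = """\
        Number of Load Cases: 25
        *** Level of Difficulty: 3   (internal 2) ***
        Total Iterations In PCG: 2355
             TOTAL PCG SOLVER SOLUTION ELAPSED TIME =    3447.20 secs
"""

print(summarize_pcs(sample))
```

For a real run, the text would come from reading Jobname.pcs (for example with `open(path).read()`), where the file name depends on the job name in use.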

The number of load cases corresponds to the number of Lanczos steps required to obtain the specified number of eigenvalues. It is usually 2 to 3 times the number of eigenvalues requested, unless the Lanczos algorithm has difficulty converging. PCG Lanczos becomes increasingly expensive relative to Block Lanczos as the number of requested eigenvalues increases. PCG Lanczos is best for obtaining a relatively small number of modes (up to 100) for large models (over a few million DOFs).

The next two examples show the parts of the .pcs file that report the performance statistics described above, for a 2 million DOF modal analysis that computes 10 modes. The only difference between the two runs is the level of difficulty (Lev_Diff on the PCGOPT command). Example 6.4 (Level of Difficulty = 3) uses PCGOPT,3. Its output shows Lev_Diff = 3 (C) and a total of 2355 PCG iterations (B) over 25 Lanczos steps (A), an average of 94.2 iterations per step (E). Example 6.5 (Level of Difficulty = 5) shows that increasing Lev_Diff to 5 (C) on PCGOPT reduces the iterations required per Lanczos step to just one (E).
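The averages quoted above follow directly from the (A) and (B) values. As a small worked check, the sketch below recomputes (E) for both runs and the ratio of Lanczos steps to requested modes; the helper name `average_iterations` is hypothetical, and the numbers are copied from the two example listings.

```python
def average_iterations(total_iterations, load_cases):
    """Average PCG iterations per load case, i.e. (B) divided by (A)."""
    return total_iterations / load_cases


# Values read from Examples 6.4 and 6.5, keyed by Lev_Diff.
runs = {
    3: {"load_cases": 25, "total_iterations": 2355},
    5: {"load_cases": 25, "total_iterations": 25},
}
modes_requested = 10

for lev_diff, run in runs.items():
    avg = average_iterations(run["total_iterations"], run["load_cases"])
    steps_per_mode = run["load_cases"] / modes_requested
    print(f"Lev_Diff={lev_diff}: {avg:.1f} iterations per step, "
          f"{steps_per_mode:.1f} Lanczos steps per requested mode")
```

Both runs take 25 Lanczos steps for 10 modes, a factor of 2.5, which falls within the usual range of 2 to 3 times the number of requested eigenvalues.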

Although the elapsed times in these examples show that Lev_Diff = 3 is faster in this case (see (D) in both examples), Lev_Diff = 5 can be much faster for more difficult models, where the average number of iterations per load case is much higher. For efficient PCG Lanczos solutions, the average number of PCG iterations per load case is generally around 100 to 200. If it begins to exceed 500, either increase the level of difficulty to find a more efficient solution, or consider the Block Lanczos eigensolver (assuming the problem size does not exceed the limits of the system).

Example 6.4 shows a model that performed quite well with the PCG Lanczos eigensolver. Because it converged in under 100 iterations per load case, the Lev_Diff value of 3 is probably higher than this model needs (especially at higher core counts). In this case, it might be worthwhile to try Lev_Diff = 1 or 2 to see whether solver performance improves. Both runs used a single core; using more than one core would also help reduce the time to solution.
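The rules of thumb above can be collected into a small decision helper. This is only an illustrative sketch: the function name `pcg_lanczos_advice` and the message strings are hypothetical, the thresholds (100, 200, 500) are the ones quoted in the text, and the handling of the 200 to 500 range and the assumption that Lev_Diff runs from 1 to 5 are choices made for this sketch.

```python
def pcg_lanczos_advice(avg_iterations, lev_diff):
    """Map average PCG iterations per load case (E) to the guidance in the text.

    Assumes lev_diff is an integer from 1 to 5. The 200-500 range is not
    addressed by the text, so the sketch reports that explicitly.
    """
    if lev_diff == 5:
        # Direct factorization: one iteration per load case by construction,
        # so the iteration count says nothing about efficiency.
        return "iteration count not informative; compare elapsed time"
    if avg_iterations > 500:
        return "increase Lev_Diff or consider Block Lanczos"
    if avg_iterations < 100:
        if lev_diff > 1:
            return "try a lower Lev_Diff"
        return "already at the lowest Lev_Diff"
    if avg_iterations <= 200:
        return "typical range for efficient solutions"
    return "no specific guidance in the text"


# The run with Lev_Diff = 3 averaged 94.2 iterations per load case.
print(pcg_lanczos_advice(94.2, 3))
```

Any such helper only suggests what to try next; the elapsed time (D) of an actual rerun remains the deciding measure.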

Example 6.4: PCS File for PCG Lanczos, Level of Difficulty = 3

Lanczos Solver Parameters
-------------------------
        Lanczos Block Size: 1
        Eigenpairs computed: 10 lowest
        Extra Eigenpairs: 1
        Lumped Mass Flag: 0
        In Memory Flag: 0
        Extra Krylov Dimension: 1
        Mass Matrix Singular Flag: 1
        PCCG Stopping Criteria Selector: 4
        PCCG Stopping Threshold: 1.000000e-04
        Extreme Preconditioner Flag: 0
        Reortho Type: 1
        Number of Reorthogonalizations: 7
        nExtraWorkVecs for computing eigenvectors: 4
        Rel. Eigenvalue Tolerance: 1.000000e-08
        Rel. Eigenvalue Residual Tolerance: 1.000000e-11
        Restart Condition Number Threshold: 1.000000e+15
        Sturm Check Flag: 0

        Shifts Applied:  -1.017608e-01

Eigenpairs

Number of Eigenpairs 10

---------------------------------------

No.     Eigenvalue      Frequency(Hz)
---     ----------      -------------
1       1.643988e+03    6.453115e+00
2       3.715504e+04    3.067814e+01
3       5.995562e+04    3.897042e+01
4       9.327626e+04    4.860777e+01
5       4.256303e+05    1.038332e+02
6       7.906460e+05    1.415178e+02
7       9.851501e+05    1.579688e+02
8       1.346627e+06    1.846902e+02
9       1.656628e+06    2.048484e+02
10      2.050199e+06    2.278863e+02



        Number of cores used: 1
        Degrees of Freedom: 2067051
        DOF Constraints: 6171
        Elements: 156736
                Assembled: 156736
                Implicit: 0
        Nodes: 689017
        Number of Load Cases: 25 <---A (Lanczos Steps)

        Nonzeros in Upper Triangular part of
                     Global Stiffness Matrix : 170083104
        Nonzeros in Preconditioner: 201288750
                *** Precond Reorder: MLD ***
                Nonzeros in V: 12401085
                Nonzeros in factor: 184753563
                Equations in factor: 173336
        *** Level of Difficulty: 3   (internal 2) *** <---C (Preconditioner)

        Total Operation Count: 3.56161e+12
        Total Iterations In PCG: 2355 <---B (Convergence)
        Average Iterations Per Load Case:   94.2 <---E (Iterations per Step)
        Input PCG Error Tolerance: 0.0001
        Achieved PCG Error Tolerance: 9.98389e-05


        DETAILS OF PCG SOLVER SETUP TIME(secs)        Cpu        Wall
             Gather Finite Element Data              0.40        0.40
             Element Matrix Assembly                96.24       96.52

        DETAILS OF PCG SOLVER SOLUTION TIME(secs)     Cpu        Wall
             Preconditioner Construction             1.74        1.74
             Preconditioner Factoring               51.69       51.73
             Apply Boundary Conditions               5.03        5.03
             Eigen Solve                          3379.36     3377.52
                  Eigen Solve Overhead             172.49      172.39
                       Compute MQ                  154.00      154.11
                       Reorthogonalization         123.71      123.67
                            Computation            120.66      120.61
                            I/O                      3.05        3.06
                       Block Tridiag Eigen           0.00        0.00
                       Compute Eigenpairs            1.63        1.63
                       Output Eigenpairs             0.64        0.64
                  Multiply With A                 1912.52     1911.41
                       Multiply With A22          1912.52     1911.41
                  Solve With Precond              1185.62     1184.85
                       Solve With Bd                89.84       89.89
                       Multiply With V             192.81      192.53
                       Direct Solve                880.71      880.38
******************************************************************************
             TOTAL PCG SOLVER SOLUTION CP TIME      =    3449.01 secs
             TOTAL PCG SOLVER SOLUTION ELAPSED TIME =    3447.20 secs <---D (Total Time)
******************************************************************************
        Total Memory Usage at Lanczos    :    3719.16 MB
        PCG Memory Usage at Lanczos      :    2557.95 MB
        Memory Usage for Matrix          :       0.00 MB
******************************************************************************
        Multiply with A Memory Bandwidth :      15.52 GB/s
        Multiply with A MFLOP Rate       :     833.13 MFlops
        Solve With Precond MFLOP Rate    :    1630.67 MFlops
        Precond Factoring MFLOP Rate     :       0.00 MFlops
******************************************************************************
        Total amount of I/O read         :    6917.76 MB
        Total amount of I/O written      :    6732.46 MB
******************************************************************************

Example 6.5: PCS File for PCG Lanczos, Level of Difficulty = 5

Lanczos Solver Parameters
-------------------------
        Lanczos Block Size: 1
        Eigenpairs computed: 10 lowest
        Extra Eigenpairs: 1
        Lumped Mass Flag: 0
        In Memory Flag: 0
        Extra Krylov Dimension: 1
        Mass Matrix Singular Flag: 1
        PCCG Stopping Criteria Selector: 4
        PCCG Stopping Threshold: 1.000000e-04
        Extreme Preconditioner Flag: 1
        Reortho Type: 1
        Number of Reorthogonalizations: 7
        nExtraWorkVecs for computing eigenvectors: 4
        Rel. Eigenvalue Tolerance: 1.000000e-08
        Rel. Eigenvalue Residual Tolerance: 1.000000e-11
        Restart Condition Number Threshold: 1.000000e+15
        Sturm Check Flag: 0

        Shifts Applied:  -1.017608e-01

Eigenpairs

Number of Eigenpairs 10

---------------------------------------

No.     Eigenvalue      Frequency(Hz)
---     ----------      -------------
1       1.643988e+03    6.453116e+00
2       3.715494e+04    3.067810e+01
3       5.995560e+04    3.897041e+01
4       9.327476e+04    4.860738e+01
5       4.256265e+05    1.038328e+02
6       7.906554e+05    1.415187e+02
7       9.851531e+05    1.579690e+02
8       1.346626e+06    1.846901e+02
9       1.656620e+06    2.048479e+02
10      2.050184e+06    2.278854e+02





        Number of cores used: 1
        Degrees of Freedom: 2067051
        DOF Constraints: 6171
        Elements: 156736
                Assembled: 156736
                Implicit: 0
        Nodes: 689017
        Number of Load Cases: 25 <---A (Lanczos Steps)

        Nonzeros in Upper Triangular part of
                     Global Stiffness Matrix : 170083104
        Nonzeros in Preconditioner: 4168012731
                *** Precond Reorder: MLD ***
                Nonzeros in V: 0
                Nonzeros in factor: 4168012731
                Equations in factor: 2067051
        *** Level of Difficulty: 5   (internal 0) *** <---C (Preconditioner)

        Total Operation Count: 4.34378e+11
        Total Iterations In PCG: 25 <---B (Convergence)
        Average Iterations Per Load Case:    1.0 <---E (Iterations per Step)
        Input PCG Error Tolerance: 0.0001
        Achieved PCG Error Tolerance: 1e-10


        DETAILS OF PCG SOLVER SETUP TIME(secs)        Cpu        Wall
             Gather Finite Element Data              0.42        0.43
             Element Matrix Assembly               110.99      111.11

        DETAILS OF PCG SOLVER SOLUTION TIME(secs)     Cpu        Wall
             Preconditioner Construction            26.16       26.16
             Preconditioner Factoring             3245.98     3246.01
             Apply Boundary Conditions               5.08        5.08
             Eigen Solve                          1106.75     1106.83
                  Eigen Solve Overhead             198.14      198.15
                       Compute MQ                  161.40      161.28
                       Reorthogonalization         130.51      130.51
                            Computation            127.45      127.44
                            I/O                      3.06        3.07
                       Block Tridiag Eigen           0.00        0.00
                       Compute Eigenpairs            1.49        1.49
                       Output Eigenpairs             0.62        0.62
                  Multiply With A                    7.89        7.88
                       Multiply With A22             7.89        7.88
                  Solve With Precond                 0.00        0.00
                       Solve With Bd                 0.00        0.00
                       Multiply With V               0.00        0.00
                       Direct Solve                908.61      908.68
******************************************************************************
             TOTAL PCG SOLVER SOLUTION CP TIME      =    4395.84 secs
             TOTAL PCG SOLVER SOLUTION ELAPSED TIME =    4395.96 secs <---D (Total Time)
******************************************************************************
        Total Memory Usage at Lanczos    :    3622.87 MB
        PCG Memory Usage at Lanczos      :    2476.45 MB
        Memory Usage for Matrix          :       0.00 MB
******************************************************************************
        Multiply with A Memory Bandwidth :      39.94 GB/s
        Solve With Precond MFLOP Rate    :     458.69 MFlops
        Precond Factoring MFLOP Rate     :       0.00 MFlops
******************************************************************************
        Total amount of I/O read         :   11853.51 MB
        Total amount of I/O written      :    7812.11 MB
******************************************************************************
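
The two listings also show where the time goes at each level of difficulty. The sketch below, which uses wall-clock values copied from Examples 6.4 and 6.5, computes the share of the total elapsed time spent in preconditioner factoring and in the eigen solve. The dictionary layout and the helper name `share` are hypothetical and exist only for this illustration.

```python
# Wall-clock seconds copied from the two example listings, keyed by Lev_Diff.
wall = {
    3: {"factoring": 51.73, "eigen_solve": 3377.52, "total": 3447.20},
    5: {"factoring": 3246.01, "eigen_solve": 1106.83, "total": 4395.96},
}


def share(run, phase):
    """Percentage of the total elapsed time spent in the given phase."""
    return 100.0 * run[phase] / run["total"]


for lev_diff, run in wall.items():
    print(f"Lev_Diff={lev_diff}: "
          f"factoring {share(run, 'factoring'):.1f}%, "
          f"eigen solve {share(run, 'eigen_solve'):.1f}%, "
          f"total {run['total']:.2f} s")

# Extra elapsed time of the Lev_Diff = 5 run relative to Lev_Diff = 3.
extra = wall[5]["total"] - wall[3]["total"]
print(f"Lev_Diff=5 took {extra:.2f} s longer than Lev_Diff=3")
```

For this model, raising Lev_Diff to 5 cuts the eigen solve time to roughly a third, but the factorization cost more than offsets that gain. This matches the point made earlier that Lev_Diff = 5 suits problems where the cost of factorization is not dominant.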