6.7. Supernode Solver Performance Output

The Supernode eigensolver works by taking the matrix structure of the stiffness and mass matrices from the original FEA model and internally breaking it into pieces (supernodes). These supernodes are then used to reduce the original FEA matrix to a much smaller system of equations. The solver then computes all of the modes and mode shapes within the requested frequency range on this smaller system of equations. Finally, the supernodes are used again to transform (or expand) the smaller mode shapes back to the larger problem size from the original FEA model.

This eigensolver is most effective when a large number of modes is requested, usually more than 200. Another benefit is that it typically performs much less I/O than the Block Lanczos eigensolver and, therefore, is especially useful on typical desktop machines that often have limited disk space and/or slow I/O transfer speeds.

Performance information for the Supernode eigensolver is printed by default to the file Jobname.DSP. Use the command SNOPTION,,,,,,PERFORMANCE to print this same information, along with additional solver information, to the standard output file.

When studying performance of this eigensolver, each of the three key steps described above (reduction, solution, expansion) should be examined. The first step of forming the supernodes and performing the reduction increases in time as the original problem size increases; however, it typically takes about the same amount of computational time whether 10 modes or 1000 modes are requested. In general, the larger the original problem size is relative to the number of modes requested, the larger the percentage of solution time spent in this reduction process.

The next step, solving the reduced eigenvalue problem, increases in time as the number of modes in the specified frequency range increases. This step is typically a much smaller portion of the overall solver time and, thus, does not often have a big effect on the total time to solution. However, this depends on the size of the original problem relative to the number of requested modes. The larger the original problem, or the fewer the requested modes within the specified frequency range, the smaller the percentage of solution time spent in this step. Choosing a frequency range that covers only the frequencies of interest helps keep this step as efficient as possible.

The final step of expanding the mode shapes can be an I/O intensive step and, therefore, typically warrants the most attention when studying the performance of the Supernode eigensolver. The solver expands the final modes one block of vectors at a time. This block size is a controllable parameter: a larger block can reduce the amount of I/O done by the solver, but at the cost of more memory usage.
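The trade-off between block size and memory can be illustrated with a rough back-of-the-envelope sketch. The function names below are hypothetical, and the memory model is an assumption (one double-precision value per equation per vector in the block); it is not the solver's documented memory accounting and should be read only as an order-of-magnitude estimate.

```python
import math


def expansion_passes(n_modes, block_size):
    """Number of blocks needed to expand n_modes vectors, block_size at a time."""
    return math.ceil(n_modes / block_size)


def block_memory_mb(n_equations, block_size, bytes_per_value=8):
    """Rough size of one block of expanded vectors in MB (2**20 bytes).

    Assumption: each vector holds one 8-byte real value per equation.
    """
    return n_equations * block_size * bytes_per_value / 1024 ** 2


if __name__ == "__main__":
    # Values taken from the example later in this section:
    # 1495308 equations, 1000 modes, block size 40.
    for block in (40, 80, 160):
        print(block, expansion_passes(1000, block),
              round(block_memory_mb(1495308, block), 1))
```

Under these assumptions, doubling the block size halves the number of expansion passes and doubles the memory needed to hold one block.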

Example 6.6: DSP File for Supernode (SNODE) Solver shows a part of the Jobname.DSP file that reports the performance statistics described above for a 1.5 million DOF modal analysis that computes 1000 modes and mode shapes. The output shows the cost required to perform the reduction steps (A), the cost to solve the reduced eigenvalue problem (B), and the cost to expand and output the final mode shapes to the Jobname.rst and Jobname.mode files (C). The block size for the expansion step is also shown (D). At the bottom of this file, the total size of the files written by this solver is printed with the total amount of I/O transferred.

In this example, the reduction (the four lines marked A, about 1773 seconds in total) is by far the most expensive piece. The expansion computations run at over 3000 Mflops, and this step (about 377 seconds) is relatively inexpensive. Because this run used a single core, using more than one core would certainly help to significantly reduce the time to solution.
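The share of time spent in each of the three steps can be worked out directly from the elapsed-time cost lines in Example 6.6. The following is a minimal sketch: the values are copied by hand from the example, the grouping follows the A/B/C markers, and the names COSTS, PHASES, and phase_shares are illustrative only and not part of any solver interface.

```python
# Elapsed-time costs (seconds) copied from the "cost (elapsed time)" block
# of Example 6.6.
COSTS = {
    "Substructure eigenvalue cost": 432.940317,
    "Constraint mode & Schur complement cost": 248.458204,
    "Guyan reduction cost": 376.112464,
    "Mass update cost": 715.075189,
    "Global eigenvalue cost": 146.099520,
    "Output eigenvalue cost": 377.254077,
}

# Grouping of cost lines into the three steps, following the A/B/C markers.
PHASES = {
    "reduction": [
        "Substructure eigenvalue cost",
        "Constraint mode & Schur complement cost",
        "Guyan reduction cost",
        "Mass update cost",
    ],
    "reduced problem": ["Global eigenvalue cost"],
    "expansion": ["Output eigenvalue cost"],
}


def phase_shares(costs, phases):
    """Return {phase: (seconds, percent of the summed cost lines)}."""
    totals = {name: sum(costs[key] for key in keys)
              for name, keys in phases.items()}
    grand_total = sum(totals.values())
    return {name: (secs, 100.0 * secs / grand_total)
            for name, secs in totals.items()}


if __name__ == "__main__":
    for name, (secs, pct) in phase_shares(COSTS, PHASES).items():
        print(f"{name:16s} {secs:10.1f} s  {pct:5.1f} %")
```

For this run, the reduction accounts for roughly 77% of the summed cost lines, the reduced eigenvalue problem for about 6%, and the expansion for about 16%.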

Example 6.6: DSP File for Supernode (SNODE) Solver

     number of equations                     =         1495308
     no. of nonzeroes in lower triangle of a =        41082846
     no. of nonzeroes in the factor l        =       873894946
     ratio of nonzeroes in factor (min/max)  =          1.0000
     number of super nodes                   =            6554
     maximum order of a front matrix         =            6324
     maximum size of a front matrix          =        19999650
     maximum size of a front trapezoid       =        14547051
     no. of floating point ops for eigen sol =      8.0502D+12
     no. of floating point ops for eigen out =      1.2235D+12
     no. of equations in global eigenproblem =            5983
     factorization panel size                =             256
     supernode eigensolver block size        =              40 <---D
     number of cores used                    =               1
     time (cpu & wall) for structure input   =        1.520000        1.531496
     time (cpu & wall) for ordering          =        4.900000        4.884938
     time (cpu & wall) for value input       =        1.050000        1.045083
     time (cpu & wall) for matrix distrib.   =        3.610000        3.604622
     time (cpu & wall) for eigen solution    =     1888.840000     1949.006305
     computational rate (mflops) for eig sol =     4261.983276     4130.414801
     effective I/O rate (MB/sec) for eig sol =                      911.661409
     time (cpu & wall) for eigen output      =      377.760000      377.254216
     computational rate (mflops) for eig out =     3238.822665     3243.164948
     effective I/O rate (MB/sec) for eig out =                     1175.684097

     cost (elapsed time) for SNODE eigenanalysis
     -----------------------------
     Substructure eigenvalue cost            =      432.940317 <---A (Reduction)
     Constraint mode & Schur complement cost =      248.458204 <---A (Reduction)
     Guyan reduction cost                    =      376.112464 <---A (Reduction)
     Mass update cost                        =      715.075189 <---A (Reduction)
     Global eigenvalue cost                  =      146.099520 <---B (Reduced Problem)
     Output eigenvalue cost                  =      377.254077 <---C (Expansion)

     i/o stats: unit-Core          file length             amount transferred
                                 words       mbytes          words       mbytes
                   ----     ----------     --------     ----------     --------
                17-   0     628363831.     4794. MB   16012767174.   122168. MB
                45-   0     123944224.      946. MB    1394788314.    10641. MB
                46-   0      35137755.      268. MB      70275407.      536. MB
                93-   0      22315008.      170. MB     467938080.     3570. MB
                94-   0      52297728.      399. MB     104582604.      798. MB
                98-   0      22315008.      170. MB     467938080.     3570. MB
                99-   0      52297728.      399. MB     104582604.      798. MB

                -------     ----------     --------     ----------     --------
                Totals:     936671282.     7146. MB   18622872263.   142081. MB


  Total Memory allocated =     561.714 MB
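
Several of the figures in this listing can be cross-checked against one another, which helps when reading a DSP file from a different run. The sketch below assumes that a "word" is 8 bytes and that "MB" means 2**20 bytes; both assumptions are inferred from the numbers in the listing rather than stated in it, and the helper names are hypothetical.

```python
WORD_BYTES = 8        # assumption: one word is 8 bytes
MB = 1024 ** 2        # assumption: "MB" in the listing means 2**20 bytes


def mflops(flops, seconds):
    """Computational rate in Mflops from an operation count and a time."""
    return flops / seconds / 1.0e6


def words_to_mb(words):
    """Convert a word count from the i/o stats table to MB."""
    return words * WORD_BYTES / MB


if __name__ == "__main__":
    # Eigen solution: 8.0502D+12 operations, cpu and wall times in seconds.
    print(mflops(8.0502e12, 1888.84), mflops(8.0502e12, 1949.006305))
    # Eigen output: 1.2235D+12 operations over the cpu time.
    print(mflops(1.2235e12, 377.76))
    # Totals row of the i/o stats table: file length and amount transferred.
    print(words_to_mb(936671282), words_to_mb(18622872263))
```

Under these assumptions the computed rates agree with the reported Mflops values to within the rounding of the operation counts, and the converted totals match the 7146 MB and 142081 MB shown in the table. The amount transferred is roughly 20 times the total file size, which indicates that the files are read and written repeatedly during the solution.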