5.2. Types of Solvers

5.2.1. The Sparse Direct Solver

The sparse direct solver (including the Block Lanczos method for modal and buckling analyses) is based on direct elimination of equations, as opposed to iterative solvers, in which the solution is obtained by successively refining an initial guess until it is within an acceptable tolerance of the exact solution. Direct elimination requires factorizing the initial, very sparse linear system of equations into a lower triangular matrix, followed by forward and backward substitution using this triangular system. The space required for the triangular matrix factors is typically much larger than that of the initial assembled sparse matrix, hence the large disk or in-core memory requirements of direct methods.

Sparse direct solvers seek to minimize the cost of factorizing the matrix as well as the size of the factor using sophisticated equation reordering strategies. Iterative solvers do not require a matrix factorization and typically iterate towards the solution using a series of very sparse matrix-vector multiplications along with a preconditioning step, both of which require less memory and time per iteration than direct factorization. However, convergence of iterative methods is not guaranteed and the number of iterations required to reach an acceptable solution may be so large that direct methods are faster in some cases.

Because the sparse direct solver is based on direct elimination, poorly conditioned matrices do not pose any difficulty in producing a solution (although accuracy may be compromised). Direct factorization methods always give an answer if the equation system is not singular. When the system is close to singular, the solver can usually give a solution (although you must verify the accuracy).

The sparse solver can run completely in memory (also known as in-core) if sufficient memory is available. The sparse solver can also run efficiently by using a balance of memory and disk usage (also known as out-of-core). The out-of-core mode typically requires about the same memory usage as the PCG solver (~1 GB per million DOFs) and requires a large disk file to store the factorized matrix (~10 GB per million DOFs). The amount of I/O required for a typical static analysis is three times the size of the matrix factorization. Running the solver factorization in-core (completely in memory) for modal/buckling runs can save significant amounts of wall (elapsed) time because modal/buckling analyses require several factorizations (typically 2 - 4) and repeated forward/backward substitutions (10 - 40+ block solves are typical). The same effect can often be seen with nonlinear or transient runs which also have repeated factor/solve steps.

The BCSOPTION command allows you to choose a memory strategy for the sparse solver. The available options for the Memory_Option field are DEFAULT, INCORE, OUTOFCORE, and FORCE. Depending on the availability of memory on the system, each memory strategy has its benefits. For systems with a large amount of physical memory, the INCORE memory mode often results in the best performance. Conversely, the OUTOFCORE memory mode often gives the worst solver performance and, therefore, is only recommended if necessary due to limited memory resources. In most cases you should use the DEFAULT memory mode. In this mode, the sparse solver uses sophisticated memory usage heuristics to balance available memory with the specific memory requirements of the sparse solver for each job. By default, most smaller jobs automatically run in the INCORE memory mode, but larger jobs may run in the INCORE memory mode or in the OUTOFCORE memory mode. In some cases you may want to explicitly set the sparse solver memory mode or memory allocation size using the BCSOPTION command. However, doing so is only recommended if you know how much physical memory is on the system and understand the sparse solver memory requirements for the job in question.
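
For example, on a machine known to have ample physical memory, a minimal command sketch such as the following forces the sparse solver to run in-core (the commands shown are illustrative; omit BCSOPTION entirely to accept the DEFAULT heuristics):

    /SOLU               ! enter the solution processor
    EQSLV,SPARSE        ! select the sparse direct solver
    BCSOPTION,,INCORE   ! request the INCORE memory mode (Memory_Option field)
    SOLVE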

When the sparse solver is selected in a distributed-memory parallel solution, the distributed sparse direct solver is automatically used instead. The distributed sparse solver is mathematically identical to the shared-memory parallel sparse solver. It should typically be used for problems for which the PCG and JCG solvers have convergence difficulty, and on computer systems where large amounts of memory are available.

5.2.1.1. Distributed Sparse Direct Solver

The distributed sparse direct solver decomposes a large sparse matrix into smaller submatrices (instead of decomposing element domains), and then sends these submatrices to multiple cores on either shared-memory (for example, server) or distributed-memory (for example, cluster) hardware. Depending on the number of cores used, HPC licenses may be required. (For more information, see HPC Licensing in the Parallel Processing Guide.)

During the matrix factorization phase, each distributed process factorizes its submatrices simultaneously and communicates the information as necessary. The submatrices are automatically split into pieces (or fronts) by the solver during the factorization step. The non-distributed sparse solver works on one front at a time, while the distributed sparse solver works on n fronts at the same time (where n is the total number of processes used). Each front in the distributed sparse solver is stored in-core by each process while it is factored, even when the distributed sparse solver is running in out-of-core mode; this is essentially equivalent to the out-of-core mode for the non-distributed sparse solver. Therefore, the total memory usage of the distributed sparse solver when using the out-of-core memory mode is about n times the memory needed to hold the largest front. Consequently, as more cores are used, the total memory used by the solver (summed across all processes) actually increases when running in this memory mode.

The DSPOPTION command allows you to choose a specific memory strategy for the distributed sparse solver. The available options for the Memory_Option field are DEFAULT, INCORE, OUTOFCORE, and FORCE. Sophisticated memory usage heuristics, similar to those used by the sparse solver, are used to balance the specific memory requirements of the distributed sparse solver with the available memory on the machine(s) being used. By default, most smaller jobs run in the INCORE memory mode, while larger jobs can run either in the INCORE memory mode or in the OUTOFCORE memory mode. In some cases, you may want to explicitly set the memory mode using the DSPOPTION command. However, this is only recommended if you fully understand the solver memory used on each machine and the available memory for each machine.
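
For example, on a machine known to have enough physical memory to hold the entire factorization, the in-core mode can be requested explicitly; this sketch is illustrative only, and the DEFAULT heuristics are normally preferable:

    DSPOPTION,,INCORE      ! force the distributed sparse solver to run in-core
    ! DSPOPTION,,OUTOFCORE would instead minimize memory use at the cost of disk I/O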

When the distributed sparse solver runs in the out-of-core memory mode, it does substantial I/O to the disk storage device on the machine. If multiple solver processes write to the same disk, the performance of the solver decreases as more solver processes are used, meaning the total elapsed time of the solver does not decrease as much as expected. The ideal configuration for the distributed sparse solver when running in out-of-core mode is to run using a single process on each machine in a cluster or network, spreading the I/O across the hard drives of each machine, assuming that a high-speed network such as InfiniBand is being used. Running the distributed sparse solver in out-of-core mode on a shared disk resource (for example, NAS or SAN disk) is typically not recommended. You can effectively run the distributed sparse solver using multiple processes with one drive (or a shared disk resource) if:

  • The problem size is small enough relative to the physical memory on the system that the system buffer cache can hold all of the distributed sparse solver (I/O) files and other files in memory.

  • You have a very fast hard drive configuration that can handle multiple I/O requests simultaneously. For a shared disk resource on a cluster, a very fast interconnect is also needed to handle the I/O traffic along with the regular communication of data within the solver.

  • You use the DSPOPTION,,INCORE command to force the distributed sparse solver into an in-core mode.

5.2.2. The Preconditioned Conjugate Gradient (PCG) Solver

The PCG solver starts with element matrix formulation. Instead of factoring the global matrix, the PCG solver assembles the full global stiffness matrix and calculates the DOF solution by iterating to convergence (starting with an initial guess solution for all DOFs). The PCG solver uses a proprietary preconditioner that is material property and element-dependent.

  • The PCG solver is usually about 4 to 10 times faster than the JCG solver for structural solid elements and about 10 times faster than JCG for shell elements. Savings increase with the problem size.

  • The PCG solver usually requires approximately twice as much memory as the JCG solver because it retains two matrices in memory:

    • The preconditioner, which is almost the same size as the stiffness matrix

    • The symmetric or unsymmetric, nonzero part of the stiffness matrix

You can use Table 5.1: Shared Memory Solver Selection Guidelines as a general guideline for memory usage.

The PCG solver is available only for static (or steady-state) analyses, full transient analyses, full harmonic analyses, and PCG Lanczos modal analyses. It performs well on most static analyses and certain nonlinear analyses, and it is valid for elements with definite or indefinite matrices. Contact analyses that use penalty-based or augmented Lagrangian-based methods work well with the PCG solver, as long as contact does not generate rigid body motions during the nonlinear iterations (for example, due to a full loss of contact).
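
For example, a minimal sketch selecting the PCG solver for a linear static analysis (illustrative only; the model, loads, and boundary conditions are assumed to be defined already):

    /SOLU
    ANTYPE,STATIC       ! static analysis
    EQSLV,PCG           ! select the PCG iterative solver (default tolerance 1.0E-8)
    SOLVE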

The Lagrange multiplier method used by the following MPC184 element types can also be solved by the PCG solver: rigid beam, rigid link, slider, revolute joint, universal joint, translational joint, cylindrical joint, weld joint, spherical joint, and general joint. For all other MPC184 element types, the PCG solver cannot be used, and the sparse solver is required. Modal analysis using the PCG Lanczos mode-extraction method (MODOPT,LANPCG) is also supported for the MPC184 elements listed above. Linear perturbation analysis is not supported for the MPC184 elements.

The PCG solver cannot be used with Lagrange-formulation contact methods or incompressible u-P formulations; in these cases, as with the MPC184 element types not listed above, the sparse solver is required. For more information, see the PCGOPT command.

Because they take fewer iterations to converge, well-conditioned models perform better than ill-conditioned models when using the PCG solver. Ill-conditioning often occurs in models containing elongated elements (that is, elements with high aspect ratios) or contact elements. To determine if your model is ill-conditioned, view the Jobname.pcs file to see the number of PCG iterations needed to reach a converged solution. Generally, static or full transient solutions that require more than 1500 PCG iterations are considered to be ill-conditioned for the PCG solver. For such models, the PCG solver may not be the most effective choice, and the program may automatically decide to switch to the sparse direct solver to provide a more efficient solution. Automatic switching is controlled via the Fallback option on the PCGOPT command.

For ill-conditioned models, the PCGOPT command can sometimes reduce solution times. You can adjust the level of difficulty (PCGOPT,Lev_Diff) depending on the amount of ill-conditioning in the model. By default, the program automatically adjusts the level of difficulty for the PCG solver based on the model. However, sometimes forcing a higher level of difficulty value for ill-conditioned models can reduce the overall solution time.
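
For example, a higher level of difficulty can be forced as follows (the value shown is illustrative; by default, the program chooses Lev_Diff automatically):

    PCGOPT,3            ! set the PCG level of difficulty (Lev_Diff) to 3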

The PCG solver primarily solves for the primary variables: displacements/rotations (in structural analyses), temperatures (in thermal analyses), and so on. The accuracy of derived variables (such as strains, stresses, and fluxes) depends on accurate prediction of the primary variables. Therefore, the program uses a very conservative default PCG tolerance (1.0E-8); the accuracy of the primary solution is controlled by this tolerance. For most applications, setting the PCG tolerance to 1.0E-6 provides a very accurate displacement solution and may save considerable CPU time compared with the default setting. Use the EQSLV command to change the PCG solver tolerance.
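
For example, the loosened tolerance suggested above can be requested as follows:

    EQSLV,PCG,1.0E-6    ! select the PCG solver with a loosened convergence tolerance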

Direct solvers (such as the sparse direct solver) produce very accurate solutions. Iterative solvers, such as the PCG solver, require that a PCG convergence tolerance be specified. Therefore, a large relaxation of the default tolerance may significantly affect the accuracy, especially of derived quantities.

With all iterative solvers you must verify that the model is appropriately constrained. No minimum pivot is calculated, and the solver continues to iterate if any rigid body motion exists. Note that in this situation, if the PCG solver fails to converge, the program may decide to automatically switch to the sparse direct solver to improve the convergence behavior and provide a solution. If the model is truly underconstrained, even the sparse solver may fail with messages indicating there are one or more near-zero pivot terms during the matrix factorization step.

In a modal analysis using the PCG solver (MODOPT,LANPCG), the number of modes should be limited to 100 or less for efficiency. PCG Lanczos modal solutions can solve for a few hundred modes, but with less efficiency than Block Lanczos (MODOPT,LANB).
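
For example, a minimal PCG Lanczos modal sketch requesting a limited number of modes (the mode count is illustrative):

    /SOLU
    ANTYPE,MODAL        ! modal analysis
    MODOPT,LANPCG,100   ! PCG Lanczos method, requesting 100 modes
    SOLVE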

When the PCG solver encounters an indefinite matrix, the program may automatically switch to the sparse direct solver in order to provide a more efficient solution. If this occurs during a nonlinear analysis, the program continues to use the sparse solver for the duration of the substep. At the completion of the current substep, the program typically reverts to the PCG solver (see the PCGOPT command for more details). If the fallback logic and automatic switching are disabled by setting FALLBACK = OFF on the PCGOPT command, the solver instead invokes an algorithm that handles indefinite matrices. If the indefinite matrix algorithm also fails (this happens when the equation system is ill-conditioned; for example, when contact is lost at a substep or a plastic hinge develops), the outer Newton-Raphson loop is triggered to perform a bisection. Normally, the stiffness matrix is better conditioned after bisection, and the PCG solver can eventually solve all of the nonlinear steps.

For iterative methods, the solution time grows linearly with problem size, so very large models can still be solved in reasonable times. For modal analyses of large models (for example, 10 million DOF or larger), MODOPT,LANPCG is a viable solution method if the number of modes is limited to approximately 100.

Use MSAVE,ON (the default in most cases) for memory savings of up to 70 percent. The MSAVE command uses an element-by-element approach (rather than globally assembling the stiffness matrix) for the parts of the structure involving SOLID185, SOLID186, or SOLID187 elements with linear material properties. This feature applies only to static analyses or modal analyses using the PCG Lanczos method (ANTYPE,STATIC; or ANTYPE,MODAL with MODOPT,LANPCG). Note that the MSAVE feature does not apply to any linear perturbation analysis types. The solution time may be affected depending on the hardware (processor speed, memory bandwidth, etc.), as well as the chosen element options.
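
For example (illustrative only; MSAVE,ON is already the default in most cases):

    /SOLU
    ANTYPE,STATIC       ! static analysis (or a modal analysis with MODOPT,LANPCG)
    EQSLV,PCG           ! MSAVE is used with the PCG solver
    MSAVE,ON            ! element-by-element approach for supported SOLID185/186/187 regions
    SOLVE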

Unsymmetric PCG Limitations and Tips

  • The unsymmetric PCG solver does not work for hyperelastic materials (enhanced strain or u-P formulations) or highly nonlinear materials such as gaskets.

  • For nonlinear static buckling problems with bifurcation, tighten the PCG tolerance (EQSLV,,TOLER) below 1.E-8 to get the unique nonlinear path.

  • The PCG solver may fail to converge if rigid body motions are present in the analysis, such as in sliding contact problems without sufficient support. The unsymmetric PCG solver automatically falls back to the SPARSE solver when necessary. In these cases, rigid body motions may be indicated by SPARSE solver messages about small or near-zero pivots.

  • When used for an analysis with frictional contact, the unsymmetric PCG solver is efficient for friction coefficients of mu = 0.3 or less. For mu near 0.5, the PCG solver may require more iterations. With larger friction coefficients, the PCG solver may fail to converge and will fall back to the SPARSE solver. The rate of convergence of the unsymmetric PCG solver is sensitive to the level of asymmetry in the assembled matrix: the higher the level of asymmetry, the more PCG iterations are required to converge.

  • The unsymmetric PCG solver does not support MSAVE,ON.

  • The unsymmetric PCG solver supports a level of difficulty (Lev_Diff on the PCGOPT command) of 2 through 4 for structural analyses, and only Lev_Diff = 1 for analyses with a single degree of freedom (for example, thermal analyses).

5.2.3. The Jacobi Conjugate Gradient (JCG) Solver

The JCG solver also starts with element matrix formulation. Instead of factoring the global matrix, the JCG solver assembles the full global stiffness matrix and calculates the DOF solution by iterating to convergence (starting with an initial guess solution for all DOFs). The JCG solver uses the diagonal of the stiffness matrix as a preconditioner. The JCG solver is typically used for thermal analyses and is best suited for 3D scalar field analyses that involve large, sparse matrices.

In some cases, the default tolerance of 1.0E-8 (set via the EQSLV,JCG command) may be too restrictive and may increase running time needlessly. A value of 1.0E-5 may be acceptable in many situations.

The JCG solver is available only for static analyses, full harmonic analyses, or full transient analyses. (You specify these analysis types using the commands ANTYPE,STATIC, HROPT,FULL, or TRNOPT,FULL respectively.)
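
For example, a minimal sketch selecting the JCG solver with a loosened tolerance for a static analysis (values are illustrative; the model is assumed to be defined already):

    /SOLU
    ANTYPE,STATIC       ! static analysis (full transient and full harmonic are the other supported types)
    EQSLV,JCG,1.0E-5    ! select the JCG solver with a loosened tolerance
    SOLVE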

With all iterative solvers, be particularly careful to check that the model is appropriately constrained. No minimum pivot is calculated and the solver continues to iterate if any rigid body motion is possible.

5.2.4. The Incomplete Cholesky Conjugate Gradient (ICCG) Solver

The ICCG solver operates similarly to the JCG solver with the following exceptions:

  • The ICCG solver is more robust than the JCG solver for matrices that are not well-conditioned. Performance varies with matrix conditioning, but in general ICCG performance is comparable to that of the JCG solver.

  • The ICCG solver uses a more sophisticated preconditioner than the JCG solver. Therefore, the ICCG solver requires approximately twice as much memory as the JCG solver.

The ICCG solver is typically used for unsymmetric thermal analyses and electromagnetic analyses and is available only for static analyses, full harmonic analyses (HROPT,FULL), or full transient analyses (TRNOPT,FULL). (You specify the analysis type using the ANTYPE command.) The ICCG solver is useful for structural and multiphysics applications, and for symmetric, unsymmetric, complex, definite, and indefinite matrices.

5.2.5. The Quasi-Minimal Residual (QMR) Solver

The QMR solver is used for acoustic analyses and is available only for full harmonic analyses (HROPT,FULL). (You specify the analysis type using the ANTYPE command.) This solver is appropriate for symmetric, complex, definite, and indefinite matrices. The QMR solver is more robust than the ICCG solver.
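
For example, a minimal sketch selecting the QMR solver for a full harmonic analysis (illustrative only; the model and loads are assumed to be defined already):

    /SOLU
    ANTYPE,HARMIC       ! harmonic analysis
    HROPT,FULL          ! full harmonic solution method
    EQSLV,QMR           ! select the QMR solver
    SOLVE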