6.9. Solver Performance Guidelines

Performance guidelines are provided for the following solvers:

6.9.1. Sparse Direct Solver

To optimize sparse solver performance, you should know the expected memory requirements for each memory mode and compare them to the physical memory of the computer system. The in-core memory mode is the ideal mode to use. However, it is typically better to run in the out-of-core memory mode with enough memory allocated to do so comfortably than to run in the in-core memory mode and consume all (or nearly all) of the physical memory on the system.

Be aware that the out-of-core memory mode causes a significant degradation in overall solver performance and scalability, and is therefore rarely recommended. Some exceptions do exist, such as solving a large model on a fixed set of hardware resources, or running on systems with very fast I/O configurations. However, the recommendation for these cases is to always run on workstations/servers with more RAM, or to consider using clusters (if possible) to spread the job over additional nodes so that the in-core memory mode is used.

You can force the sparse solver to run in-core by issuing the command BCSOPTION,,INCORE (or DSPOPTION,,INCORE for the distributed-memory sparse solver). This command directs the solver to allocate whatever memory is necessary to run in the in-core memory mode. Once the in-core memory requirement is known for a given model, it is possible to launch Mechanical APDL with enough initial memory that the sparse solver runs in-core automatically. This trial-and-error approach is helpful for conserving memory, but it is unnecessary if the available memory is sufficient to easily run the given job.
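For example, a minimal sketch of forcing the in-core memory mode might look like the following (issued before the SOLVE command; only the option matching the sparse solver actually in use applies):

    ! Shared-memory sparse solver: allocate whatever memory is needed to run in-core
    BCSOPTION,,INCORE

    ! Distributed-memory sparse solver: equivalent in-core request
    DSPOPTION,,INCORE

Alternatively, once the in-core requirement is known (for instance, a hypothetical 12 GB), launching Mechanical APDL with an initial work space that comfortably exceeds it (for example, -m 14000 on the command line, where the value is in MB) typically lets the sparse solver choose the in-core memory mode on its own.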

A common error made by users on systems with large amounts of RAM is to start Mechanical APDL with an unnecessarily large initial scratch memory allocation. This initial allocation limits the amount of system memory left over to serve as a buffer cache for Mechanical APDL files.

It is also common for users to increase the initial scratch memory allocation at startup but fall just short of the requirement for running the sparse solver in the in-core memory mode. In this case, the sparse solver still runs out-of-core, but often at reduced performance because less memory is available for the system buffer cache.

In both of the above scenarios, using default initial scratch memory allocations would be more beneficial. In-core solver performance is not required on these types of systems to obtain good results, but it is a valuable option for time-critical runs when you can dedicate a large memory system to a single large model.

Table 6.2: Summary of Sparse Direct Solver Guidelines

Memory Mode: In-core

  • Always target this memory mode for optimal performance.

  • Use only if the in-core memory requirement is < 90% of the total physical memory of the system ("comfortable" in-core memory).

Memory Mode: Out-of-core

  • Avoid this mode as much as possible.

  • Consider switching to SSDs and/or adding a RAID0 array of disks to improve I/O performance.

  • For Block Lanczos, consider increasing the block size (MODOPT command) when poor I/O performance is seen in order to reduce the number of block solves (that is, to reduce the amount of I/O performed); see the sketch following the general guidelines below.

General Guidelines:

  • Use parallel processing to reduce factorization time, but ensure that the parallel speedup is not limited by serial I/O.

  • Don't use excessive memory for out-of-core runs (this limits system caching of I/O to files).

  • Don't use all of the available physical memory just to get an in-core factorization (this results in sluggish system performance and limits system caching of all other files).
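As a hedged illustration of the guidelines above, the following sketch runs the sparse solver out-of-core, requests a detailed performance summary, and increases the Block Lanczos block size to reduce I/O. The mode count (40) and block size (16) are placeholders, and the exact field positions (particularly BlockSize on MODOPT and Solve_Info on BCSOPTION) should be confirmed in the command reference for the release in use:

    ! Sparse solver out-of-core, with a detailed performance summary printed
    ! (Memory_Option = OUTOFCORE, Solve_Info = PERFORMANCE; field positions assumed)
    BCSOPTION,,OUTOFCORE,,,,PERFORMANCE

    ! Block Lanczos modal analysis requesting 40 modes with a larger block size
    ! to reduce the number of block solves (and therefore the amount of I/O)
    MODOPT,LANB,40,,,,,,16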


6.9.2. PCG Iterative Solver

The preconditioner option (Lev_Diff value on the PCGOPT command) is an important factor in optimizing PCG solver performance. This option affects the number of iterations required to reach convergence as well as the cost (time) required for each iteration. The optimal Lev_Diff value changes based on the particular model and hardware involved.

The following table provides general guidelines for choosing PCG solver settings. These are guidelines rather than hard rules; when seeking the best performance, try various solver options to find the combination that works best for the given model and hardware.
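For example, a minimal sketch of selecting the PCG solver and adjusting the level of difficulty might look like this (the tolerance shown is the documented default, and Lev_Diff = 2 is only an illustrative starting point):

    ! Select the PCG iterative solver with the default tolerance
    EQSLV,PCG,1.0E-8

    ! Request a more aggressive preconditioner (Lev_Diff = 2); the optimal value
    ! depends on the particular model and hardware
    PCGOPT,2

If the number of PCG iterations reported in the output exceeds the thresholds listed in the table, increasing Lev_Diff (or switching to the sparse solver with EQSLV,SPARSE) is the usual next step.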

Table 6.3: Summary of PCG Iterative Solver Guidelines

Lev_Diff Value: 1

  • Optimal number of iterations is < 1000.

  • When this threshold is exceeded, try higher Lev_Diff values.

Lev_Diff Value: 2

  • Optimal number of iterations is < 800.

  • When this threshold is exceeded, try higher Lev_Diff values or switch to the sparse solver.

Lev_Diff Value: 3

  • Optimal number of iterations is < 600.

  • When this threshold is exceeded, try higher Lev_Diff values or switch to the sparse solver.

Lev_Diff Value: 4

  • Optimal number of iterations is < 400.

  • When this threshold is exceeded, try higher Lev_Diff values or switch to the sparse solver.

General Guidelines:

  • Consider this solver, when applicable, for larger models for which the sparse direct solver would have to run in the out-of-core memory mode.

  • Use parallel processing to reduce solver time. For higher core counts, try lowering the Lev_Diff value for optimal performance.

  • Check the number of iterations and the Lev_Diff value used; the guidelines above depend on many factors.

  • Consider enabling MSAVE (when it applies) for extremely large models.
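For an extremely large model where memory is the limiting factor, a hedged sketch combining these guidelines might look like the following. MSAVE applies only to supported element types and analysis options, so treat this as an illustration rather than a general recipe:

    ! Use the PCG solver with a mid-range level-of-difficulty value
    EQSLV,PCG
    PCGOPT,2

    ! Memory-saving option: avoid explicitly assembling and storing the global
    ! stiffness matrix (valid only for supported element types and analyses)
    MSAVE,ON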