6.1. Understanding Overall Performance

Finite element analyses often require significant hardware resources and may involve computations that take days or even weeks to complete. To judge whether a simulation is performing well, you should be aware of a few key performance statistics written to the output file, which can indicate optimal (or sub-optimal) performance.

The main consideration is whether your simulation was compute-bound or I/O-bound. For optimal performance, the simulation should never spend a significant amount of time waiting for I/O to complete. To check for this issue, look for the following lines in the output file:

Total CPU time for main thread  :    167.8 seconds
. . . 
. . .
Elapsed Time (sec) =   388.000

When the elapsed time is significantly larger than the main thread CPU time, it typically indicates that a lot of time was spent waiting for I/O to complete. In these situations, you could achieve far better performance by running the simulation on a system with more RAM, or by using a faster storage configuration (for example, faster hard disk drives, multiple hard drives in a RAID0 configuration, or solid state drives). If the time spent waiting for I/O were entirely eliminated, the elapsed time would roughly match the CPU time, which gives some indication of how much speedup you might expect by running such a simulation on better hardware. In the example above, the elapsed time (388.0 seconds) is roughly 2.3 times the main thread CPU time (167.8 seconds), so eliminating the I/O wait could reduce the elapsed time by a factor of up to about 2.3.
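The following is a minimal sketch (not part of the product) of how this check could be automated with a short Python script. The file name solve.out is an assumption, and the regular expressions are based only on the two example lines shown above; real output files may format these lines differently, so adjust the patterns as needed.

import re

# Patterns modeled on the two timing lines shown in the excerpt above.
CPU_RE = re.compile(r"Total CPU time for main thread\s*:\s*([\d.]+)\s*seconds")
ELAPSED_RE = re.compile(r"Elapsed Time \(sec\)\s*=\s*([\d.]+)")

def timing_summary(path="solve.out"):  # "solve.out" is a hypothetical file name
    cpu = elapsed = None
    with open(path) as f:
        for line in f:
            m = CPU_RE.search(line)
            if m:
                cpu = float(m.group(1))
            m = ELAPSED_RE.search(line)
            if m:
                elapsed = float(m.group(1))
    if cpu is None or elapsed is None:
        raise ValueError("Timing lines not found in output file")
    ratio = elapsed / cpu
    print(f"Main thread CPU time : {cpu:10.1f} s")
    print(f"Elapsed (wall) time  : {elapsed:10.1f} s")
    print(f"Elapsed / CPU ratio  : {ratio:10.2f}")
    if ratio > 1.2:  # threshold chosen for illustration only
        print("Elapsed time well above CPU time: likely I/O-bound.")
    else:
        print("Elapsed time close to CPU time: likely compute-bound.")

if __name__ == "__main__":
    timing_summary()

The script simply automates the comparison described above; the same conclusion can be reached by reading the two lines directly from the output file.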

When the elapsed time is roughly equal to the main thread CPU time, your simulation is compute-bound, meaning you are achieving optimal performance for your chosen hardware. To speed up the simulation in this case, consider running on a system with newer, faster processors; using distributed-memory parallelism instead of shared-memory parallelism; or using more CPU cores or even a GPU to accelerate the simulation.


Note:  There are several situations where the elapsed time can be significantly greater than the main thread CPU time even though that time is not spent waiting for I/O to complete. Examples include using Microsoft MPI, using the GPU acceleration feature, or waiting for the meshing service with the nonlinear adaptivity or SMART fracture feature. In these situations, the CPU may be idle while MPI messages are waiting to be received, while the GPU is computing, or while the meshing service is providing the requested mesh.