Hybrid parallel processing combines distributed-memory parallel (DMP) and shared-memory parallel (SMP) processing. As with DMP, a hybrid parallel run launches multiple processes that run in parallel, with the number of processes specified by the -np command line option.
Unlike DMP, however, every process (master and worker) can use multiple threads during the solution phase. The number of threads per process is specified by the -nt command line option.
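The relationship between the two options can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names hybrid_args and total_cores are hypothetical, and the core count assumes that each thread occupies one core. Only the -np and -nt options themselves come from the description above.

```python
def hybrid_args(num_processes: int, threads_per_process: int) -> list[str]:
    """Build the -np/-nt arguments for a hybrid parallel run (hypothetical helper)."""
    if num_processes < 1 or threads_per_process < 1:
        raise ValueError("both counts must be positive integers")
    return ["-np", str(num_processes), "-nt", str(threads_per_process)]


def total_cores(num_processes: int, threads_per_process: int) -> int:
    """Cores used by the run, assuming one core per thread (assumption of this sketch)."""
    return num_processes * threads_per_process


# Example: 4 processes with 2 threads each.
print(" ".join(hybrid_args(4, 2)))
print(total_cores(4, 2))
```

Under these assumptions, 4 processes with 2 threads each occupy the same 8 cores as a DMP run with 8 processes, but with half as many processes.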
By enabling multiple threads per process during solution, hybrid parallel offers the following performance improvements:
- Reduced memory usage
With hybrid parallel, all computations during solution (including stiffness generation, linear equation solving, and results calculations) are performed in parallel, and each MPI process can use multiple threads. Because adding SMP threads does not significantly increase memory use, hybrid parallel reduces memory usage by running fewer MPI processes per compute node than DMP.
- More effective use of available hardware
To run larger models on a fixed amount of memory with DMP, you would use fewer processes per compute node to increase memory and solution efficiency, but this can leave compute nodes in the cluster underutilized. Hybrid parallel addresses this problem by running multiple threads in each process, so that available hardware and licenses are fully utilized.
- Improved scalability of large models with high element load balance ratios
Hybrid parallel can improve efficiency for large models with contact pairs that cannot be split, which reduce the overall scalability of the code at higher core counts. Because SMP threads handle the contact pairs and the model is decomposed into fewer domains, hybrid parallel can improve scaling for models with high element load balance ratios.
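The trade-off between process count, thread count, and memory described in the list above can be illustrated with a deliberately simplified model. The sketch below is not how the solver estimates memory. It assumes a hypothetical fixed memory cost per MPI process, no extra memory for additional threads, and one core per thread. The function name and all numbers are made up for illustration.

```python
def layouts(cores_per_node: int, node_memory_gb: float, memory_per_process_gb: float):
    """List (processes, threads, memory_gb) layouts that use every core and fit in memory.

    Simplified model (assumption): memory grows only with the number of
    processes, and threads add no memory.
    """
    result = []
    for num_processes in range(1, cores_per_node + 1):
        if cores_per_node % num_processes != 0:
            continue  # keep only layouts that fill the node exactly
        threads_per_process = cores_per_node // num_processes
        memory_gb = num_processes * memory_per_process_gb
        if memory_gb <= node_memory_gb:
            result.append((num_processes, threads_per_process, memory_gb))
    return result


# Hypothetical node: 16 cores, 128 GB, 20 GB per process.
for num_processes, threads_per_process, memory_gb in layouts(16, 128.0, 20.0):
    print(f"-np {num_processes} -nt {threads_per_process}: {memory_gb:.0f} GB")
```

With these made-up numbers, a pure DMP layout (16 processes with 1 thread each) would need 320 GB and does not fit, while the layout with 4 processes and 4 threads each uses all 16 cores within 80 GB.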
Depending on the details of an analysis, hybrid parallel can be faster or slower than DMP. The auto-hybrid feature uses a set of heuristics to switch automatically to hybrid parallel when the specifics of your analysis indicate that doing so would improve performance.