This section discusses the differences between the three types of parallel processing analyses available (shared-memory parallel (SMP), distributed-memory parallel (DMP), and hybrid) and how to specify values that determine the total cores used in each one. The main difference between SMP and DMP is that SMP shares the memory between threads in a single process while DMP distributes the memory across several processes running on different cores of a single machine or multiple machines. Hybrid parallel processing combines SMP and DMP.
You can specify the following values that determine the total core count for your analysis:
the number of processes for DMP
the number of threads per process for SMP
both the number of processes and the number of threads per process for hybrid parallel
The rest of this section details how these values determine the total number of cores for the different types of parallel processing.
Figure 1.1: Schematic of Processes and Threads for DMP and Hybrid Parallel Processing shows a diagram comparing DMP and hybrid parallel processing. In distributed-memory parallel processing, only the master process uses multiple SMP threads while executing preprocessing and postprocessing tasks, and the worker processes are limited to one thread per process while executing solution tasks in parallel. In contrast, hybrid parallel enables multiple threads per process in all processes, master and worker, during solution. While not diagrammed, SMP shares the memory between threads in a single process and can be visualized as the master process alone, without the worker processes in the figure.
Table 1.1: Command Line Options to Specify Number of Processes and/or Threads per Process summarizes the command line options used to specify the type of parallel computing (SMP, DMP, or hybrid), the number of processes, and/or the number of threads per process. The following equation clarifies how these values relate to the total core count. For the definitions of these and other terms used in parallel processing, see Parallel Processing Terminology.
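$$N_{\text{cores}} = N_{\text{proc}} \times N_{\text{threads}} \qquad \text{(1–1)}$$
where:
$N_{\text{proc}}$ is the number of processes,
$N_{\text{threads}}$ is the number of threads per process, and
$N_{\text{cores}}$ is the total core count for the analysis.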
Note: To avoid oversubscribing the hardware, the number of threads per core is always one. Also, the total number of cores prescribed must be equal to or less than the number of cores actually available.
Since the number of processes for SMP is one by definition, the core count for SMP given by Equation 1–1 is simply equal to the number of threads per process, which you can specify with the command line option -np.
As diagrammed in Figure 1.1: Schematic of Processes and Threads for DMP and Hybrid Parallel Processing, DMP uses only one thread per process during solution. Therefore, by Equation 1–1, the core count for DMP is simply equal to the number of processes, which you can specify with the command line option -np.
Table 1.1: Command Line Options to Specify Number of Processes and/or Threads per Process
| Parallel processing type | Number of processes (see Equation 1–1) | Threads per process (see Equation 1–1) |
|---|---|---|
| Shared-Memory Parallel (SMP)[a] | 1 | specified by command line option -np (default is 4) |
| Distributed-Memory Parallel (DMP)[b] | specified by command line option -np (default is 4) | 1 [c] |
| Hybrid Parallel[d] | specified by command line option -np (default is 4) | specified by command line option -nt [d] (default is 1) |
[a] Specify SMP with the command line option -smp.
[b] Since DMP is the default parallel processing type, the -dis command line option for DMP is not required.
[c] Since 1 is the default, it does not need to be set unless you want to turn off auto-hybrid. To turn off auto-hybrid logic and force a DMP run, issue the command line option -nt 1.
[d] To invoke hybrid parallel, set -nt to a number greater than its default value of 1.
Note: The meaning of the command line option -np is different for SMP compared to DMP and hybrid, as listed in Table 1.1: Command Line Options to Specify Number of Processes and/or Threads per Process:
For SMP, -np specifies the number of threads per process.
For DMP and hybrid, -np specifies the number of processes.
To summarize how the total core count is determined for the different parallel processing types:
For SMP and DMP, the total core count is equal to the number entered after the -np command line option, which is the number of threads per process for SMP and the number of processes for DMP.
For hybrid, the total core count is the product of the number of processes (specified by the command line option -np) and the number of threads per process (specified by the command line option -nt). For example, the following command:
ansys241 -b -np 4 -nt 4 <test.dat> out
specifies a hybrid parallel run using 4 MPI processes with 4 threads per process, for a total core count of 16.
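The rules summarized above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the product: `Layout` and `resolve_layout` are hypothetical names, the defaults (4 for -np, 1 for -nt) are taken from Table 1.1, and the sketch assumes that -nt is ignored in SMP mode and that auto-hybrid does not change the values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical container for a process/thread layout."""
    processes: int
    threads_per_process: int

    @property
    def total_cores(self) -> int:
        # Equation 1-1: total cores = processes x threads per process
        return self.processes * self.threads_per_process


def resolve_layout(np_opt: int = 4, nt_opt: int = 1, smp: bool = False) -> Layout:
    """Map the values given to -np, -nt, and -smp onto a layout.

    Assumption: in SMP mode the -nt value is ignored.
    """
    if np_opt < 1 or nt_opt < 1:
        raise ValueError("-np and -nt values must be positive integers")
    if smp:
        # SMP: one process by definition; -np is the threads per process.
        return Layout(processes=1, threads_per_process=np_opt)
    # DMP (nt_opt == 1) or hybrid (nt_opt > 1): -np is the number of processes.
    return Layout(processes=np_opt, threads_per_process=nt_opt)


if __name__ == "__main__":
    hybrid = resolve_layout(np_opt=4, nt_opt=4)
    print(hybrid, hybrid.total_cores)
```

With the values from the example command (-np 4 -nt 4), the sketch yields 4 processes with 4 threads each, matching the total of 16 cores stated above.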
Total cores prescribed must be equal to or less than actual available cores
In all cases (SMP, DMP, and hybrid), the parallel processing run is limited by the actual number of physical cores available[1]. If the number of cores prescribed exceeds the number of physical cores available[1], the number of threads/processes will be automatically reduced for SMP, DMP, and hybrid so that the total cores calculated by Equation 1–1 do not exceed the physical CPU cores available.
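A pre-flight check along the following lines can flag a request that exceeds the machine before the run starts. It is a minimal sketch: `excess_cores` is a hypothetical helper, and it only detects oversubscription. How the program itself reduces the number of threads or processes is not described here, so no reduction policy is modeled. The physical core count is passed in by the caller because Python's `os.cpu_count()` reports logical CPUs, which may be higher than the number of physical cores.

```python
def excess_cores(processes: int, threads_per_process: int, physical_cores: int) -> int:
    """Return how many cores the request exceeds the machine by (0 if it fits)."""
    if min(processes, threads_per_process, physical_cores) < 1:
        raise ValueError("all arguments must be positive integers")
    # Equation 1-1: total cores = processes x threads per process
    requested = processes * threads_per_process
    return max(0, requested - physical_cores)


if __name__ == "__main__":
    # 4 processes x 4 threads requested on a 12-core machine (illustrative numbers)
    print(excess_cores(4, 4, 12))
```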
If the specifics of your analysis indicate that using hybrid parallel will improve performance, the program will automatically invoke hybrid parallel. The decision to activate auto-hybrid is made according to a set of heuristics at the beginning of the first load step solution, just before the domain decomposition step is performed. If auto-hybrid is invoked:
The program automatically sets values for -nt and -np, keeping the same total number of cores that you prescribed for your analysis, and the solution runs in hybrid parallel.
An output message reports that hybrid parallel has been automatically invoked, along with the values used for the number of processes (-np) and threads per process (-nt).
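Because auto-hybrid keeps the total core count fixed, the only layouts it can choose from are pairs of processes and threads per process whose product equals that total. The sketch below enumerates those pairs. The function name `candidate_layouts` is hypothetical, and the heuristics the program uses to pick one of them are not described here, so none are modeled.

```python
from typing import List, Tuple


def candidate_layouts(total_cores: int) -> List[Tuple[int, int]]:
    """List every (processes, threads_per_process) pair whose product is total_cores."""
    if total_cores < 1:
        raise ValueError("total_cores must be a positive integer")
    return [
        (procs, total_cores // procs)
        for procs in range(1, total_cores + 1)
        if total_cores % procs == 0  # keep exact factorizations only
    ]


if __name__ == "__main__":
    for procs, threads in candidate_layouts(8):
        print(f"-np {procs} -nt {threads}")
```

For a total of 8 cores the pairs are (1, 8), (2, 4), (4, 2), and (8, 1). The last pair, with one thread per process, corresponds to pure DMP rather than hybrid.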
To Turn off Auto-hybrid: If you specify a value for the number of threads per process by issuing the -nt command line option, the auto-hybrid feature is turned off. For example, if you are sure that you want an analysis to run using pure DMP with only one thread per process during solution, you can turn off auto-hybrid by specifying one thread per process with the command line option -nt 1.
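If launches are scripted, the options discussed in this section can be assembled by a small helper such as the one below. This is a hedged sketch, not a supported interface: `build_args` is a hypothetical function, the executable name `ansys241` and the `-b` option are copied from the example command earlier in this section, the order of the options is an assumption, and input/output redirection is left to the caller. For a DMP run the helper always adds `-nt 1`, so auto-hybrid is turned off as described above.

```python
from typing import List, Optional


def build_args(mode: str, np_opt: int, nt_opt: Optional[int] = None,
               executable: str = "ansys241") -> List[str]:
    """Assemble a launch argument list for an SMP, DMP, or hybrid run."""
    if np_opt < 1:
        raise ValueError("-np value must be a positive integer")
    if mode == "smp":
        # SMP: -np is the number of threads in the single process.
        return [executable, "-b", "-smp", "-np", str(np_opt)]
    if mode == "dmp":
        # Explicit -nt 1 forces pure DMP by turning off auto-hybrid.
        return [executable, "-b", "-np", str(np_opt), "-nt", "1"]
    if mode == "hybrid":
        if nt_opt is None or nt_opt < 2:
            raise ValueError("hybrid requires an -nt value greater than 1")
        return [executable, "-b", "-np", str(np_opt), "-nt", str(nt_opt)]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(" ".join(build_args("dmp", 8)))
```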
[1] Note that additional licenses are required to run a parallel processing solution with more than four cores in all cases (SMP, DMP, and hybrid). See HPC Licensing for details.