This section discusses the differences between the three types of parallel processing analyses available (shared-memory parallel (SMP), distributed-memory parallel (DMP), and hybrid) and how to specify values that determine the total cores used in each one. The main difference between SMP and DMP is that SMP shares the memory between threads in a single process while DMP distributes the memory across several processes running on different cores of a single machine or multiple machines. Hybrid parallel processing combines SMP and DMP.
You can specify the following values that determine the total core count for your analysis:
the number of processes for DMP
the number of threads per process for SMP
both the number of processes and the number of threads per process for hybrid parallel
The rest of this section details how these values determine the total number of cores for the different types of parallel processing.
Figure 1.1: Schematic of Processes and Threads for DMP and Hybrid Parallel Processing shows a diagram comparing DMP and hybrid parallel processing. In distributed-memory parallel processing, while executing preprocessing and postprocessing tasks, only the master process uses multiple SMP threads. The worker processes are limited to one thread per process while executing solution tasks in parallel. In contrast, hybrid parallel enables multiple threads per process in all processes, master and worker, during solution. While not diagrammed, SMP shares the memory between threads in a single process and can be visualized as the master process alone without the worker processes in the figure.
Table 1.1: Command Line Options to Specify Number of Processes and/or Threads per Process summarizes the command line options used to specify the type of parallel computing (SMP, DMP, or hybrid), the number of processes, and/or the number of threads per process. The following equation clarifies how these values relate to the total core count. For the definitions of these and other terms used in parallel processing, see Parallel Processing Terminology.
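(1–1)  $N_{\text{cores}} = N_{\text{processes}} \times N_{\text{threads}}$
where:
$N_{\text{cores}}$ = total number of cores used by the analysis
$N_{\text{processes}}$ = number of processes
$N_{\text{threads}}$ = number of threads per process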
Note: To avoid oversubscribing the hardware, the number of threads per core is always one. Also, the total cores prescribed must be equal to or less than actual available cores.
Since the number of processes for SMP is one by definition, the core count for SMP given by Equation 1–1 is simply equal to the number of threads per process, which you can specify with the command line option -np.
As diagrammed in Figure 1.1: Schematic of Processes and Threads for DMP and Hybrid Parallel Processing, DMP uses only one thread per process during solution. Therefore, by Equation 1–1, the core count for DMP is simply equal to the number of processes, which you can specify with the command line option -np.
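As an illustration, the relationship in Equation 1–1 and its SMP and DMP special cases can be sketched in a few lines of Python. The function name `total_cores` is hypothetical and is not part of the program; the sketch only reproduces the arithmetic described above.

```python
def total_cores(num_processes: int, threads_per_process: int) -> int:
    """Return the total core count as processes times threads per process."""
    if num_processes < 1 or threads_per_process < 1:
        raise ValueError("both values must be positive integers")
    return num_processes * threads_per_process


# SMP: one process by definition, so the core count equals the thread count.
smp_cores = total_cores(num_processes=1, threads_per_process=4)

# DMP: one thread per process during solution, so the core count
# equals the number of processes.
dmp_cores = total_cores(num_processes=4, threads_per_process=1)

# Hybrid: both values can be greater than one.
hybrid_cores = total_cores(num_processes=4, threads_per_process=4)
```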
Table 1.1: Command Line Options to Specify Number of Processes and/or Threads per Process
| Parallel processing type | Number of processes | Threads per process |
| --- | --- | --- |
| Shared-Memory Parallel (SMP)[a] | 1 | specified by command line option -np (default is 4) |
| Distributed-Memory Parallel (DMP)[b] | specified by command line option -np (default is 4) | 1 [c] |
| Hybrid Parallel[d] | specified by command line option -np (default is 4) | specified by command line option -nt [d] (default is 1) |
[a] Specify SMP with the command line option -smp.
[b] Since DMP is the default parallel processing type, the -dis command line option for DMP is not required.
[c] Since 1 is the default, it does not need to be set unless you want to turn off auto-hybrid. To turn off auto-hybrid logic and force a DMP run, issue the command line option -nt 1.
[d] To invoke hybrid parallel, set -nt to a number greater than its default value of 1.
Note: The meaning of the command line option -np is different for SMP compared to DMP and hybrid, as listed in Table 1.1: Command Line Options to Specify Number of Processes and/or Threads per Process:
For SMP, -np specifies the number of threads per process.
For DMP and hybrid, -np specifies the number of processes.
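The two meanings of -np can be made concrete with a small sketch that maps a parallel processing type and the option values to a (processes, threads per process) pair, following Table 1.1. The function `resolve_layout` and its mode strings are hypothetical names chosen for illustration; the defaults of 4 and 1 are taken from the table.

```python
from typing import Tuple


def resolve_layout(mode: str, np_value: int = 4, nt_value: int = 1) -> Tuple[int, int]:
    """Return (number of processes, threads per process) for a given mode.

    mode is one of "smp", "dmp", or "hybrid" (illustrative labels only).
    """
    mode = mode.lower()
    if mode == "smp":
        # For SMP, -np is the number of threads in the single process.
        return 1, np_value
    if mode == "dmp":
        # For DMP, -np is the number of processes, one thread each.
        return np_value, 1
    if mode == "hybrid":
        # For hybrid, -np is the number of processes and -nt must exceed 1.
        if nt_value <= 1:
            raise ValueError("hybrid requires more than one thread per process")
        return np_value, nt_value
    raise ValueError("unknown mode: " + mode)
```

For example, `resolve_layout("smp", np_value=8)` yields one process with eight threads, while `resolve_layout("dmp", np_value=8)` yields eight processes with one thread each.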
To summarize how the total core count is determined for the different parallel processing types:
For SMP and DMP, the total core count is equal to the number entered after the -np command line option, which is the number of threads per process for SMP and the number of processes for DMP.
For hybrid, the total core count is the product of the number of processes (specified by the command line option -np) and the number of threads per process (specified by the command line option -nt). For example, the following command:
ansys251 -b -np 4 -nt 4 <test.dat> out
specifies a hybrid parallel run using 4 MPI processes with 4 threads per process, for a total core count of 4 x 4 = 16.
If hybrid parallel is invoked with the -machines command line option, the value specified for -nt multiplies the number of processes requested on each machine. For example:
ansys252 -b -nt 4 -machines machine1:8:machine2:8
results in 8 MPI processes with 4 threads per process on each machine. Therefore, the total core count is 8 x 4 x 2 = 64.
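The arithmetic of the -machines example can be checked with a short sketch. It assumes the machine list is a colon-separated sequence of name and process-count pairs, which is inferred only from the example above; the helper names `parse_machines` and `hybrid_total_cores` are hypothetical.

```python
from typing import Dict


def parse_machines(spec: str) -> Dict[str, int]:
    """Parse 'name:count:name:count...' into a {name: process count} mapping.

    The format is an assumption based on the single example in the text.
    """
    fields = spec.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected name:count pairs")
    processes = {}  # machine name -> number of processes requested
    for name, count in zip(fields[0::2], fields[1::2]):
        processes[name] = processes.get(name, 0) + int(count)
    return processes


def hybrid_total_cores(spec: str, threads_per_process: int) -> int:
    """Multiply each machine's process count by the threads per process and sum."""
    per_machine = parse_machines(spec)
    return sum(count * threads_per_process for count in per_machine.values())


cores = hybrid_total_cores("machine1:8:machine2:8", threads_per_process=4)
```

With the values from the example, `cores` evaluates to 64, matching 8 x 4 x 2.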
Note that the number of threads per process is determined by the number of MPI ranks launched on the compute node, up to a maximum of 16 threads.
Total cores prescribed must be equal to or less than actual available cores
In all cases (SMP, DMP, and hybrid), the parallel processing run is limited by the actual number of physical cores available[1]. If the number of cores prescribed exceeds the number of physical cores available[1], the number of threads/processes will be automatically reduced for SMP, DMP, and hybrid so that the total cores calculated by Equation 1–1 do not exceed the physical CPU cores available.
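Before launching a run, it can be useful to check whether a prescribed layout would exceed the physical cores available. The sketch below only detects the excess; it does not attempt to reproduce how the program reduces threads or processes, which is not described here. The function name `excess_cores` is hypothetical, and the physical core count is passed in explicitly because standard-library calls such as `os.cpu_count()` report logical rather than physical cores.

```python
def excess_cores(num_processes: int, threads_per_process: int, physical_cores: int) -> int:
    """Return how many cores the request exceeds the hardware by (0 if it fits)."""
    requested = num_processes * threads_per_process
    return max(0, requested - physical_cores)


# A 4 x 4 hybrid layout fits exactly on a 16-core machine.
fits = excess_cores(4, 4, physical_cores=16) == 0

# An 8 x 4 layout on the same machine asks for 16 cores too many.
over_by = excess_cores(8, 4, physical_cores=16)
```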
If the specifics of your analysis indicate that using hybrid parallel will improve performance, the program will automatically invoke hybrid parallel. The decision to activate auto-hybrid is made according to a set of heuristics at the beginning of the first load step solution, just before the domain decomposition step is performed. The auto-hybrid feature is on by default as long as no -nt value is provided on the command line. If auto-hybrid is invoked during solution:
The program automatically sets values for -nt and -np, keeping the same total number of cores you prescribed for your analysis, and the solution runs in hybrid parallel. Note that the program specifies an -nt value per process. This may result in one or more domains utilizing additional threads while other domains use a single thread. There is no mechanism available to specify an unbalanced number of threads per process so that some processes (or domains) have more or fewer threads than other processes (or domains).
An output message will report that hybrid parallel has been automatically invoked with the values used for the number of processes (-np) and threads per process (-nt).
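To get a feel for which layouts preserve a prescribed total core count, the sketch below enumerates the uniform (balanced) combinations of processes and threads per process whose product equals the total. This is not the program's heuristic, which is not documented here, and it ignores the unbalanced layouts that auto-hybrid may choose. The function name `balanced_layouts` is hypothetical, and the cap of 16 threads is taken from the note earlier in this section.

```python
from typing import List, Tuple


def balanced_layouts(total: int, max_threads: int = 16) -> List[Tuple[int, int]]:
    """List (processes, threads per process) pairs whose product equals total."""
    if total < 1:
        raise ValueError("total must be a positive integer")
    layouts = []
    for threads in range(1, min(total, max_threads) + 1):
        if total % threads == 0:
            # Uniform layout: every process gets the same number of threads.
            layouts.append((total // threads, threads))
    return layouts


candidates = balanced_layouts(8)
```

For a total of 8 cores, `candidates` contains (8, 1), (4, 2), (2, 4), and (1, 8); the first is pure DMP and the last is a single multithreaded process.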
To Turn off Auto-hybrid: If you specify a value for the number of threads per process by issuing the -nt command line option, the auto-hybrid feature is shut off. For example, if you are sure that you want an analysis to run using pure DMP with only one thread per process during solution, you can turn off auto-hybrid by specifying one thread per process on the command line: -nt 1. Likewise, you can activate a uniform two threads per process during solution with -nt 2. When specified on the command line, the single -nt value is applied to all processes.
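The effect of the -smp and -nt options on the run type and on auto-hybrid can be summarized as a small decision function. This is a sketch of the rules stated in this section, not the program's logic; the function name `describe_run` and its return strings are hypothetical, and the case where -smp and -nt are given together is not covered by the text, so the sketch simply treats -smp as taking precedence.

```python
from typing import Optional


def describe_run(smp: bool = False, nt_value: Optional[int] = None) -> str:
    """Describe the run type implied by the -smp flag and an optional -nt value."""
    if nt_value is not None and nt_value < 1:
        raise ValueError("-nt must be a positive integer")
    if smp:
        # Assumption: -smp takes precedence; the text does not cover -smp with -nt.
        return "SMP"
    if nt_value is None:
        # No -nt on the command line: DMP by default, auto-hybrid may engage.
        return "DMP, auto-hybrid enabled"
    if nt_value == 1:
        # -nt 1 forces pure DMP and shuts off auto-hybrid.
        return "DMP, auto-hybrid off"
    # -nt greater than 1 requests hybrid with a uniform thread count.
    return "hybrid, auto-hybrid off"
```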
[1] Note that additional licenses are required to run a parallel processing solution with more than four cores in all cases (SMP, DMP, and hybrid). See HPC Licensing for details.