4.4. Understanding the Working Principles and Behavior of Distributed-memory Parallelism

The fundamental difference between distributed-memory parallel (DMP) processing and shared memory parallel (SMP) processing is that N processes (where N is the total number of CPU cores used) run at the same time for one model. These N processes may run on a single machine (in the same working directory) or on multiple machines. The processes are not aware of each other's existence unless they are communicating (sending messages). The program, together with the MPI software, provides the means by which the processes communicate with each other at the right locations and at the appropriate times.

The following topics give a summary of behavioral differences between DMP and SMP processing in specific areas of the program:

4.4.1. Differences in General Behavior

File Handling Conventions

Upon startup and during a DMP solution, the program appends n to the current jobname (where n stands for the process rank). The master process rank is 0 and the worker processes are numbered from 1 to N - 1. In the rare case that the user-supplied jobname ends in a numeric value (0...9) or an underscore, an underscore is automatically appended to the jobname prior to appending the process rank. This is done to avoid any potential conflicts with the new file names.

Therefore, upon startup and during a parallel solution, each process will create and use files named Jobnamen.ext. These files contain the local data that is specific to each DMP process. Some common examples include the .log and .err files as well as most files created during solution such as .esav, .full, .rst, and .mode. See Program-Generated Files in the Basic Analysis Guide for more information on files that Mechanical APDL typically creates.
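
For illustration, a DMP run using 4 cores with a hypothetical jobname of "wing" would create local files such as the following (the exact set of files depends on the analysis type and settings):

    wing0.log, wing0.err, wing0.esav, wing0.full, wing0.rst   (master process, rank 0)
    wing1.log, wing1.err, wing1.esav, wing1.full, wing1.rst   (worker process, rank 1)
    wing2.log, wing2.err, wing2.esav, wing2.full, wing2.rst   (worker process, rank 2)
    wing3.log, wing3.err, wing3.esav, wing3.full, wing3.rst   (worker process, rank 3)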

Actions that are performed only by the master process (/PREP7, /POST1, etc.) will work on the global Jobname.ext files by default. These files (such as Jobname.db and Jobname.rst) contain the global data for the entire model. For example, only the master process will save and resume the Jobname.db file.

After a parallel solution successfully completes, the program automatically merges some of the (local) Jobnamen.ext files into a single (global) file named Jobname.ext. These include files such as Jobname.rst, Jobname.esav, Jobname.emat, and so on (see the DMPOPTION command for a complete list of these files). This action is performed when the FINISH command is executed upon leaving the solution processor. These files contain the same information about the final computed solution as files generated for the same model computed with SMP processing. Therefore, all downstream operations (such as postprocessing) can be performed using SMP processing (or in the same manner as SMP processing) by using these global files.

If any of these global Jobname.ext files are not needed for downstream operations, you can reduce the overall solution time by suppressing the file combination for individual file types (see the DMPOPTION command for more information). If it is later determined that a global Jobname.ext file is needed for a subsequent operation or analysis, the local files can be combined by using the COMBINE command.
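
For example, the following minimal sketch suppresses combination of the results file and later combines it on demand; the same pattern applies to the other file types listed for DMPOPTION:

    dmpoption,rst,no    ! before SOLVE: do not combine the local .rst files at FINISH
    solve
    finish
    ! later, if the global results file turns out to be needed:
    combine,rst         ! combine the local Jobnamen.rst files into a global Jobname.rst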

DMPOPTION also has an option to combine the results file at certain time points during the distributed solution. This enables you to postprocess the model while the solution is in progress, but leads to slower performance due to increased data communication and I/O.

The program does not delete most files written by the worker processes when the analysis is completed. If you choose, you can delete these files once your analysis is complete (including any restarts that you may wish to perform). If you do not wish to have the files necessary for a restart saved, you can issue RESCONTROL,NORESTART.

File copy, delete, and rename operations can be performed across all processes by using the DistKey option on the /COPY, /DELETE, and /RENAME commands. This provides a convenient way to manage local files created by a distributed parallel solution. For example, /DELETE,Fname,ext,,ON automatically appends the process rank number to the specified file name and deletes Fnamen.ext from all processes. If the local files (Jobname.*) are not needed in downstream analyses, you can issue /FCLEAN to delete them all to save space. See the /COPY, /DELETE, /FCLEAN, and /RENAME command descriptions for more information. In addition, the /ASSIGN command can be used to control the name and location of specific local and global files created during a distributed-memory parallel analysis.
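
As a brief sketch, assuming a hypothetical jobname of "myjob":

    /delete,myjob,esav,,on   ! deletes myjob0.esav, myjob1.esav, ... on all processes
    /fclean                  ! deletes all local (per-process) files to save space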

Batch and Interactive Mode

You can launch a DMP analysis in either interactive or batch mode for the master process. However, the worker processes are always in batch mode. The worker processes cannot read the start.ans or stop.ans files. The master process sends all /CONFIG,LABEL commands to the worker processes as needed.

On Windows systems, there is no Mechanical APDL output console window when running a DMP analysis in the GUI (interactive mode). All standard output from the master process will be written to a file named file0.out. (Note that the Jobname is not used.)

Output Files

When a DMP analysis is executed, the output for the master process is written in the same fashion as an SMP run. By default, the output is written to the screen; or, if you specified an output file via the launcher or the -o command line option, the output for the master process is written to that file.

The worker processes automatically write this output to Jobnamen.out. Typically, these worker process output files have little value because all of the relevant job information is written to the screen or the master process output file. The exception is when the domain decomposition method is automatically chosen to be FREQ (frequency-based for a harmonic analysis) or CYCHI (harmonic index-based for a cyclic symmetry analysis); see the DDOPTION command for more details. In these cases, the solution information for the harmonic frequencies or cyclic harmonic indices solved by the worker processes is written only to the output files for those processes (Jobnamen.out).

Error Handling

The same principle also applies to the error file Jobnamen.err. When a warning or error occurs on one of the worker processes during a DMP solution, the process writes that warning or error message to its error file and then communicates the warning or error message to the master process. Typically, this allows the master process to write the warning or error message to its error file and output file and, in the case of an error message, allows all of the distributed processes to exit the program simultaneously.

In some cases, an error message may fail to be fully communicated to the master process. If this happens, you can view each Jobnamen.err and/or Jobnamen.out file in an attempt to learn why the job failed. The error files and output files written by the processes may be incomplete but can still provide some useful information as to why the job failed.

In some rare cases, the job may hang. When this happens, you can use the cleanup script (or .bat file on Windows) to kill the processes. The cleanup script is automatically written into the initial working directory of the master process and is named cleanup-ansys-[machineName]-[processID].sh (or .bat).

Use of Mechanical APDL Commands

In pre- and postprocessing, APDL works the same in a DMP analysis as in an SMP analysis. However, in the solution processor (/SOLU), a DMP analysis does not support certain *GET items. In general, DMP processing supports global solution *GET results such as total displacements and reaction forces. It does not support element-level results specified by ESEL, ESOL, and ETABLE labels. Unsupported items return a *GET value of zero.
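
The distinction is sketched below; the node numbers are hypothetical, and only globally consistent solution data is retrieved:

    ! within /SOLU, after SOLVE
    *get,uy_tip,node,1001,u,y    ! nodal displacement (global solution result): supported in DMP
    *get,rf_base,node,1,rf,fy    ! reaction force: supported in DMP
    ! element-level items (for example, ESOL-type data) requested with *GET here would return zero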

Condensed Data Input

Multiple commands entered in an input file can be condensed into a single line if the commands are separated by the $ character (see Condensed Data Input in the Command Reference). DMP processing cannot properly handle condensed data input. Each command must be placed on its own line in the input file for a DMP run.
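
For example, a condensed line such as the commented one below must be expanded for a DMP run; the element type and material values shown are arbitrary placeholders:

    ! condensed input (not handled properly by DMP processing):
    !   /prep7 $ et,1,185 $ mp,ex,1,2.0e11 $ mp,prxy,1,0.3
    ! expanded equivalent required for a DMP run:
    /prep7
    et,1,185
    mp,ex,1,2.0e11
    mp,prxy,1,0.3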

4.4.2. Differences in Solution Processing

Domain Decomposition (DDOPTION Command)

Upon starting a solution in a DMP analysis, the program automatically decomposes the problem into N CPU domains so that each process (or each CPU core) works on only a portion of the simulation. This domain decomposition is done only by the master process. Typically, the optimal domain decomposition method is automatically chosen based on a variety of factors (analysis type, number of CPU cores, available RAM, and so on). However, you can control which domain decomposition method is used by setting the Decomp argument on the DDOPTION command.

For most simulations, the program automatically chooses the mesh-based domain decomposition method (Decomp = MESH), which means each CPU domain is a group or subset of elements within the whole model. For certain harmonic analyses, the domain decomposition may be based on the frequency domain (Decomp = FREQ), in which case each CPU domain computes the harmonic solution for the entire model at a different frequency point. For certain cyclic symmetry analyses, the domain decomposition may be based on the harmonic indices (Decomp = CYCHI), in which case each CPU domain computes the cyclic solution for a different harmonic index.

The NPROCPERSOL argument on the DDOPTION command gives you the flexibility to combine the FREQ or CYCHI decomposition methods with the mesh-based domain decomposition method. Consider a harmonic analysis with 50 frequency points requested (NSUBST,50) run on a workstation executing a DMP solution with 16 cores (-dis -np 16). Using mesh decomposition (DDOPTION,MESH) essentially solves one frequency at a time with 16 groups of elements (1x16). Using DDOPTION,FREQ,1 solves 16 frequencies at a time with 1 group of elements; that is, the entire FEA model (16x1). Using the NPROCPERSOL field allows you to consider alternative combinations between these two scenarios. You could try DDOPTION,FREQ,2 to solve 8 frequencies at a time with 2 groups of elements per solution (8x2), or DDOPTION,FREQ,4 to solve 4 frequencies at a time with 4 groups of elements (4x4), and so on. Note that the total core count specified at startup cannot be altered, and the program works to maintain the NPROCPERSOL value as input, which means the number of frequency or cyclic harmonic index solutions solved at a time may need to be adjusted to fit within the other defined parameters.
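
As a minimal sketch, the 4x4 combination described above could be requested as follows; the frequency range is a hypothetical placeholder, and other analysis settings are omitted:

    ! launched with: ansys242 -dis -np 16 -i input -o output
    /solu
    antype,harmic
    harfrq,100,1000     ! hypothetical frequency range
    nsubst,50           ! 50 frequency points
    ddoption,freq,4     ! 4 cores per frequency solution; 4 frequencies solved at a time on 16 cores
    solve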

In the case of a linear perturbation analysis, you should also set the NUMSOLFORLP argument to ensure the best solution performance. For decomposition based on the frequency domain (Decomp = FREQ or automatically chosen when Decomp = AUTO), set NUMSOLFORLP to the number of frequency solutions in the subsequent harmonic analysis. For decomposition based on harmonic index (Decomp = CYCHI or automatically chosen when Decomp = AUTO), set NUMSOLFORLP to the number of harmonic index solutions in the subsequent cyclic modal analysis.

There are pros and cons for each domain decomposition approach. For example, with the default MESH method, the total amount of memory required to solve the simulation in N processes is typically not much larger than the amount of memory required to solve the simulation on one process. However, when using the FREQ or CYCHI domain decomposition method, the amount of memory required to solve the simulation on N processes is typically N times greater than the amount of memory to solve the simulation on one process. As another example, with the MESH domain decomposition method, the application requires a significant amount of data communication via MPI, thus requiring a fast interconnect for optimal performance. With the FREQ and CYCHI methods, very little data communication is necessary between the processes and, therefore, good performance can still be achieved using slower interconnect hardware.

All FEA data (elements, nodes, materials, sections, real constants, boundary conditions, etc.) required to compute the solution for each CPU domain is communicated to the worker processes by the master process. Throughout the solution, each process works only on its piece of the entire model. When the solution phase ends (for example, FINISH is issued in the solution processor), the master process in a DMP analysis works on the entire model again (that is, it behaves like SMP processing).

Print Output (OUTPR Command)

In DMP processing, the OUTPR command prints NSOL and RSOL in the same manner as in SMP processing. However, for other items such as ESOL, a DMP analysis prints only the element solution for the group of elements belonging to the CPU domain of the master process. Therefore, OUTPR,ESOL produces incomplete information and is not recommended. Also, the order of elements differs from that of SMP processing due to domain decomposition, so a direct one-to-one comparison of element output with SMP processing is not possible when using OUTPR.
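
A brief sketch of the recommended usage:

    outpr,nsol,last    ! nodal solution: printed as in SMP processing
    outpr,rsol,last    ! reaction solution: printed as in SMP processing
    ! outpr,esol,last  ! not recommended in DMP: only the master-domain elements would be printed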

When using the frequency domain decomposition in a harmonic analysis, the NSOL, RSOL, and ESOL data is printed to the master process output file only for the frequency solutions computed by the master process. The OUTPR output for the other frequency solutions is written to the output files created by the worker processes (Jobnamen.out).

Large Number of CE/CP and Contact Elements

Both SMP processing and DMP processing can handle a large number of coupling and constraint equations (CE/CP) and contact elements. However, specifying too many of these items forces a DMP solution to communicate more data among the processes, resulting in a longer elapsed time to complete a distributed parallel job. To avoid degraded performance, reduce the number of CE/CP equations where possible and define potential contact pairs over smaller regions. In addition, for assembly contact pairs or small-sliding contact pairs, you can use the command CNCHECK,TRIM to remove contact and target elements that are initially in the far-field (open and not near contact). This trimming option helps achieve better performance in DMP runs, as sketched below.
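
A minimal sketch of trimming prior to solving:

    /solu
    cncheck,trim    ! remove contact/target elements that are initially in the far-field
    solve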

4.4.3. Differences in Postprocessing

Postprocessing with Database File and SET Commands

SMP processing can postprocess the last set of results using the Jobname.db file (if the solution results were saved), as well as using the Jobname.rst file. A DMP analysis, however, can only postprocess using the Jobname.rst file; it cannot use the Jobname.db file because the solution results are not entirely written to the database. You must issue a SET command before postprocessing.
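
For example, a minimal postprocessing sequence after a DMP solution might look like this:

    /post1
    set,last        ! read the last results set from the results file
    prnsol,u,sum    ! for example, list nodal displacement magnitudes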

Postprocessing with Multiple Results Files

By default, a DMP analysis automatically combines the local results files (for example, Jobnamen.rst) into a single global results file (Jobname.rst). This step can be expensive depending on the number of load steps and the amount of results stored for each solution. It requires each local results file to be read by the corresponding worker process, communicated to the master process, and then combined and written by the master process. To reduce the amount of communication and I/O performed by this operation, the DMPOPTION command can be used to skip the step of combining the local results files into a single global results file. The RESCOMBINE command macro can then be used in /POST1 to individually read each local results file until the entire set of results is placed into the database for postprocessing. If needed, a subsequent RESWRITE command can then be issued to write a global results file for the distributed solution.
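
A minimal sketch of this workflow, assuming a hypothetical jobname of "myjob" and 8 DMP processes:

    ! during the distributed solution:
    dmpoption,rst,no              ! skip combining the local results files at FINISH
    solve
    finish

    ! during postprocessing:
    /post1
    rescombine,8,myjob,rst,last   ! read the last results set from myjob0.rst ... myjob7.rst
    reswrite,myjob                ! optionally write a global results file for this set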

Note that if the step of combining the results file is skipped, it may affect downstream analyses that rely on a single global results file for the entire model. If it is later determined that a global results file (for example, Jobname.rst) is needed for a subsequent operation, you can use the COMBINE command to combine the local results files into a single, global results file.

4.4.4. Restarts with DMP Processing

DMP processing supports multiframe restarts for nonlinear static, full transient, and mode-superposition transient analyses. The procedures and command controls are the same as described in the Basic Analysis Guide. However, restarts with DMP processing have additional limitations based on the procedure used, as described in the following sections.

See also Additional Considerations for the Restart for more information on restarting a distributed-memory parallel solution.

4.4.4.1. Procedure 1 - Use the Same Number of Cores

This procedure requires that you use the same number of cores for the restart as in the original run. It does not require any additional commands beyond those used in a typical multiframe restart procedure.

  • The total number of cores used when restarting a DMP analysis must not be altered following the first load step and first substep.

    Example 1: If you use the following command line for the first load step:

    ansys242 -dis -np 8 -i input -o output1

    then, you must also use 8 cores (-dis -np 8) for the multiframe restart, and the files from the original analysis that are required for the restart must be located in the current working directory. Additionally, this means you cannot perform a restart using SMP processing if DMP processing was used prior to the restart point.

  • When running across machines, the job launch procedure (or script) used when restarting a DMP analysis must not be altered following the first load step and first substep. In other words, you must use the same number of machines, the same number of cores for each of the machines, and the same head compute node / other compute nodes relationships among these machines in the restart job that follows.

    Example 2: If you use the following command line for the first load step where the head compute node (which always appears first in the list of machines) is mach1, and the other compute nodes are mach2 and mach3:

    ansys242 -dis -machines mach1:4:mach2:1:mach3:2 -i input -o output1

    then for the multiframe restart, you must use a command line such as this:

    ansys242 -dis -machines mach7:4:mach6:1:mach5:2 -i restartjob -o output2

    This command line uses the same number of machines (3), the same number of cores for each machine in the list (4:1:2), and the same head compute node / other compute nodes relationship (4 cores on the head compute node, 1 core on the second compute node, and 2 cores on the third compute node) as the original run. Any alterations in the -machines field, other than the actual machine names, will result in restart failure. Finally, the files from the original analysis that are required for the restart must be located in the current working directory on each of the machines.

  • The files needed for a restart must be available on the machine(s) used for the restarted analysis. Each machine has its own restart files that are written from the previous run. The restart process needs to use these files to perform the correct restart actions.

    For Example 1 above, if the two analyses (-dis -np 8) are performed in the same working directory, no action is required; the restart files will already be available. However, if the restart is performed in a new directory, all of the restart files listed in Table 4.3: Required Files for Multiframe Restart - Procedure 1 must be copied (or moved) into the new directory before performing the multiframe restart.

    For Example 2 above, the restart files listed in the "Head Compute Node" column in Table 4.3: Required Files for Multiframe Restart - Procedure 1 must be copied (or moved) from mach1 to mach7, and all of the files in the "Other Compute Nodes" column must be copied (or moved) from mach2 to mach6 and from mach3 to mach5 before performing the multiframe restart.

    Table 4.3: Required Files for Multiframe Restart - Procedure 1

    Head Compute Node:
      • Jobname.ldhi
      • Jobname.rdb
      • Jobname.rdnn (if remeshed due to nonlinear adaptivity, where nn is the number of remeshings before the restart)
      • Jobname0.xnnn [1]
      • Jobname0.rst (the local .rst file for this domain) [2]

    Other Compute Nodes:
      • Jobnamen.xnnn (where n is the process rank and nnn is a restart file identifier)
      • Jobnamen.rst (the local .rst file for this domain) [2]

    1. The .Xnnn file extension mentioned here refers to the .Rnnn and .Mnnn files discussed in Multiframe File Restart Requirements in the Basic Analysis Guide.

    2. The Jobnamen.rst files are optional. The restart can be performed successfully without them.

4.4.4.2. Procedure 2 - Use a Different Number of Cores

In this procedure, the total number of cores used when restarting a DMP analysis can be altered following the first load step and first substep. In addition, you can perform the restart using either a distributed-memory parallel solution or a shared-memory parallel solution. Some additional steps beyond the typical multiframe restart procedure are required to ensure that necessary files are available.


Note:  This procedure is not available when performing a restart for the following analysis types: mode-superposition transient analysis, an analysis that includes mesh nonlinear adaptivity, and 2D to 3D analysis.


  • In this procedure, the Jobname.rnnn file must be available. This file is not generated by default in DMP processing, even when restart controls are activated via the RESCONTROL command. You must either use the command DMPOPTION,RNN,YES in the prior (base) analysis, or manually combine the Jobnamen.rnnn files into the Jobname.rnnn file using the COMBINE command (a brief sketch follows Table 4.4).

  • For example, if you use the following command line for the first load step:

    ansys242 -dis -np 8 -i input -o output1

    then for the multiframe restart, you can use more or fewer than 8 cores (-np N, where N does not equal 8).

  • The files from the original analysis that are required for the restart must be located in the current working directory. If running across machines, the restart files are only required to be in the current working directory on the head compute node.

Table 4.4: Required Files for Multiframe Restart - Procedure 2

Head Compute Node:
  • Jobname.ldhi
  • Jobname.rdb
  • Jobname.rdnn (if remeshed due to nonlinear adaptivity, where nn is the number of remeshings before the restart)
  • Jobname.rnnn
  • Jobname.rst [1]

Other Compute Nodes:
  • (none required)

  1. The Jobname.rst file is optional. The restart can be performed successfully without it.
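
A brief sketch of the extra step required in the prior (base) analysis; the jobname, restart-control settings, and core counts are hypothetical:

    ! base analysis, launched with: ansys242 -dis -np 8 -i input -o output1
    /solu
    rescontrol,define,all,1    ! write multiframe restart files for every substep
    dmpoption,rnn,yes          ! also generate the combined Jobname.rnnn file needed for Procedure 2
    solve
    finish
    ! the restart can then use a different core count, for example:
    !   ansys242 -dis -np 4 -i restartjob -o output2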

4.4.4.3. Additional Considerations for the Restart

The advantage of using Procedure 1 (same number of cores) is faster performance, achieved by avoiding the potentially costly steps of combining the Jobnamen.rnnn files and later splitting the Jobname.rnnn (and possibly the Jobname.rst/rth/rmg) files during the restarted analysis. The disadvantage is that more files need to be managed during the restart process.

The advantage of Procedure 2 (different number of cores) is that there are fewer files to manage during the restart process, but at the cost of additional time to combine the local restart files and, later on, to split this file data during the restarted analysis. Depending on the size of the simulation, this overhead may be insignificant.

In all restarts, the Jobname.rst results file (or Jobname.rth or Jobname.rmg) on the head compute node is recreated after each solution by merging the Jobnamen.rst files again.

If you do not require a restart, issue RESCONTROL,NORESTART in the run to avoid writing, or to remove, the restart files on the head compute node and all other compute nodes. If you use this command, the worker processes will not have files such as .esav, .osav, .rst, or .x000 in the working directory at the end of the run. In addition, the master process will not have files such as .esav, .osav, .x000, .rdb, or .ldhi at the end of the run. The program removes all of the above scratch files at the end of the solution phase (FINISH or /EXIT). This option is useful for file cleanup and control.
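
A minimal sketch of this cleanup-oriented setting:

    /solu
    rescontrol,norestart    ! do not keep multiframe restart files (no restart will be possible)
    solve
    finish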