4.2. Understanding Memory and Disk Space Usage Information in the Output File

The following excerpt is taken from the end of the output file (Jobname.out) of a distributed-memory parallel (DMP) simulation. It lists the solver used, memory and disk space usage, the physical memory available, and the amount of I/O written and read.

Equation solver used                              :      PCG (symmetric)
Equation solver computational rate                :      241.1 Gflops

Maximum disk space used on any process            :       81.1 GB
Maximum memory used on any process                :       34.0 GB
Maximum memory allocated on any process           :       53.1 GB

Maximum disk space used on any compute node       :       81.1 GB
Maximum memory used on any compute node           :      118.4 GB
Maximum memory allocated on any compute node      :      143.7 GB

Sum of disk space used on all processes           :       81.1 GB
Sum of memory used on all processes               :     1385.6 GB
Sum of memory allocated on all processes          :     1509.5 GB

Physical memory available on primary compute node :        377 GB
Physical memory available on all compute nodes    :       6024 GB

Total amount of I/O written to disk               :      133.0 GB
Total amount of I/O read from disk                :       81.4 GB
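
For post-processing, the numeric lines of such a summary can be collected into a dictionary. The following is a minimal sketch, not a documented interface: it assumes every numeric line has the shape "label : number unit" seen in the excerpt, and the function name parse_usage_summary is hypothetical. Lines without a numeric value, such as the solver name, are skipped.

```python
import re

# Assumed line shape: "<label> : <number> <unit>", as in the excerpt above.
LINE_RE = re.compile(
    r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>\S+)\s*$"
)


def parse_usage_summary(text):
    """Return {label: (value, unit)} for lines that match the assumed shape."""
    summary = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            summary[match["label"]] = (float(match["value"]), match["unit"])
    return summary


# A few lines copied from the excerpt, used here as sample input.
sample = """\
Equation solver used                              :      PCG (symmetric)
Maximum memory used on any process                :       34.0 GB
Physical memory available on all compute nodes    :       6024 GB
Total amount of I/O written to disk               :      133.0 GB
"""

summary = parse_usage_summary(sample)
for label, (value, unit) in summary.items():
    print(f"{label}: {value} {unit}")
```

Keying the result by the full label text keeps the sketch independent of line order, at the cost of depending on the exact label wording.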

To understand the reported usage values, consider the following illustrative plot of memory usage versus time for a DMP simulation. The blue curve represents the memory usage of the master process, and the orange curves represent the memory usage of the worker processes. A similar conceptual plot would represent disk space usage. Each process has its own usage curve with various peaks over the course of the simulation. The horizontal line on each curve marks the maximum usage value for that process. These maxima can occur at different times during the simulation.

Figure 4.2: Conceptual Plot of Memory or Disk Space Usage for a DMP Simulation

With distributed-memory parallel processing, all computations in the solution phase are performed in parallel, either across cores in a single workstation or laptop or across machines (nodes) in a cluster. The maximum memory (or disk space) used and the maximum memory allocated are each reported at the following levels:

  • any process

  • any compute node

  • the sum of all processes

These usage and allocation values describe what the software consumed. In contrast, the physical memory available describes the hardware capacity of the cluster. Finally, the total input/output (I/O) written to and read from disk is reported.
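
As an illustration of relating the software values to the hardware values, the short sketch below computes what fraction of the physical memory the allocations in the example excerpt represent. The variable names are hypothetical. The per-node comparison also assumes that the compute node with the largest allocation has the same physical memory as the primary compute node, which the output does not state.

```python
# Values copied from the example excerpt (all in GB).
max_allocated_any_node = 143.7
physical_primary_node = 377.0
sum_allocated_all_processes = 1509.5
physical_all_nodes = 6024.0

# Assumption: the busiest node has as much physical memory as the primary node.
node_fraction = max_allocated_any_node / physical_primary_node

# The sum over processes is itself a conservative (high) estimate of peak demand.
cluster_fraction = sum_allocated_all_processes / physical_all_nodes

print(f"Busiest node allocation vs. primary node memory: {node_fraction:.1%}")
print(f"Summed allocation vs. cluster memory: {cluster_fraction:.1%}")
```

Under these assumptions, both fractions are well below 100% for the example run.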

The following table explains these values in greater detail. Usage values reported are discussed in relation to the conceptual example plot for a DMP simulation with one master and two worker processes.

Table 4.5: Detailed Description of Information Reported in the Output File

Label

Detailed description

"Maximum memory (or disk space) used on any process"

The maximum amount of memory (or disk space) used by any one of the distributed-memory parallel (DMP) processes. The master process typically uses the most memory (and the most disk space) of all the processes, as depicted in the figure.

"Maximum memory allocated on any process"

The maximum amount of memory allocated in the program's memory manager by any one of the DMP processes. The master process typically allocates the most memory of all the processes.

"Maximum memory (or disk space) used on any compute node"

The largest per-node total of memory (or disk space) used. The maximum amount used by each process is summed over all processes running on the same compute node, and the largest of those per-node sums is reported. This value is reported only when running on multiple compute nodes.

"Maximum memory allocated on any compute node"

The largest per-node total of memory allocated. The maximum amount allocated by each process is summed over all processes running on the same compute node, and the largest of those per-node sums is reported. This value is reported only when running on multiple compute nodes.

"Sum of memory (or disk space) used on all processes"

The sum, over all DMP processes, of the maximum amount of memory (or disk space) used by each process. This is the sum of the values marked by the horizontal lines in the figure. Because the processes generally reach their maxima at different times, this value typically overestimates the true maximum amount of memory required by all of the processes at any given point in time.

"Sum of memory allocated on all processes"

The sum, over all DMP processes, of the maximum amount of memory allocated by each process. This value typically overestimates the true maximum amount of memory allocated by all of the processes at any given point in time.

"Physical memory available on primary compute node"

The amount of physical memory available on the primary (or head) compute node, the compute node on which the master process runs.

"Physical memory available on all compute nodes"

The total physical memory available across all compute nodes used by the simulation.

"Total amount of I/O written to disk"

The total amount of data written to disk by all processes. This is a cumulative measurement that may be significantly higher than the combined size of all files written to disk, because I/O writing can be repetitive throughout the simulation.

"Total amount of I/O read from disk"

The total amount of data read from disk by all processes. This is a cumulative measurement that may be significantly higher than the combined size of the files being read, because I/O reading can be repetitive throughout the simulation.
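
The aggregation rules in the table can be sketched in a few lines. The example below uses made-up memory samples for one master and two worker processes placed on two hypothetical compute nodes. None of the numbers or names come from a real run. It is only meant to show how the three reported values are formed and why the sum of per-process maxima can exceed the true simultaneous peak.

```python
# Hypothetical memory samples (GB) per process at four instants in time.
usage = {
    "master": [10.0, 30.0, 12.0, 8.0],
    "worker_1": [5.0, 6.0, 20.0, 7.0],
    "worker_2": [4.0, 5.0, 6.0, 18.0],
}

# Hypothetical placement of the processes on compute nodes.
node_of = {"master": "node_a", "worker_1": "node_a", "worker_2": "node_b"}

# Per-process maximum: the horizontal line on each curve in the figure.
peak = {name: max(samples) for name, samples in usage.items()}

# "Maximum memory used on any process"
max_on_any_process = max(peak.values())

# "Maximum memory used on any compute node": sum the per-process maxima
# on each node, then take the largest per-node sum.
per_node = {}
for name, value in peak.items():
    node = node_of[name]
    per_node[node] = per_node.get(node, 0.0) + value
max_on_any_node = max(per_node.values())

# "Sum of memory used on all processes": sum of the per-process maxima.
sum_on_all_processes = sum(peak.values())

# True simultaneous peak: the largest total at any single instant.
true_peak = max(sum(step) for step in zip(*usage.values()))

print(max_on_any_process)    # 30.0
print(max_on_any_node)       # 50.0
print(sum_on_all_processes)  # 68.0
print(true_peak)             # 41.0
```

In this made-up data the three processes peak at different instants, so the sum of maxima (68.0 GB) is noticeably larger than the most memory in use at any one time (41.0 GB).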