3.3. I/O Requirements

The final major computing demand in Mechanical APDL is file I/O. The use of disk storage extends the program's capability to solve large models and also provides permanent storage of results.

One of the most acute file I/O bottlenecks occurs in the sparse direct equation solver and the Block Lanczos eigensolver, where very large files are read forward and backward multiple times. For Block Lanczos, average-sized runs can easily perform a total data transfer of 1 terabyte or more from disk files that are tens of GB in size or larger. Another expensive I/O demand is saving results for multiple-step (time-step or load-step) analyses. Results files of tens to hundreds of GB are common when all results are saved for all time steps in a large model, or for a nonlinear or transient analysis with many solutions.
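To put these volumes in perspective, a back-of-the-envelope estimate shows the wall-clock time needed to move 1 TB of solver file traffic. The transfer rates used are illustrative assumptions, not measured figures:

```python
# Illustrative estimate: wall-clock time to move a given volume of
# solver I/O at different sustained transfer rates. The rates below
# are assumed round numbers, not measurements of any real system.

def transfer_time_hours(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Time in hours to move total_bytes at a sustained transfer rate."""
    return total_bytes / rate_bytes_per_s / 3600.0

TB, GB, MB = 1e12, 1e9, 1e6

# One terabyte of Block Lanczos file traffic at three assumed rates:
for label, rate in [("100 MB/s (slow or shared disk)", 100 * MB),
                    ("500 MB/s (SATA SSD)",            500 * MB),
                    ("3 GB/s (NVMe SSD)",                3 * GB)]:
    print(f"{label}: {transfer_time_hours(1 * TB, rate):.2f} h")
```

At the slow end, the file traffic alone approaches three hours of wall time, which is why solver I/O patterns can dominate total run time on inadequate storage.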

This section discusses ways to minimize I/O time. Improvements to the program that have produced important gains in desktop I/O performance are described later in this document. To put those improvements for shared-memory and distributed-memory processing in context, a discussion of I/O hardware follows.

3.3.1. I/O Hardware

I/O capacity and speed are important parts of a well-balanced system. While disk storage capacity has grown dramatically in recent years, the speed at which data is transferred to and from disks has not kept pace with processor speed. Processors today compute at tens or hundreds of Gflops (billions of floating-point operations per second). While best-case sequential disk transfers reach the GB/s (gigabytes per second) range, worst-case transfers fall to the MB/s (megabytes per second) range, a gap of roughly 1,000 times relative to CPU operation speeds. This performance disparity can be hidden by the effective use of large amounts of memory to cache file accesses. However, Mechanical APDL files often grow much larger than the available physical memory, so system file caching is not always able to hide the I/O cost.
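The scale of this disparity can be sketched with a few lines of arithmetic. The rates below are illustrative assumptions for a modern workstation, and the one-operand-per-flop demand is a deliberately extreme upper bound:

```python
# Compare the data rate a processor could consume with what a disk can
# deliver. All figures are assumed round numbers for illustration only.

cpu_flops = 100e9        # 100 Gflops sustained compute rate (assumed)
best_disk = 1e9          # ~1 GB/s best-case sequential transfer (assumed)
worst_disk = 1e6         # ~1 MB/s worst-case scattered transfer (assumed)

# Extreme upper bound: one fresh 8-byte operand fetched per flop.
demand = cpu_flops * 8   # bytes per second the processor could consume

print(f"best case:  disk is {demand / best_disk:,.0f}x too slow")
print(f"worst case: disk is {demand / worst_disk:,.0f}x too slow")
```

Even under the best-case assumption the disk falls short by hundreds of times, which is why caching file accesses in memory is so valuable.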

Many desktop systems today have hard drives holding several hundred GB or more. However, disk transfer rates can slow dramatically when Mechanical APDL I/O operations require sustained streaming reads or writes of multiple files that are many gigabytes in size. Because the I/O performed during a simulation can be significant, high-performance storage is paramount: fast, local solid-state drives (SSDs) are the recommended hardware for Mechanical APDL I/O in both desktop and cluster environments, and the older spinning HDD technology is discouraged. As general guidance, the local SSDs should have about 10 times the capacity of the memory installed on the system. These can be large individual disks, or multiple disks configured to appear as a single, unified disk. Ideally, use separate disks for normal system I/O and for the I/O required by the Mechanical APDL application. Although some environments are equipped with high-performance parallel network file systems, these are still often inadequate for the I/O requirements of a simulation.
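The 10x sizing rule above can be expressed as a trivial helper. The factor of 10 comes from the guidance in the text; the example RAM sizes and the function itself are hypothetical, not part of any Ansys tool:

```python
# Sketch of the capacity guidance above: local SSD scratch capacity of
# roughly ten times the installed RAM. The factor is taken from the
# text; the example machine sizes are arbitrary assumptions.

def recommended_scratch_gb(installed_ram_gb: float, factor: float = 10.0) -> float:
    """Suggested local SSD scratch capacity in GB."""
    return installed_ram_gb * factor

for ram_gb in (64, 256, 1024):
    print(f"{ram_gb:>5} GB RAM -> ~{recommended_scratch_gb(ram_gb):,.0f} GB SSD scratch")
```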

Another key I/O bottleneck on many systems comes from using centralized I/O resources that share a relatively slow interconnect, such as a network file system. Very few system interconnects can sustain the speeds required for the large files that the Mechanical APDL application reads and writes. Centralized disk resources can provide high bandwidth and large capacity to multiple compute servers, but such a configuration requires expensive high-speed interconnects to supply each compute server independently for simultaneously running jobs. Another common pitfall arises when the central I/O system is configured for redundancy and data integrity. While that approach is desirable for transaction-type processing, in most cases it severely degrades high-performance I/O. If central I/O resources are to be used for Mechanical APDL simulations, a high-performance configuration is essential.

Finally, consider alternative ways to improve Mechanical APDL I/O performance, keeping in mind that they do not suit every case. For example, you can configure Mechanical APDL file buffers to grow dynamically, which can substantially reduce I/O. This strategy is most effective on systems that have significant excess memory, use a shared network drive, and lack local SSDs; it only makes sense when the amount of physical memory is large enough that all Mechanical APDL files can be brought into memory. Adjusting the file buffers consumes significant physical memory because the Mechanical APDL application requires that the size and number of file buffers be identical for every file, so the memory required for the largest files determines how much physical memory must be reserved for each file opened (and many files are opened in a typical solution). This file-buffer I/O comes from the user scratch memory, making it unavailable to other system functions or applications that may be running at the same time.
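The memory cost described above can be illustrated with a toy model. The file counts and sizes are hypothetical, and the function is a sketch of the scaling argument, not Mechanical APDL internals:

```python
# Toy model of uniform file buffering: every open file receives the same
# buffer allocation, sized for the largest file, so the total reserved
# memory scales with the number of open files. All numbers are
# hypothetical examples, not Mechanical APDL internals.

def buffer_memory_gb(file_sizes_gb, n_open_files):
    """Memory reserved when each file's buffers must match the largest file."""
    per_file_gb = max(file_sizes_gb)    # largest file sets the buffer size
    return per_file_gb * n_open_files

# Ten open files where the largest is 40 GB reserves 400 GB of scratch:
print(buffer_memory_gb([40, 12, 3, 1], n_open_files=10))
```

Because the largest file dictates the allocation for every open file, the reserved memory can far exceed the combined size of the smaller files.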

While perhaps a legacy concept, an alternative approach to avoid is the so-called RAM disk. In this configuration a portion of physical memory is reserved, usually at boot time, for a disk partition, so all files stored on the RAM disk partition actually reside in memory. Though this configuration is faster than I/O to a real disk drive, it requires enough physical memory to set part of it aside for the RAM disk. And if a system has enough memory to reserve a RAM disk, it also has enough memory to automatically cache the Mechanical APDL files, rendering the RAM disk unnecessary. A RAM disk also has the significant disadvantage of being fixed in size: if it fills up, the job fails, often with no warning.

The bottom line for minimizing I/O times, for both shared memory and distributed memory parallel processing, is to use as much memory as possible to minimize the actual I/O required. For I/O that is unavoidable, use a fast, local SSD. Fast I/O is no longer a high-cost addition if properly configured and understood. The following is a summary of I/O configuration recommendations for Mechanical APDL users.

Table 3.1: Recommended Configuration for I/O Hardware

  • Use one disk for system and permanent files.

  • Use separate, high-performance solid-state drives as a temporary workspace for Mechanical APDL file I/O.

  • Consider setting up swap space to absorb short-lived memory spikes. Swap space need not equal memory size on very-large-memory systems (that is, swap space can be less than 32 GB).


3.3.2. I/O Considerations for Distributed Memory Parallel Processing

For SMP runs, there is only one set of Mechanical APDL files active for a given simulation. For a DMP simulation, however, each core maintains its own set of files, placing an ever-greater demand on the system's I/O resources as the number of cores increases. For this reason, performance is best when solver I/O can be eliminated altogether, or when parallel runs use multiple nodes, each with its own local I/O resource. If a DMP solution is run on a single machine and substantial I/O must be done due to a lack of physical memory, then solid-state drives (SSDs) may be very beneficial for achieving optimal performance. The significantly reduced seek time of SSDs lowers the cost of having multiple processes each reading and writing their own set of files. A conventional hard drive, by contrast, suffers a severe seek bottleneck as its single disk head moves to the location of each process's file(s) in turn. SSDs virtually eliminate this bottleneck, making optimal performance possible.
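A toy model makes the seek-time argument concrete. Every parameter below (process count, head-switch count, transfer rates, seek times) is an illustrative assumption, not a measurement:

```python
# Estimate I/O time when N processes interleave access to their own
# files: raw transfer time plus the accumulated cost of each head
# movement. An SSD's near-zero access time removes the second term
# almost entirely. All parameters are illustrative assumptions.

def io_time_s(total_gb, rate_gb_s, switches, seek_ms):
    """Transfer time plus accumulated seek/access time, in seconds."""
    return total_gb / rate_gb_s + switches * seek_ms / 1000.0

switches = 16 * 10_000                       # 16 processes, 10k switches each
hdd_s = io_time_s(100, 0.2, switches, 8.0)   # 200 MB/s, 8 ms average seek
ssd_s = io_time_s(100, 2.0, switches, 0.05)  # 2 GB/s, ~0.05 ms access time
print(f"HDD: {hdd_s:.0f} s   SSD: {ssd_s:.0f} s")
```

Under these assumptions the hard drive spends most of its time seeking between the per-process files, while the SSD's total is dominated by the transfer itself.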