3.3. I/O Requirements

The final major computing demand in Mechanical APDL is file I/O. The use of disk storage extends the capability of the program to solve large model simulations and also provides for permanent storage of results.

One of the most acute file I/O bottlenecks occurs in the sparse direct equation solver and Block Lanczos eigensolver, where very large files are read forward and backward multiple times. For Block Lanczos, average-sized runs can easily perform a total data transfer of 1 TB or more from disk files that are tens of GB in size or larger. At a typical disk I/O rate of 50-100 MB/sec on many desktop systems, this I/O demand can add hours of elapsed time to a simulation. Another expensive I/O demand is saving results for multiple-step (time step or load step) analyses. Results files that are tens to hundreds of GB in size are common if all results are saved for all time steps in a large model, or for a nonlinear or transient analysis with many solutions.
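The "hours of elapsed time" figure follows directly from the numbers quoted above. The short sketch below works through the arithmetic; the function name `transfer_hours` is illustrative only, and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are assumed.

```python
def transfer_hours(total_bytes: float, rate_mb_per_sec: float) -> float:
    """Elapsed hours needed to move total_bytes at a sustained rate in MB/sec."""
    seconds = total_bytes / (rate_mb_per_sec * 1e6)
    return seconds / 3600.0


TOTAL_BYTES = 1e12  # 1 TB of total data transfer (assumed decimal units)

# Evaluate the two ends of the 50-100 MB/sec range quoted in the text.
for rate in (50, 100):
    print(f"{rate} MB/sec: {transfer_hours(TOTAL_BYTES, rate):.1f} hours")
```

Under these assumptions, 1 TB of transfer costs roughly 5.6 hours at 50 MB/sec and 2.8 hours at 100 MB/sec if none of it is absorbed by the system file cache.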

This section discusses ways to minimize the I/O time. Important breakthroughs in desktop I/O performance that have been added to the program are described later in this document. To understand recent improvements in shared memory and distributed memory processing I/O, a discussion of I/O hardware follows.

3.3.1. I/O Hardware

I/O capacity and speed are important parts of a well balanced system. While disk storage capacity has grown dramatically in recent years, the speed at which data is transferred to and from disks has not increased nearly as much as processor speed. Processors compute at Gflops (billions of floating point operations per second) today, while disk transfers are measured in MB/sec (megabytes per second), a factor of 1,000 difference! This performance disparity can be hidden by the effective use of large amounts of memory to cache file accesses. However, the size of Mechanical APDL files often grows much larger than the available physical memory so that system file caching is not always able to hide the I/O cost.

Many desktop systems today have very large capacity hard drives that hold several hundred GB or more of storage. However, the transfer rates to these large disks can be very slow and are significant bottlenecks to performance. Mechanical APDL I/O requires a sustained I/O stream to write or read files that can be many GB in size.

Solid state drives (SSDs) are becoming more popular as the technology improves and costs decrease. SSDs offer significantly reduced seek times while maintaining good transfer rates when compared to the latest hard disk drives. This can lead to dramatic performance improvements when lots of I/O is performed, such as when running the sparse direct solver in an out-of-core memory mode. Factors such as cost and mean time to failure must also be considered when choosing between SSDs and conventional hard disk drives.

The key to obtaining outstanding performance is not finding a fast single drive, but rather combining multiple drives in a RAID setup that looks like a single disk drive to the user. For fast runs, the recommended configuration is a RAID0 setup using 4 or more disks and a fast RAID controller. These fast I/O configurations are inexpensive to put together for desktop systems and can achieve I/O rates in excess of 200 MB/sec using conventional hard drives and over 500 MB/sec using SSDs.
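To see how a given drive or RAID array compares with the rates quoted above, a rough sustained-write measurement can be taken from a short script. The following is a minimal sketch, not a Mechanical APDL tool: the function name `measure_write_rate` and its parameters are hypothetical, and the result is only meaningful when the test size is large relative to physical memory, since smaller writes may largely reflect the system file cache rather than the disks.

```python
import os
import tempfile
import time


def measure_write_rate(directory: str, total_mb: int = 256, chunk_mb: int = 8) -> float:
    """Return an approximate sequential write rate in MB/sec for `directory`.

    Writes a temporary file in fixed-size chunks, forces it to the device
    with fsync, and removes it afterwards.
    """
    chunk = b"\0" * (chunk_mb * 1_000_000)
    n_chunks = max(1, total_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push cached data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_chunks * chunk_mb) / elapsed
```

A call such as `measure_write_rate("/scratch/work", total_mb=20000)` (the path is a placeholder) gives a figure that can be compared against the 200 MB/sec and 500 MB/sec guidelines. Read rates and the forward/backward access pattern of the solvers are not covered by this sketch.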

Ideally, a dedicated RAID0 disk configuration is recommended. This dedicated drive should be regularly defragmented or reformatted to keep the disk clean. Using a dedicated drive for Mechanical APDL runs also separates Mechanical APDL I/O demands from other system I/O during simulation runs. Cheaper permanent storage can be used for files after the simulation runs are completed.

Another key bottleneck for I/O on many systems comes from using centralized I/O resources that share a relatively slow interconnect. Very few system interconnects can sustain 100 MB/sec or more for the large files that Mechanical APDL reads and writes. Centralized disk resources can provide high bandwidth and large capacity to multiple compute servers, but such a configuration requires expensive high-speed interconnects to supply each compute server independently for simultaneously running jobs. Another common pitfall with centralized I/O resources comes when the central I/O system is configured for redundancy and data integrity. While this approach is desirable for transaction type processing, it will severely degrade high performance I/O in most cases. If central I/O resources are to be used for Mechanical APDL simulations, a high performance configuration is essential.

Finally, you should be aware of alternative solutions to I/O performance that may not work well. Some users may have experimented with eliminating I/O by increasing the number and size of internal Mechanical APDL file buffers. This strategy only makes sense when the amount of physical memory on a system is large enough so that all Mechanical APDL files can be brought into memory. However, in this case file I/O is already in memory on 64-bit operating systems using the system buffer caches. This approach of adjusting the file buffers wastes physical memory because Mechanical APDL requires that the size and number of file buffers be identical for every file, so the memory required for the largest files determines how much physical memory must be reserved for each file opened (many files are opened in a typical solution). All of this file buffer memory comes from the user scratch memory, making it unavailable for other system functions or applications that may be running at the same time.

Another alternative approach to avoid is the so-called RAM disk. In this configuration a portion of physical memory is reserved, usually at boot time, for a disk partition. All files stored on this RAM disk partition are really in memory. Though this configuration will be faster than I/O to a real disk drive, it requires that the user have enough physical memory to reserve part of it for the RAM disk. Once again, if a system has enough memory to reserve a RAM disk, then it also has enough memory to automatically cache the Mechanical APDL files. The RAM disk also has significant disadvantages in that it is a fixed size, and if it is filled up the job will fail, often with no warning.

The bottom line for minimizing I/O times, for both shared memory and distributed memory parallel processing, is to use as much memory as possible to minimize the actual I/O required and to place the Mechanical APDL working directory on a separate multiple-disk RAID array. Fast I/O is no longer a high cost addition if properly configured and understood. The following is a summary of I/O configuration recommendations for Mechanical APDL users.

Table 3.1: Recommended Configuration for I/O Hardware

Use a single large drive for system and permanent files.
Use a separate disk partition of 4 or more identical physical drives for Mechanical APDL working directory. Use RAID0 across the physical drives.
Consider the use of solid state drive(s) for maximum performance.
Size of working directory should preferably be <1/3 of total RAID drive capacity.
Keep working directory clean, and defragment or reformat regularly.
Set up a swap space equal to physical memory size. Swap space need not equal memory size on very large memory systems (that is, swap space should be less than 32 GB). Increasing swap space is effective when there is a short-time, high-memory requirement such as meshing a very large component.
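Two of the recommendations in the table are simple numeric rules that can be checked before a run. The sketch below is one possible reading of them, under stated assumptions: the helper names are hypothetical, the expected size of the working files must be estimated by the user, and the swap rule is interpreted as "match physical memory, capped at 32 GB".

```python
import shutil


def working_dir_fits(path: str, expected_bytes: int, max_fraction: float = 1 / 3) -> bool:
    """Check the '<1/3 of total capacity' guideline for the drive holding `path`.

    Also confirms that enough free space remains for the expected files.
    """
    usage = shutil.disk_usage(path)
    within_fraction = expected_bytes < usage.total * max_fraction
    has_room = expected_bytes <= usage.free
    return within_fraction and has_room


def suggested_swap_gb(physical_memory_gb: float, cap_gb: float = 32.0) -> float:
    """Swap equal to physical memory, capped for very large memory systems (assumed reading)."""
    return min(physical_memory_gb, cap_gb)
```

For example, `suggested_swap_gb(16)` returns 16, while `suggested_swap_gb(256)` returns the 32 GB cap.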

3.3.2. I/O Considerations for Distributed Memory Parallel Processing

I/O in the sparse direct solver has been optimized on Windows systems to take full advantage of multiple drive RAID0 arrays. An inexpensive investment for a RAID0 array on a desktop system can yield significant gains in performance. However, for desktop systems (and many cluster configurations) it is important to understand that many cores share the same disk resources. Therefore, obtaining fast I/O performance in applications such as Mechanical APDL is often not as simple as adding a fast RAID0 configuration.

For SMP runs, there is only one set of Mechanical APDL files active for a given simulation. However, for a DMP simulation, each core maintains its own set of files. This places an ever greater demand on the I/O resources for a system as the number of cores used by the program is increased. For this reason, performance is best when solver I/O can be eliminated altogether or when multiple nodes are used for parallel runs, each with a separate local I/O resource. If a DMP solution is run on a single machine and lots of I/O must be done due to a lack of physical memory, then solid state drives (SSDs) may be very beneficial for achieving optimal performance. The significantly reduced seek time of SSDs can help to reduce the cost of having multiple processors each writing/reading their own set of files. Conventional hard drives will have a huge sequential bottleneck as the disk head moves to the location of each processor's file(s). This bottleneck is virtually eliminated using SSDs, thus making optimal performance possible.
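The effect of many processes sharing one I/O resource can be illustrated with a deliberately idealized model. The sketch below assumes that each node has one local disk resource whose aggregate rate is divided evenly among the processes on that node; it ignores seek overhead, so on conventional hard drives the real per-process rate would be lower than shown. The function name and parameters are hypothetical.

```python
def per_process_rate(disk_rate_mb_s: float, n_processes: int, n_nodes: int = 1) -> float:
    """Idealized MB/sec available to each process when a node's disk is shared evenly."""
    if n_processes < 1 or n_nodes < 1:
        raise ValueError("n_processes and n_nodes must be at least 1")
    procs_per_node = -(-n_processes // n_nodes)  # ceiling division
    return disk_rate_mb_s / procs_per_node


# Eight processes on one machine versus spread over four nodes,
# each node with its own 200 MB/sec local disk resource.
print(per_process_rate(200, 8, n_nodes=1))
print(per_process_rate(200, 8, n_nodes=4))
```

In this model, eight processes on a single 200 MB/sec array see 25 MB/sec each, while the same eight processes spread over four nodes see 100 MB/sec each, which is the reasoning behind preferring separate local I/O resources per node.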