2.2. Troubleshooting

This section describes problems that you may encounter while using shared-memory parallel processing, as well as methods for overcoming them. Some of these problems are specific to a particular system, as noted.

Job fails with SIGTERM signal (Linux Only)

Occasionally, when running on Linux, a simulation may fail with the following message: “process killed (SIGTERM)”. This typically occurs when computing the solution and means that the system has killed the Ansys process. The two most common causes are (1) Ansys used too many hardware resources (typically memory), prompting the system to kill the process, or (2) a user manually killed the Ansys job (that is, with the kill -9 system command). Check the size of the job you are running in relation to the amount of physical memory on the machine. Most often, decreasing the model size or finding a machine with more RAM will result in a successful run.
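
Before resubmitting a failed job, it can help to compare an estimate of the job's memory requirement against the physical RAM installed on the machine. The following sketch (a minimal illustration, not part of Ansys) assumes a Linux system and uses a hypothetical value, estimated_job_gb, that you would set from your own knowledge of the model.

    import os

    def physical_ram_gb():
        """Return total physical RAM in GB (Linux only, via sysconf)."""
        page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
        page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical memory pages
        return page_size * page_count / 1024**3

    # Hypothetical estimate of the job's in-core memory requirement, in GB.
    estimated_job_gb = 48.0

    ram_gb = physical_ram_gb()
    if estimated_job_gb > ram_gb:
        print(f"Job needs ~{estimated_job_gb:.0f} GB but the machine has only "
              f"{ram_gb:.0f} GB of RAM; reduce the model size or use a larger machine.")
    else:
        print(f"Estimated {estimated_job_gb:.0f} GB fits within {ram_gb:.0f} GB of RAM.")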

Poor Speedup or No Speedup

As more cores are utilized, the runtimes are generally expected to decrease. The biggest relative gains are typically achieved when using two cores compared to using a single core. When significant speedups are not seen as additional cores are used, the reasons may involve both hardware and software issues. These include, but are not limited to, the following situations.

Hardware

Oversubscribing hardware  —  In a multiuser environment, this could mean that more physical cores are being used by Ansys simulations than are available on the machine. It could also mean that hyperthreading is activated. Hyperthreading typically involves enabling extra virtual cores, which can sometimes allow software programs to more effectively use the full processing power of the CPU. However, for compute-intensive programs such as Ansys, using these virtual cores rarely provides a significant reduction in runtime. Therefore, it is recommended that you disable hyperthreading; if hyperthreading is enabled, do not request more cores than the number of physical cores (a simple check is sketched after the last hardware item below).

Lack of memory bandwidth  —  On some systems, using most or all of the available cores can saturate the available memory bandwidth, which limits the overall scalability of the Ansys software.

Dynamic Processor Speeds  —  Many new CPUs can dynamically adjust the clock speed at which they operate based on the current workload. Typically, when only a single core is being used, the clock speed can be significantly higher than when all of the CPU cores are being utilized. This can have a negative effect on scalability because the per-core computational performance is much higher when only a single core is active than when all of the CPU cores are active.
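
As a quick sanity check for the oversubscription and hyperthreading situations described above, the following sketch compares logical and physical core counts before a run. It assumes the third-party psutil package is installed, and requested_cores is a hypothetical value representing the number of cores you plan to give the Ansys job.

    import os
    import psutil  # third-party package: pip install psutil

    logical_cores = os.cpu_count()                    # includes hyperthreaded (virtual) cores
    physical_cores = psutil.cpu_count(logical=False)  # physical cores only

    # Hypothetical number of cores you intend to request for the Ansys run.
    requested_cores = 16

    if logical_cores > physical_cores:
        print(f"Hyperthreading appears to be enabled: {logical_cores} logical vs "
              f"{physical_cores} physical cores.")

    if requested_cores > physical_cores:
        print(f"Requesting {requested_cores} cores oversubscribes the {physical_cores} "
              "physical cores; expect little or no additional speedup.")
    else:
        print(f"Requesting {requested_cores} of {physical_cores} physical cores.")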

Software

Simulation includes non-supported features  —  The shared- and distributed-memory parallelisms work to speed up certain compute-intensive operations in /PREP7, /SOLU, and /POST1. However, not all operations are parallelized. If a particular operation that is not parallelized dominates the simulation time, then using additional cores will not help achieve a faster runtime (this limit is illustrated by the sketch after the last software item below).

Simulation has too few DOF (degrees of freedom)  —  Some analyses (such as transient analyses) may require long compute times, not because the number of DOF is large, but because a large number of calculations are performed (that is, a very large number of time steps). Generally, if the number of DOF is relatively small, parallel processing will not significantly decrease the solution time. Consequently, for small models with many time steps, parallel performance may be poor because the model size is too small to fully utilize a large number of cores.

I/O cost dominates solution time  —  For some simulations, the amount of memory required to obtain a solution is greater than the physical memory (that is, RAM) available on the machine. In these cases, either the operating system uses virtual memory (that is, hard disk space) to hold data that would otherwise be stored in RAM, or the equation solver writes extra files to the disk to store data. Either way, the extra I/O done through the hard drive can significantly degrade performance and become the main bottleneck. When I/O dominates, using additional cores will typically not result in a significant reduction in overall time to solution.
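
The limited benefit of additional cores in the situations above can be illustrated with Amdahl's law (a standard result, not specific to Ansys): if only a fraction p of the runtime is parallelized, the speedup on n cores is bounded by 1 / ((1 - p) + p / n). The sketch below evaluates this bound for assumed values of p.

    def amdahl_speedup(parallel_fraction, cores):
        """Ideal speedup when only `parallel_fraction` of the work runs in parallel."""
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

    # Assumed parallel fractions: a run dominated by non-parallelized operations
    # or by I/O (p = 0.5) versus a run dominated by the parallel solver (p = 0.95).
    for p in (0.5, 0.95):
        results = ", ".join(f"{n} cores -> {amdahl_speedup(p, n):.1f}x"
                            for n in (2, 4, 8, 16))
        print(f"p = {p:.2f}: {results}")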

Different Results Relative to a Single Core

Shared-memory parallel processing occurs in various preprocessing, solution, and postprocessing operations. Operational randomness and numerical round-off inherent to parallelism can cause slightly different results between runs on the same machine, whether the same number of cores or a different number of cores is used. This difference is often negligible. However, in some cases the difference is appreciable. This sort of behavior is most commonly seen in nonlinear static or transient analyses that are numerically unstable. The more numerically unstable the model is, the more likely the convergence pattern or final results will differ as the number of cores used in the simulation is changed.
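
As a simple, Ansys-independent illustration of why parallel execution can change results slightly, the sketch below shows that floating-point addition is not associative: summing the same numbers in a different order, as happens when work is divided among cores, can produce marginally different totals. The values themselves are an arbitrary assumption.

    import random

    random.seed(0)
    values = [random.uniform(-1.0, 1.0) * 10**random.randint(-8, 8) for _ in range(100_000)]

    serial_sum = sum(values)           # one fixed summation order
    shuffled = values[:]
    random.shuffle(shuffled)           # a different order, standing in for per-core
    parallel_like_sum = sum(shuffled)  # partial sums that are combined later

    print(f"serial order   : {serial_sum:.17g}")
    print(f"shuffled order : {parallel_like_sum:.17g}")
    print(f"difference     : {abs(serial_sum - parallel_like_sum):.3e}")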

With shared-memory parallelism, you can use the PSCONTROL command to control which operations actually use parallel behavior. For example, you could use this command to show that the element matrix generation running in parallel is causing a nonlinear job to converge to a slightly different solution each time it runs (even on the same machine with no change to the input data). This can help isolate the parallel computations that are affecting the solution while maintaining as much other parallelism as possible to keep reducing the time to solution.