This section describes problems which you may encounter while using the GPU accelerator capability, as well as methods for overcoming these problems. Some of these problems are specific to a particular system, as noted.
GPU devices support various compute modes (for example, Exclusive thread, Exclusive process). Only the default compute mode is supported. Using other compute modes may cause the program to fail to launch.
To list the GPU devices installed on the machine, set the ANSGPU_PRINTDEVICES environment variable to a value of 1. The printed list includes any graphics cards used to accelerate your simulation and may also include graphics cards used for display purposes.
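On Linux, for example, the variable can be set in the shell before launching the program (a bash sketch; the launch command itself is omitted and depends on your installation):

```shell
# Ask the program to print the list of detected GPU devices at startup.
export ANSGPU_PRINTDEVICES=1
```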
- No Devices
Be sure that a recommended GPU device is properly installed and configured. Check the driver level to be sure it is at or newer than the driver version supported for your particular device. (See the GPU requirements outlined in the Windows Installation Guide and the Linux Installation Guide.)
When using GPU devices, the CUDA_VISIBLE_DEVICES environment variable, if set, can hide some or all of the GPU devices from the program. Try unsetting or renaming this environment variable to see whether the supported devices become usable.
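For example, on Linux you might inspect the variable and remove it for the current session before relaunching (a bash sketch):

```shell
# Show the current value, if any, then unset it so that no devices
# are hidden from the program in this session.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-<not set>}"
unset CUDA_VISIBLE_DEVICES
```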
Note: On Windows, using Remote Desktop may disable the use of a GPU device. Launching Mechanical APDL through the Ansys Remote Solve Manager (RSM) when RSM is installed as a service may also disable the use of a GPU. In these two scenarios, the GPU accelerator capability cannot be used. Using the TCC (Tesla Compute Cluster) driver mode, if applicable, can circumvent this restriction.
- No Valid Devices
A GPU device was detected, but it is not a recommended GPU device. Be sure that a recommended GPU device is properly installed and configured. Check the driver level to be sure it is at or newer than the driver version supported for your particular device. (See the GPU requirements outlined in the Windows Installation Guide and the Linux Installation Guide.) Consider using the ANSGPU_OVERRIDE environment variable to override the check for valid GPU devices.
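For example (bash; the value 1 follows the convention of the other ANSGPU_* variables in this section — consult your release's documentation for the exact values it accepts):

```shell
# Skip the valid-device check so a non-recommended GPU can be tried.
export ANSGPU_OVERRIDE=1
```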
When using GPU devices, the CUDA_VISIBLE_DEVICES environment variable, if set, can hide some or all of the GPU devices from the program. Try unsetting or renaming this environment variable to see whether the supported devices become usable.
- Poor Acceleration or No Acceleration
Simulation includes non-supported features — A GPU device will only accelerate certain portions of a simulation, mainly the solution time. If the bulk of the simulation time is spent outside of solution, the GPU cannot have a significant effect on the overall analysis time. Even if the bulk of the simulation is spent inside solution, you must be sure that a supported equation solver is utilized during solution and that no unsupported options are used. Messages are printed in the output to alert users when a GPU is being used, as well as when unsupported options/features are chosen which deactivate the GPU accelerator capability.
Simulation has too few DOF (degrees of freedom) — Some analyses (such as transient analyses) may require long compute times, not because the number of DOF is large, but because a large number of calculations are performed (that is, a very large number of time steps). Generally, if the number of DOF is relatively small, GPU acceleration will not significantly decrease the solution time. Consequently, for small models with many time steps, GPU acceleration may be poor because the model size is too small to fully utilize a GPU.
Simulation does not fully utilize the GPU — Only certain computations are supported for GPU acceleration, so only simulations that spend a large share of their time performing those computations can expect significant speedups when a GPU is used. Therefore, check that a high percentage of the solution time was spent performing computations that could be accelerated on a GPU. This can be done by reviewing the equation solver statistics files as described below. See Measuring Performance in the Performance Guide for more details on the equation solver statistics files.
PCG solver file: The .PCS file contains statistics for the PCG iterative solver. You should first check to make sure that the GPU was utilized by the solver. This can be done by looking at the line which begins with: “Number of cores used”. The string “GPU acceleration enabled” will be added to this line if the GPU hardware was used by the solver. If this string is missing, the GPU was not used for that call to the solver. Next, you should study the elapsed times for both the “Preconditioner Factoring” and “Multiply With A22” computations. GPU hardware is only used to accelerate these two sets of computations. The wall clock (or elapsed) times for these computations are the areas of interest when determining how much GPU acceleration is achieved.
Sparse solver files: The .DSP file contains statistics for the sparse direct solver. You should first check to make sure that the GPU was utilized by the solver. This can be done by looking for the following line: “GPU acceleration activated”. This line will be printed if the GPU hardware was used. If this line is missing, the GPU was not used for that call to the solver. Next, you should check the percentage of factorization computations (flops) which were accelerated on a GPU. This is shown by the line: “percentage of GPU accelerated flops”. Also, you should look at the time to perform the matrix factorization, shown by the line: “time (cpu & wall) for numeric factor”. GPU hardware is only used to accelerate the matrix factor computations. These lines provide some indication of how much GPU acceleration is achieved.
Eigensolver files: The Block Lanczos and Subspace eigensolvers support the use of GPU devices; however, no statistics files are written by these eigensolvers. The .PCS file is written for the PCG Lanczos eigensolver and can be used as described above for the PCG iterative solver.
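The checks above can be scripted. The bash sketch below greps a statistics file for the indicator strings quoted above; the actual file is named after your job (for example jobname.PCS or jobname.DSP), and the sample file created here contains illustrative lines only, not real solver output:

```shell
# Create a small sample statistics file containing the indicator
# strings quoted above (illustrative content, not real solver output).
cat > sample.PCS <<'EOF'
Number of cores used: 4 (GPU acceleration enabled)
Elapsed time for Preconditioner Factoring: ...
Elapsed time for Multiply With A22: ...
EOF

# Check whether the GPU was used for this call to the solver.
if grep -q "GPU acceleration enabled" sample.PCS; then
    echo "GPU was used by the PCG solver"
fi

# Pull out the timing lines of interest for the GPU-accelerated steps.
grep -E "Preconditioner Factoring|Multiply With A22" sample.PCS
```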
Using multiple GPU devices — When using the sparse solver in a shared-memory parallel solution, it is expected that running a simulation with multiple GPU devices will not improve performance compared to running with a single GPU device. In a shared-memory parallel solution, the sparse solver can only make use of one GPU device.
Oversubscribing GPU hardware — The program automatically determines which GPU devices to use. In a multiuser environment, this could mean that one or more of the same GPUs are picked when multiple simulations are run simultaneously, thus oversubscribing the hardware.
If only a single GPU accelerator device exists in the machine, then only a single user should attempt to make use of it, much in the same way users should avoid oversubscribing their CPU cores.
If multiple GPU accelerator devices exist in the machine, you can set the ANSGPU_DEVICE environment variable, in conjunction with the ANSGPU_PRINTDEVICES environment variable mentioned above, to specify which particular GPU accelerator devices to use during the solution.
For example, consider a scenario where ANSGPU_PRINTDEVICES shows that four GPU devices are available with device ID values of 1, 3, 5, and 7 respectively, and only the second and third devices are supported for GPU acceleration. To select only the second supported GPU device, set ANSGPU_DEVICE = 5. To select the first and second supported GPU devices, set ANSGPU_DEVICE = 3:5.
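In that scenario, the selection could be made in the shell like this (bash; the device IDs and the colon-separated syntax are taken from the example above):

```shell
# Use only the second supported device (device ID 5) ...
export ANSGPU_DEVICE=5

# ... or use both supported devices (IDs 3 and 5).
export ANSGPU_DEVICE=3:5
```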
Error Code 101 (hipErrorInvalidDevice) for AMD GPU devices — When using AMD GPU devices, if you encounter the following error:

Error code = 101 which translates to: hipErrorInvalidDevice. Please check your GPU device driver level and verify that a supported GPU device has been installed correctly.

add your user name (LOGNAME) to the list of users in the video group using the following command:

sudo usermod -a -G video $LOGNAME
Solver/hardware combination — When using GPU devices, some solvers may not achieve good performance on certain devices. For more information, see Performance Issue for Some Solver/Hardware Combinations.