1.7.4. Troubleshooting

This section describes problems you may encounter while using the GPU accelerator capability, as well as methods for overcoming these problems. Some of these problems are specific to a particular system, as noted.

To list the GPU devices installed on the machine, set the ANSGPU_PRINTDEVICES environment variable to 1. The displayed list may include graphics cards used for display purposes in addition to any graphics cards used to accelerate your simulation.
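As a minimal sketch of setting this variable for a single run, the Python snippet below builds a copy of the current environment with ANSGPU_PRINTDEVICES set to 1. The helper name `env_with_device_listing` and the launch command mentioned in the comment are hypothetical; only the variable name and its value come from this section.

```python
import os


def env_with_device_listing(base_env=None):
    """Return a copy of the environment with GPU device listing enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # ANSGPU_PRINTDEVICES=1 asks the program to list the installed GPU devices.
    env["ANSGPU_PRINTDEVICES"] = "1"
    return env


if __name__ == "__main__":
    env = env_with_device_listing()
    # Hypothetical launch step (the actual command depends on your installation):
    # subprocess.run([...], env=env, check=True)
    print(env["ANSGPU_PRINTDEVICES"])
```

Passing a copied environment to the launched process, rather than changing the variable globally, keeps the setting limited to that one run.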

No Supported Devices

Be sure that a supported GPU device is properly installed and configured. Check that the installed driver version is the same as or newer than the driver version supported for your particular device.

Note: On Windows, the use of Remote Desktop may disable the use of a GPU device. Launching Ansys Polyflow through the ANSYS Remote Solve Manager (RSM) when RSM is installed as a service may also disable the use of a GPU. In these two scenarios, the GPU accelerator capability cannot be used. Using the TCC (Tesla Compute Cluster) driver mode, if applicable, can circumvent this restriction.

No Valid Devices

A GPU device was detected, but it is not a supported GPU device. Be sure that a supported GPU device is properly installed and configured. Check that the installed driver version is the same as or newer than the driver version supported for your particular device.

Poor Acceleration or No Acceleration

Simulation includes non-supported features

A GPU device accelerates only certain portions of a simulation, mainly the solution phase. If the bulk of the simulation time is spent outside of the solution (for example, computing postprocessors), the GPU cannot have a significant impact on the overall analysis time. Even if the bulk of the simulation time is spent on the solution, you must be sure that a supported equation solver is used during the solution and that no unsupported options are used. Messages are printed in the output to alert you when a GPU is being used, as well as when unsupported options or features are chosen that deactivate the GPU accelerator capability (see Messages).

Simulation has too few DOF (degrees of freedom)

Some analyses (such as transient analyses) may require long compute times, not because the number of degrees of freedom is large, but because a large number of calculations are performed (that is, a very large number of time steps). Generally, if the number of degrees of freedom is relatively small, GPU acceleration will not significantly decrease the solution time. Consequently, for small models with many time steps, GPU acceleration may be poor because the model size is too small to fully utilize a GPU.

If there are too few degrees of freedom (for example, an inflow problem, a shell blow molding simulation, or a thermoforming simulation), the number of active matrices treated by the GPU is zero:

   Active matrices on CPU       :        1797    
   Active matrices on GPU[ 0]   :           0
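A quick way to check for this condition is to scan the solver output for the two lines shown above. The following Python sketch assumes the output has been captured as text and that the lines keep the format of this excerpt; the function names are illustrative and are not part of the product.

```python
import re

# Matches lines such as "Active matrices on CPU : 1797" and
# "Active matrices on GPU[ 0] : 0", following the excerpt above.
_PATTERN = re.compile(
    r"Active matrices on (CPU|GPU)(?:\[\s*(\d+)\])?\s*:\s*(\d+)"
)


def active_matrix_counts(output_text):
    """Return a dict such as {"CPU": 1797, "GPU[0]": 0} parsed from the text.

    If a device is reported more than once, the last value wins.
    """
    counts = {}
    for kind, index, value in _PATTERN.findall(output_text):
        key = kind if index == "" else f"{kind}[{index}]"
        counts[key] = int(value)
    return counts


def gpu_is_idle(counts):
    """True when at least one GPU is reported and every GPU treats zero matrices."""
    gpu_values = [v for k, v in counts.items() if k.startswith("GPU")]
    return bool(gpu_values) and all(v == 0 for v in gpu_values)
```

Applied to the excerpt above, `active_matrix_counts` yields 1797 for the CPU and 0 for `GPU[0]`, so `gpu_is_idle` reports that the GPU is not contributing.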

Simulation does not fully utilize the GPU

Significant speed-ups can be expected only for simulations that spend a large share of their time in calculations that are supported on a GPU, and only certain computations are supported for GPU acceleration. Therefore, check that a high percentage of the solution time was spent performing computations that can be accelerated on a GPU. The problem must also be large enough in terms of both the number of degrees of freedom and the average frontal width. See the example related to the solver for details.

Even when the problem is large enough, an active matrix that is too large for the memory available on the GPU device is treated on the CPU instead, which reduces the efficiency of the GPU usage. In this case, the following message is printed for the iteration where this occurs:

    Some allocations of memory for the GPU failed, so the calculation will continue using a CPU.

This allocation failure can occur either because the matrix is too large for the total memory of the GPU, or because the memory available on the GPU is reduced by another user who is already using the device.
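To see how often the solver fell back to the CPU, you can count occurrences of this message in the captured output. The Python sketch below assumes the message appears on a single line with exactly the wording shown above; the function name is illustrative.

```python
# Message text as shown in this section; assumed to appear on one line.
GPU_ALLOC_FAILURE = (
    "Some allocations of memory for the GPU failed, "
    "so the calculation will continue using a CPU."
)


def count_gpu_allocation_failures(output_text):
    """Count the output lines that contain the allocation-failure message."""
    return sum(
        1 for line in output_text.splitlines() if GPU_ALLOC_FAILURE in line
    )
```

Because the message is printed for each iteration where the allocation fails, the count indicates how many iterations were affected.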

Oversubscribing GPU hardware

The program automatically determines which GPU device to use. In a multi-user environment, this could mean that the same GPU is picked when multiple simulations are run simultaneously, thus oversubscribing the hardware.

If only a single GPU accelerator device exists in the machine, then only a single user should attempt to make use of it, much in the same way you should avoid oversubscribing your CPU cores.

If multiple GPU accelerator devices exist in the machine, you can set the ANSGPU_DEVICE environment variable, in conjunction with the ANSGPU_PRINTDEVICES environment variable mentioned previously, to specify which particular GPU accelerator device to use during the solution.

For example, consider a scenario where ANSGPU_PRINTDEVICES shows that four GPU devices are available with device ID values of 1, 3, 5, and 7, respectively, and only the second and third devices (IDs 3 and 5) are supported for GPU acceleration. To select the second supported GPU device, set ANSGPU_DEVICE to 5.
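The selection in this scenario can be sketched in Python as follows. The helper `nth_supported_device` is hypothetical; it assumes you have already read the device IDs from the ANSGPU_PRINTDEVICES listing and know which of them are supported.

```python
import os


def nth_supported_device(device_ids, supported_ids, n):
    """Return the ID of the n-th (1-based) supported device, in listing order."""
    supported = [d for d in device_ids if d in supported_ids]
    if not 1 <= n <= len(supported):
        raise ValueError(f"only {len(supported)} supported device(s) listed")
    return supported[n - 1]


if __name__ == "__main__":
    # Scenario from the text: IDs 1, 3, 5, 7 listed; only 3 and 5 supported.
    device_id = nth_supported_device([1, 3, 5, 7], {3, 5}, n=2)
    # Environment variable values are strings; a process launched from this
    # one inherits the setting.
    os.environ["ANSGPU_DEVICE"] = str(device_id)
    print(os.environ["ANSGPU_DEVICE"])  # prints 5
```

Note that the value passed to ANSGPU_DEVICE is the device ID itself (5), not the position of the device in the list.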