B.2.1. Supported MPI Versions

Ansys Forte uses one or more specific Intel MPI versions on Windows and Linux systems. The supported versions are listed on the Platform Support page of the Ansys website. Note that some Intel MPI versions are compatible only with certain versions of the UCX library and InfiniBand. See the following Intel web pages for more details:

https://www.intel.com/content/www/us/en/developer/articles/technical/improve-performance-and-stability-with-intel-mpi-library-on-infiniband.html

and

https://www.intel.com/content/www/us/en/developer/articles/technical/mpi-compatibility-nvidia-mellanox-ofed-infiniband.html

Your IT department may need to help you determine whether a specific Intel MPI version is compatible with your cluster. The subsequent sections of this appendix may help you and your IT support diagnose and fix issues that arise when running Forte on your cluster. If you are still having issues after following the advice in these sections, please contact Ansys Support and send us the following information:

  • The OS version of your cluster

  • Details of the errors you are experiencing

  • Details of the workflow you are using to submit the job to the cluster, including all job submission scripts and job output and log files, as well as the job scheduler software and version

  • A copy of the run_env.sh and run_mpi.sh files for the Forte case you are attempting to run

  • The Forte MONITOR file
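
The OS version requested in the list above can be captured on a cluster node with standard tools. The following is a minimal sketch, assuming a Linux distribution that provides /etc/os-release; the output file name os_version.txt is a hypothetical choice for this example and is not required by Ansys Support.

```shell
#!/bin/sh
# Record the OS release and kernel version of the node this runs on.
# os_version.txt is a hypothetical file name chosen for this sketch.
{
    if [ -r /etc/os-release ]; then
        cat /etc/os-release      # distribution name and version, if present
    fi
    uname -a                     # kernel name, release, and architecture
} > os_version.txt 2>&1
```

Running this on a compute node rather than the login node is usually more informative, since the two may not run the same OS image.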

Please also add the following lines to the run_mpi.sh script before the mpirun command:

lspci | grep -i mellanox > mellanox.txt
ofed_info > ofed_info.txt
ibstat > ibstat.txt
ucx_info -v > ucx_v.txt
ucx_info -d > ucx_d.txt
which mpirun > mpirun.txt
printenv > env.sh
ulimit -a > ulimit.txt
mpirun hostname > hostnames.txt

Then resubmit the run that calls run_mpi.sh and send us the resulting output files.
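
Some of these tools may not be installed on every cluster, in which case the corresponding output file is left empty and the error is lost. As a possible variation, the sketch below wraps each command in a small helper that also captures error output and records when a tool is missing. The helper name collect is hypothetical and is not part of Forte or Intel MPI; only external commands already listed above are used.

```shell
#!/bin/sh
# collect NAME COMMAND [ARGS...]
# Runs COMMAND and writes its stdout and stderr to NAME.txt. If COMMAND is
# not installed, a note is written instead so the job script keeps going.
# "collect" is a hypothetical helper name used only in this sketch.
collect() {
    name=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        # Append the exit status when the command itself reports a failure.
        "$@" > "${name}.txt" 2>&1 || echo "$1 exited with status $?" >> "${name}.txt"
    else
        echo "$1: command not found on $(uname -n)" > "${name}.txt"
    fi
    return 0
}

collect ofed_info ofed_info
collect ibstat ibstat
collect ucx_v ucx_info -v
collect ucx_d ucx_info -d
```

The output file names match those produced by the plain redirections, so either form yields the same set of files to send.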

Intel MPI may sometimes load the wrong fabric on your cluster. To check this, add:

export I_MPI_DEBUG="1000"

to your run_env.sh, resubmit the job, and send us the MONITOR file so we can confirm which fabric is being loaded.
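
If you want to inspect the fabric yourself before sending the file, the relevant debug lines can be filtered out of the MONITOR file. The sketch below assumes the startup messages of Intel MPI 2019 and later, which report the fabric in lines containing "libfabric provider"; older Intel MPI versions word this differently, so an empty result does not by itself indicate a problem. The helper name show_fabric is hypothetical.

```shell
#!/bin/sh
# show_fabric [FILE]
# Prints the lines of FILE (default: MONITOR) that mention libfabric.
# "show_fabric" is a hypothetical helper; the search string assumes the
# "libfabric provider: ..." startup lines of Intel MPI 2019 and later.
show_fabric() {
    grep -i "libfabric" "${1:-MONITOR}"
}
```

For example, running show_fabric MONITOR in the case directory lists the matching lines, if any.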

Your IT support may also run Intel Cluster Checker:

https://www.intel.com/content/www/us/en/docs/cluster-checker/user-guide/2021-7-2/getting-started.html

and follow up directly with Intel for advice if the checker reports potential issues on your cluster.