Using the Ansys EM HPC Diagnostics Tool

The Ansys EM HPC diagnostics tool simplifies HPC troubleshooting by automating the diagnosis of routine issues. The tool runs on the cluster as a scheduler-managed job. Using its HTML-based diagnostics report, the cluster administrator or Ansys support staff can either resolve the issue or guide the user through further troubleshooting steps. In some cases, Ansys support staff may ask you to rerun the diagnostics with additional diagnostic tests. Users may also extend the diagnostic scripts to suit their HPC environment.

The following sections describe how to use the diagnostics tool:

Supported Schedulers

The tool supports diagnosis of issues on Linux and Windows clusters managed by the following schedulers:

For the above schedulers, the tool includes standard diagnostic scripts. If passwordless ssh has been enabled, it also supports generic Linux clusters using ssh. Note that the diagnostics tool currently does not support PBSPro or LSF on Windows.

Running the Diagnostics Job

The diagnostics run as a scheduler-managed job. Once the job finishes, locate the resulting HTML file and provide it to the cluster administrator or to Ansys support staff. If there are any job or test failures, also provide the networking*.json files from the Hosts subdirectory.

Basic Diagnostic Job

To run the basic diagnostics, submit a diagnostic job to the scheduler using a provided job submission script. Each standard diagnostic job is a 12-core job with 4 cores per host, so it spans 3 hosts. On Linux, running the script submits a scheduler job that runs the diagnostic tool on the cluster. On Windows, you submit the job using a job file.

Basic scripts for each supported scheduler are available in the diagnostics subdirectory of the schedulers directory.

Linux:

.../Linux64/schedulers/diagnostics

Windows:

.../Win64/schedulers/diagnostics

Using Diagnostics Scripts on Linux Clusters

The following standard scripts are provided in the diagnostics directory:

.../Linux64/schedulers/diagnostics

These job submission scripts are scheduler-specific.



Using Windows HPC Job File

A sample job file winhpctest.xml is available in the diagnostics directory:

.../Win64/schedulers/diagnostics

To submit this diagnostic job, you must change the job description to suit your environment as follows:

  1. Select a directory for saving the diagnostic results. This directory must be accessible at the same path from all hosts of the cluster.
  2. Locate the Ansys EM installation directory. This directory must also be accessible at the same path from all hosts of the cluster.
  3. Locate winhpctest.xml in the diagnostics subdirectory of the schedulers directory in the Ansys EM installation.
  4. Start Windows HPC Job Manager and choose the “New job from XML File…” action.
  5. Select the winhpctest.xml job file.
  6. Set both of the following environment variables to the directories located in the first two steps:

ANSYSEM_DIAG_PROD_DIR

ANSYSEM_DIAG_RESULTS_DIR

  7. Submit the job.
Note: After making the above changes, you can also save the resulting XML file using Submit Job XML File…. You can then submit the job from the command line using:

job submit /jobfile:XMLfileName
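The command-line submission can also be scripted. The following is a minimal sketch, not part of the diagnostics tool: it assumes a Windows HPC client where the job command is on the PATH, and the function names and example paths are hypothetical. It builds the job submit argument list and checks that the two directories from steps 1 and 2 exist on the local host. It cannot confirm that they are visible at the same path from every host of the cluster.

```python
import os


def build_submit_command(job_xml):
    """Return the argument list for: job submit /jobfile:<job_xml>."""
    return ["job", "submit", f"/jobfile:{job_xml}"]


def missing_directories(prod_dir, results_dir):
    """Return the names of the variables whose directory is absent on this host.

    This is a local check only; each host of the cluster still has to see
    both directories at the same path.
    """
    candidates = {
        "ANSYSEM_DIAG_PROD_DIR": prod_dir,
        "ANSYSEM_DIAG_RESULTS_DIR": results_dir,
    }
    return [name for name, path in candidates.items() if not os.path.isdir(path)]


# Example usage (hypothetical paths; replace with your shared locations):
#   import subprocess
#   if not missing_directories(r"\\share\AnsysEM\Win64", r"\\share\diag"):
#       subprocess.run(build_submit_command(r"\\share\diag\winhpctest.xml"), check=True)
```

Passing the command as a list avoids quoting problems when the job file path contains spaces.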

Diagnostic Report

The diagnostic report is an HTML file that, along with other related diagnostic results, is placed in the following directory:

Linux:

${HOME}/Ansoft/HPCDiag/Results/JOBID

Windows:

%ANSYSEM_DIAG_DIR%\Results\JOBID

Report file:

.../HTML/report.html

where JOBID is the job ID assigned by the scheduler. On Windows, the user must specify the ANSYSEM_DIAG_DIR directory.
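As an illustration of the layout above, the following sketch builds the path to the report for a given job and collects the networking*.json files that should accompany it when tests fail. The helper names are hypothetical. The sketch assumes that the HTML and Hosts subdirectories sit directly under the JOBID directory, which the paths above suggest but do not state explicitly.

```python
from pathlib import Path


def report_path(diag_dir, job_id):
    """Return <diag_dir>/Results/<JOBID>/HTML/report.html."""
    return Path(diag_dir) / "Results" / str(job_id) / "HTML" / "report.html"


def networking_files(diag_dir, job_id):
    """Return the networking*.json files found for a job, sorted by name.

    Assumption: the Hosts subdirectory is located under Results/<JOBID>.
    """
    hosts_dir = Path(diag_dir) / "Results" / str(job_id) / "Hosts"
    return sorted(hosts_dir.glob("networking*.json"))


def linux_default_diag_dir():
    """Return the Linux default location, ${HOME}/Ansoft/HPCDiag."""
    return Path.home() / "Ansoft" / "HPCDiag"
```

On Linux, report_path(linux_default_diag_dir(), job_id) gives the file to send to the cluster administrator or to Ansys support staff.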

Site-Specific Diagnostics Job

To run a diagnostic job with job submission parameters of your choice, create your own job submission script. For example, you may want to specify a different LSF queue or select a different SGE parallel environment. Start from one of the standard diagnostic scripts and follow these steps:

  1. Locate the relevant standard diagnostic script in the diagnostics subdirectory of the schedulers directory in the Ansys EM installation.
  2. Copy the diagnostics script into a directory that is accessible from a submit host for the cluster.
  3. Edit the script file to set the ANSYSEM_DIAG_PROD_DIR environment variable to the installation directory (see below).
  4. Modify the job submission parameters as needed.
  5. Optionally, copy any site-specific diagnostic tests provided by Ansys support staff into the ../Custom directory of the ANSYSEM_DIAG_RESULTS_DIR directory.
  6. Run the diagnostics script from a submit host for the cluster.
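Step 3 can be automated once the script has been copied. The sketch below is only an illustration: it assumes the script sets the variable with a shell-style line such as export ANSYSEM_DIAG_PROD_DIR=..., which may not match the actual format of the standard scripts, and the function name and example path are hypothetical. Check the copied script before relying on it.

```python
import re

# Assumption: the variable is assigned on its own line, optionally with "export".
_ASSIGNMENT = re.compile(
    r"^([ \t]*(?:export[ \t]+)?ANSYSEM_DIAG_PROD_DIR=).*$", re.MULTILINE
)


def set_prod_dir(script_text, install_dir):
    """Return (new_text, count) with each matching assignment set to install_dir.

    A count of 0 means no line matched the assumed format, so the script
    has to be edited by hand.
    """
    # A function replacement keeps backslashes in install_dir literal.
    return _ASSIGNMENT.subn(lambda m: f'{m.group(1)}"{install_dir}"', script_text)


# Example usage (hypothetical installation path):
#   text = open("my_diag_script.sh").read()
#   new_text, count = set_prod_dir(text, "/opt/AnsysEM/Linux64")
```

Returning the replacement count makes it easy to detect a script whose format differs from the assumed one instead of silently leaving it unchanged.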

Environment variables

The following environment variables apply to both Linux and Windows environments.

ANSYSEM_DIAG_PROD_DIR



ANSYSEM_DIAG_RESULTS_DIR



ANSYSEM_DIAG_CUSTOM_DIR

How the Diagnostic Tool Works

The diagnostics run as a scheduler-managed job. Running the diagnostic script submits a scheduler job that runs the diagnostic tool on the hosts allocated to the job. Once the diagnostic job starts, the tool executes a set of diagnostic tests. These tests run on each host allocated to the job, collecting diagnostic information relevant to running HPC jobs. The tool combines this information to produce an HTML report. The tool saves the HTML diagnostic report and other results on a shared drive, which must be available at the same path from all hosts of the cluster. On Linux, the default is the Ansoft/HPCDiag subdirectory under the user's home directory. On Windows, the user must specify this location using the ANSYSEM_DIAG_RESULTS_DIR environment variable.
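The rule for where results are saved can be summarized in a few lines of code. This sketch is not the tool's implementation, and the function name is hypothetical. It also assumes that ANSYSEM_DIAG_RESULTS_DIR, when set, takes precedence over the Linux default, which the text above implies but does not state.

```python
import os
from pathlib import Path


def resolve_save_location(env=None, is_windows=None):
    """Return the shared location where the report and results are saved."""
    env = os.environ if env is None else env
    is_windows = (os.name == "nt") if is_windows is None else is_windows

    explicit = env.get("ANSYSEM_DIAG_RESULTS_DIR")
    if explicit:
        # Assumption: an explicit setting overrides the Linux default.
        return Path(explicit)
    if is_windows:
        # Windows has no default; the user must specify the location.
        raise ValueError("ANSYSEM_DIAG_RESULTS_DIR must be set on Windows")
    home = env.get("HOME") or str(Path.home())
    return Path(home) / "Ansoft" / "HPCDiag"
```

Whatever location is returned must still be reachable at the same path from all hosts of the cluster; the function does not check that.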

Related Topics 

High-Performance Computing (HPC) Integration