2. Setting Up Custom Code References

2.1. Logging On to the Cluster Head Node

The scripts that you will be customizing should be located on the cluster head node. All remaining actions in this tutorial are performed on the cluster installation.


Note:  If this is not your configuration as stated in Before You Begin, then this scripting method could fail. This method ensures that all users use the same scripts. Applying different scripts to different groups is also possible, but it is not the preferred method and is not covered in this tutorial.


2.2. Making a Copy of Supported Cluster Files

First, determine which supported cluster type your custom cluster most resembles. Most custom clusters are actually wrappers around standard LSF, PBS, UGE, or MS HPC clusters that require some additional commands or modified command lines to run.

You will need to find the files for the supported cluster type that is most like the cluster you are customizing. In this example, our cluster is actually a UGE (SGE) cluster whose Submit and Cancel behavior we are customizing, so we will start from the SGE version of the scripts (denoted by "_SGE" in their names). If you have a truly custom cluster that is not related to any of the supported clusters, you can start from any of these types, using the code merely as a guide.

Next, you will need to create a keyword (short word or phrase) that represents your custom cluster type. This keyword will be appended to some filenames, so try to keep it simple. For this example we will use SHEF01.

  1. Navigate to [Ansys 2024 R2 INSTALL]/RSM/Config/xml. This directory contains the base scripts for all of the supported cluster types.

  2. Make a copy of hpc_commands_SGE.xml and call the copy hpc_commands_SHEF01.xml, replacing SHEF01 with your own keyword.
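The copy in step 2 can also be scripted. The sketch below uses only the Python standard library and demonstrates against a throwaway directory rather than a live installation; substitute your real [Ansys 2024 R2 INSTALL]/RSM/Config/xml path and your own keyword.

```python
import os
import shutil
import tempfile

def copy_commands_file(config_xml_dir, keyword):
    """Copy hpc_commands_SGE.xml to hpc_commands_<keyword>.xml in the same directory."""
    src = os.path.join(config_xml_dir, "hpc_commands_SGE.xml")
    dst = os.path.join(config_xml_dir, "hpc_commands_%s.xml" % keyword)
    shutil.copyfile(src, dst)
    return dst

# Demonstration against a temporary directory with a dummy source file:
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "hpc_commands_SGE.xml"), "w") as f:
    f.write("<jobCommands/>")
copied = copy_commands_file(demo_dir, "SHEF01")
```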

2.3. Customizing the Code to Include the Desired Changes

2.3.1. Modifying the Job Configuration File for the New Cluster Type

As part of the setup, you must add an entry for your custom cluster keyword in the jobConfiguration.xml file, and reference the HPC commands file that is needed for this cluster job type.

  1. Navigate to [Ansys 2024 R2 Install]/RSM/Config/xml.

  2. Open the jobConfiguration.xml file and add an entry for your custom cluster job type. The sample entry below is for the SHEF01 keyword that we established earlier, and points to the custom hpc_commands_SHEF01.xml file. Use your own keyword and HPC commands file name where appropriate.

    <keyword name="SHEF01">
      <hpcCommands name="hpc_commands_SHEF01.xml">
      </hpcCommands>
    </keyword>
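A quick way to catch typos before RSM reads the file is to confirm that the new entry is well-formed XML and points at the expected commands file. This sanity check is a sketch for your own use, not part of RSM:

```python
import xml.etree.ElementTree as ET

# The entry added to jobConfiguration.xml, as a standalone fragment.
entry = """
<keyword name="SHEF01">
  <hpcCommands name="hpc_commands_SHEF01.xml">
  </hpcCommands>
</keyword>
"""

root = ET.fromstring(entry)  # raises ParseError if the XML is malformed
assert root.tag == "keyword" and root.get("name") == "SHEF01"
assert root.find("hpcCommands").get("name") == "hpc_commands_SHEF01.xml"
```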
    

2.3.2. Modifying the Custom HPC Commands File to Reference Custom Scripts

As part of the setup, you must edit the cluster-specific HPC Commands file provided with the RSM installation.


Note:  Commands files for different cluster types are sometimes very different, so this may not look like yours if you have started from LSF or PBS scripts. However, the sections should be similarly named, even if the actual commands are different from those for UGE/SGE.


Below is an example of an unmodified HPC Commands file. It is followed by instructions on how to modify it.

<?xml version="1.0" encoding="utf-8"?>
<jobCommands version="3" name="Custom Cluster Commands">
  <environment>
    <env name="RSM_HPC_PARSE_MARKER">START</env>
  </environment>
  <submit>
    <precommands>
      <command name="memory">
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY%/sgeMemory.py</pythonapp>
        </application>
        <condition>
          <env name="RSM_HPC_MEMORY">ANY_VALUE</env>
        </condition>
      </command>
    </precommands>
    <primaryCommand name="submit">
      <application>
        <app>qsub</app>
      </application>
      <arguments>
        <arg>
          <value>-q %RSM_HPC_QUEUE%</value>
          <condition>
            <env name="RSM_HPC_QUEUE">ANY_VALUE</env>    <!-- if not set, -q in RSM_HPC_NATIVEOPTIONS -->
          </condition>
        </arg>
        <arg>
          <value>-pe %RSM_HPC_SGE_PE% %RSM_HPC_CORES%</value>
          <condition>
            <env name="RSM_HPC_SGE_PE">ANY_VALUE</env>   <!-- if not set, -pe in RSM_HPC_NATIVEOPTIONS -->
          </condition>
        </arg>
        <arg>
          <value>-l mem_free=%RSM_HPC_MEMORY%M</value>
          <condition>
            <env name="RSM_HPC_MEMORY">ANY_VALUE</env>
          </condition>
        </arg>
        <arg>
          <value>-l exclusive</value>
          <condition>
            <env name="RSM_HPC_NODE_EXCLUSIVE">TRUE</env>
          </condition>
        </arg>
        <arg>%RSM_HPC_NATIVEOPTIONS% -S /bin/sh -V -R y -N "%RSM_HPC_JOBNAME%" -o "%RSM_HPC_STAGING%/%RSM_HPC_STDOUTFILE%" -e "%RSM_HPC_STAGING%/%RSM_HPC_STDERRFILE%" "%RSM_HPC_STAGING%/%RSM_HPC_COMMAND%"</arg>
      </arguments>
    </primaryCommand>
    <postcommands>
      <command name="parseSubmit">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-submit</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_JOBID</variableName>
        </outputs>
      </command>
    </postcommands>
  </submit>
  <cancel>
    <primaryCommand name="cancel">
      <application>
        <app>qdel</app>
      </application>
      <arguments>
        <arg>%RSM_HPC_JOBID%</arg>
      </arguments>
    </primaryCommand>
  </cancel>
  <queryStatus>
    <primaryCommand name="queryStatus">
      <application>
        <app>qstat</app>
      </application>
      <arguments>
        <arg>-u %RSM_HPC_USER%</arg>
        <arg noSpaceOnAppend="true">
          <value>,%RSM_HPC_PROTOCOL_OPTION1%</value>
          <condition>
              <env name="RSM_HPC_PROTOCOL_OPTION1">ANY_VALUE</env>
          </condition>
        </arg>
      </arguments>
    </primaryCommand>
    <postcommands>
      <command name="parseStatus">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-status</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_STATUS</variableName>
        </outputs>
      </command>
    </postcommands>
  </queryStatus>
  <queryQueues>
    <primaryCommand name="queryQueues">
      <application>
        <app>qconf</app>
      </application>
      <arguments>
        <arg>-sql</arg>
      </arguments>
    </primaryCommand>
    <postcommands>
      <command name="checkQueueExists">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-queues</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_QUEUE_DEFINED</variableName>
        </outputs>
      </command>
    </postcommands>
  </queryQueues>
  <getAllQueues>
    <primaryCommand name="getAllQueues">
      <application>
        <app>qconf</app>
      </application>
      <arguments>
        <arg>-sql</arg>
      </arguments>
    </primaryCommand>
    <postcommands>
      <command name="parseQueueList">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-allqueues</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_GENERIC</variableName>
        </outputs>
      </command>
    </postcommands>
  </getAllQueues>
  <queryPe>
    <primaryCommand name="queryPe">
      <application>
        <app>qconf</app>
      </application>
      <arguments>
        <arg>-spl</arg>
      </arguments>
    </primaryCommand>
    <postcommands>
      <command name="checkPeExists">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-pe</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_PE_DEFINED</variableName>
        </outputs>
      </command>
    </postcommands>
  </queryPe>
  <!--<queryQacct>
    <primaryCommand name="queryQacct">
      <application>
        <app>qacct</app>
      </application>
      <arguments>
        <arg>-j %RSM_HPC_JOBID%</arg>
      </arguments>
    </primaryCommand>
    <postcommands>
      <command name="parseQacct">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-qacct</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_QACCT</variableName>
        </outputs>
      </command>
    </postcommands>
  </queryQacct>-->
  <getAllStatus>
    <primaryCommand name="getAllStatus">
      <application>
        <app>
          <value>qstat</value>
        </app>
      </application>
    </primaryCommand>
    <postcommands>
      <command name="parseStatus">
        <properties>
          <property name="MustRemainLocal">true</property>
        </properties>
        <application>
          <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
        </application>
        <arguments>
          <arg>-allstatus</arg>
          <arg>
            <value>%RSM_HPC_PARSE_MARKER%</value>
            <condition>
              <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
            </condition>
          </arg>
        </arguments>
        <outputs>
          <variableName>RSM_HPC_OUTPUT_GENERIC</variableName>
        </outputs>
      </command>
    </postcommands>
  </getAllStatus>
</jobCommands>

In this example we will be customizing the Submit and Cancel behavior of a UGE cluster.

In the HPC Commands file shown above, you need to do two things:

  1. Replace the entire Submit command, from <primaryCommand name="submit"> through </primaryCommand>, with the new (much shorter) code reference to %RSM_HPC_SCRIPTS_DIRECTORY%/CustomSubmissionCode.py, as shown below.

      <submit>
        <precommands>
          <command name="memory">
            <application>
              <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY%/sgeMemory.py</pythonapp>
            </application>
            <condition>
              <env name="RSM_HPC_MEMORY">ANY_VALUE</env>
            </condition>
          </command>
        </precommands>
        <primaryCommand name="submit">
          <application>
            <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY%/CustomSubmissionCode.py</pythonapp>
          </application>
        </primaryCommand>
        <postcommands>
          <command name="parseSubmit">
            <properties>
              <property name="MustRemainLocal">true</property>
            </properties>
            <application>
              <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY_LOCAL%/ugeParsing.py</pythonapp>
            </application>
            <arguments>
              <arg>-submit</arg>
              <arg>
                <value>%RSM_HPC_PARSE_MARKER%</value>
                <condition>
                  <env name="RSM_HPC_PARSE_MARKER">ANY_VALUE</env>
                </condition>
              </arg>
            </arguments>
            <outputs>
              <variableName>RSM_HPC_OUTPUT_JOBID</variableName>
            </outputs>
          </command>
        </postcommands>
      </submit>
  2. Replace the entire Cancel command, from <primaryCommand name="cancel"> through </primaryCommand>, with the new code reference to %RSM_HPC_SCRIPTS_DIRECTORY%/CustomCancelCode.py, as shown below.

      <cancel>
        <primaryCommand name="cancel">
          <application>
            <pythonapp>%RSM_HPC_SCRIPTS_DIRECTORY%/CustomCancelCode.py</pythonapp>
          </application>
          <arguments>
          </arguments>
        </primaryCommand>
      </cancel>

Note:
  • Replacing the code references here means that when RSM needs to submit or cancel a job, it will now use this new code to do so. Changes made to these scripts take effect in RSM immediately.

  • You can also use other types of code, such as compiled C++: simply place your compiled (executable) file in the <app> </app> section; arguments are not required. For Python, an interpreter is included in the Ansys Workbench installation, which is why it is referenced here. To use Python, simply replace <app> </app> with <pythonapp> </pythonapp> as shown and enter the Python code file name.

  • Any custom code that you want to provide as part of the customization should also be located in the [Ansys 2024 R2 INSTALL]/RSM/Config/scripts directory of the submit host or head node. Alternatively, you must enter a full path to the script along with the name.


2.4. Modifying Scripts to Add Extra Functionality

The scripts used for the Submit Example and Cancel Example below can be found in this directory on the cluster submit host: [Ansys 2024 R2 INSTALL]/RSM/Config/scripts/EXAMPLES. To follow along with this tutorial, copy the CustomSubmissionCode.py and CustomCancelCode.py scripts to [Ansys 2024 R2 INSTALL]/RSM/Config/scripts and customize them as you want. Or substitute your own scripts.

2.4.1. Submit Example

The UGE Submit Python code is shown below and has been commented for instruction. Comments are denoted by the # symbol.

"""
Copyright (C) 2015 ANSYS, Inc. and its subsidiaries.  All Rights Reserved.

$LastChangedDate:$
$LastChangedRevision:$
$LastChangedBy:$
"""

import sys
import ansLocale
import os
import tempfile
import os.path
import shutil
import glob
import shlex
import subprocess
import time
import platform
print('RSM_HPC_DEBUG=Debug Statements need to be turned on in the rsm job window')
# See Below #1
print('RSM_HPC_WARN=This is what a warning displays like')
print('RSM_HPC_ERROR=This is what an error message look like')
print('Standard output looks like this, you dont need the special RSM tags')
print('End custom coding')

# See Below #2
# Code below is for Clusterjobs submission to a standard SGE cluster.
# The string variable _ClusterjobsSubmit is incrementally appended
# to incorporate all the variables that "might" exist from RSM.
# The variable will store the entire job submit command.
_ClusterjobsSubmit = "qsub -S /bin/sh -V -R y"

# See Below #3
# Check whether the job name exists; if so, add it to the command line.
_jobname = os.getenv("RSM_HPC_JOBNAME")
if not _jobname == None:
    _ClusterjobsSubmit += " -N \\\"" + _jobname + "\\\""

# Check whether a queue was specified for the job.
# If so, add it to the command line.
_queue = os.getenv("RSM_HPC_QUEUE")
if not _queue == None:
    _ClusterjobsSubmit += " -q " + _queue

# Define the parallel environment names.
_SharedMemoryEnvironmentName = 'pe_smp'
_DistributedMemoryEnvironmentName = 'pe_mpi'

# Number of cores should always be defined from RSM code, but check anyway.  
# Check if job is distributed and choose environment type accordingly.
_numcores = os.getenv("RSM_HPC_CORES")
_distributed = os.getenv("RSM_HPC_DISTRIBUTED")
if not _numcores == None:
    if _distributed == None or _distributed == "FALSE":
        _ClusterjobsSubmit += " -pe " + _SharedMemoryEnvironmentName  + " " + _numcores
    else:
        _ClusterjobsSubmit += " -pe " + _DistributedMemoryEnvironmentName + " " + _numcores
_nativeOptions = os.getenv("RSM_HPC_NATIVEOPTIONS")
if not _nativeOptions == None:
    _ClusterjobsSubmit += " " + _nativeOptions

# Check if the Staging directory exists. If not, then error log out but don't exit.  
# If so, add it as the qsub working directory.
_staging = os.getenv("RSM_HPC_STAGING")
if _staging == None:
    print("RSM_HPC_ERROR=RSM_HPC_STAGING is not defined, please define and Restart RSM Services")
else:
    _ClusterjobsSubmit += " -wd " + _staging

# Check whether the stdout and stderr files are defined.
# If so, add them to the command line as well.
_stdoutfile = os.getenv("RSM_HPC_STDOUTFILE")
if not _stdoutfile == None:
    _ClusterjobsSubmit += " -o " +  _stdoutfile
_stderrfile = os.getenv("RSM_HPC_STDERRFILE")
if not _stderrfile == None:
    _ClusterjobsSubmit += " -e " + _stderrfile

# Debugging to see exact commands before variable expansion.
print('RSM_HPC_DEBUG=Cluster Jobs Submit Command before expansion: ' + _ClusterjobsSubmit);
_ClusterjobsSubmit = os.path.expandvars(_ClusterjobsSubmit);

# See Below #4
# Don't want to expand RSM_HPC_COMMAND since $AWP_ROOTxxx needs to be expanded later, on the cluster.  
_qsubCommand = os.getenv("RSM_HPC_COMMAND")
if not _qsubCommand == None:
    _ClusterjobsSubmit += " " + _qsubCommand

# Split the string into a list of strings that Subprocess.Popen can read.
_argList = shlex.split(_ClusterjobsSubmit)

# Debugging to see exact commands to run.
print('RSM_HPC_DEBUG=_ClusterjobsSubmit final split arguments: ' + str(_argList))

# Printing START tells RSM that all output above is irrelevant,
# i.e. do not try to find SGE Submit output above this point. If this is not printed,
# RSM assumes START is at the top of the output and tries to interpret everything.
print('START')

# See Below #5
# Run the command we created.
process = subprocess.Popen(_argList, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, cwd=os.getcwd())
# Wait for the command to finish.
try:
    while process.poll() == None:
        time.sleep(1)
except:
        pass
print("RSM_HPC_DEBUG=qsub command finished")

# See Below #6
# Just dump the standard output with print. RSM should be able to interpret SGE output
# exactly, as long as the correct parser is selected in <parseSubmit> in the HPC commands file.
for line in process.stdout:
    print(line)
# Job is finished with no errors; exit 0 means everything is fine.
sys.exit(0)

Note:  This code references many RSM-set environment variables. For more information on what environment variables are available and their contents, see Environment Variables Set by RSM in the Remote Solve Manager User's Guide.


  1. You can add any code you want to this section; code placed here will execute before the job is submitted. You can also stop the job from submitting with some controls on the Submit command, if desired.

  2. Basic SGE command line starting point. We will continuously append arguments to this line as necessary to complete the command.

  3. Most blocks are composed of three parts: storing an environment variable to a local variable, testing to ensure that a variable either isn’t empty or contains a special value, and then appending some flag to the command line based on the findings.

  4. One of the final actions is to read the RSM_HPC_COMMAND variable and append it to the submission command. This command is created by RSM and contains the command line to run the ClusterJobs script that can complete the submission process. It creates the full command line for Ansys by using the controls file created by the individual add-ins. Ansys suggests that you always use the RSM_HPC_COMMAND to submit a job whenever possible because of the complexities of the Ansys command line for different solvers and on different platforms.

  5. Popen finally "runs" the command we have been building. Then we wait for it to finish.

  6. Finally, print any output that came from it so RSM can interpret it and obtain the job ID.
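The three-part pattern described in note 3 can be sketched in isolation. In this sketch, RSM_HPC_QUEUE is set by hand only so the code runs outside RSM; in a real job RSM sets the variable before the script starts:

```python
import os

os.environ["RSM_HPC_QUEUE"] = "all.q"   # normally set by RSM, not by you

cmd = "qsub -S /bin/sh -V -R y"
_queue = os.getenv("RSM_HPC_QUEUE")     # 1. store the environment variable
if _queue is not None:                  # 2. test that it is set
    cmd += " -q " + _queue              # 3. append the flag to the command line
```

Each block in the Submit script repeats this same store / test / append shape with a different variable and flag.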

Since this script is a Submit script, there are many options for the qsub command. The custom script for the Cancel command is much simpler, although it contains the same basic parts. It is addressed in the next section.
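The START marker convention from the script's comments can also be illustrated on its own. The parsing shown here is a sketch of what a parser such as ugeParsing.py presumably does (an assumption based on the comments, not the actual Ansys parser): everything before the marker is ignored, and only the scheduler output after it is interpreted.

```python
# Simulated stdout from the Submit script: debug lines, the START marker,
# then the scheduler's real submission response.
output = """RSM_HPC_DEBUG=Cluster Jobs Submit Command before expansion: qsub ...
START
Your job 12345 ("MyJob") has been submitted"""

lines = output.splitlines()
start = lines.index("START")            # everything above this line is ignored
scheduler_output = lines[start + 1:]    # only these lines are parsed for the job ID
```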

2.4.2. Cancel Example

The UGE Cancel Python code is shown below and has been commented for instruction. Comments are denoted by the # symbol.

"""
Copyright (C) 2015 ANSYS, Inc. and its subsidiaries.  All Rights Reserved.

$LastChangedDate:$
$LastChangedRevision:$
$LastChangedBy:$
"""

import sys
import ansLocale
import os
import tempfile
import os.path
import shutil
import glob
import shlex
import subprocess
import time
import platform
print('RSM_HPC_DEBUG=Custom Cancel command running')
print('Begin Custom Coding')
# See Below #1
print('RSM_HPC_WARN=Warning test')
print('RSM_HPC_ERROR=Error Test')
print('End custom coding')

# See Below #2
# Code below is for cancelling a job on a standard SGE cluster.  The variable _SGEjobsCancel is
# incrementally appended to incorporate any needed variables from RSM.
# These can be modified in any way necessary.
_SGEjobsCancel = "qdel"

# See Below #3
# Check whether the job ID exists.  If not, log an error and exit with a failure code.
# If so, append it (with a space) to the qdel command as its argument.
_jobid = os.getenv("RSM_HPC_JOBID")
if _jobid == None or _jobid == ' ':
    print("RSM_HPC_ERROR=RSM_HPC_JOBID is not defined, There has been an error in the job submission")   
    sys.exit(1)
else:
    _SGEjobsCancel += " " + _jobid

# Split the string into a list of strings that Subprocess.Popen can read.
_argList = shlex.split(_SGEjobsCancel)

# Debugging to see exact commands to run.
print('RSM_HPC_DEBUG=_SGEjobsCancel final split arguments: ' + str(_argList))

# Printing START tells RSM that all output above is irrelevant,
# i.e. do not try to find SGE Cancel output above this point. If this is not printed,
# RSM assumes START is at the top of the output and tries to interpret everything.
print('START')

# See Below #4
# Run the command we created.
process = subprocess.Popen(_argList, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, 
cwd=os.getcwd())         
# Wait for the command to finish.
try:
    while process.poll() == None:
        time.sleep(1)
except:
    pass
print("RSM_HPC_DEBUG=cancel command finished, printing output")

# See Below #5
# Just dump the standard output with print.
for line in process.stdout:
    print(line)
# Script is finished with no errors; exit 0 means everything is fine.
sys.exit(0)

Note:  This code references many RSM-set environment variables. For more information on what environment variables are available and their contents, see Custom Integration Environment Variables in the Remote Solve Manager User's Guide.


  1. You can add any code you want to this section; code placed here will execute before the job is cancelled. Some code could also run at the end of the script, just before sys.exit(0), if extra precautions are to be taken after the job has been cancelled through the scheduler.

  2. Basic SGE command line starting point: qdel is what you would type at the command line in order to cancel a job in SGE. We will continuously append arguments to this line as necessary to complete the command.

  3. Most blocks are composed of three parts: storing an environment variable to a local variable, testing to ensure that a variable isn’t empty, and then appending some flag to the command line (or stopping the command if an error is found) based on the findings. This environment variable is set by RSM. A list of these useful variables can be found in Custom Integration Environment Variables in the Remote Solve Manager User's Guide.

  4. Popen finally "runs" the command we have been building. Then we wait for it to finish.

  5. Finally, print any output that came from it so RSM can interpret it if needed.
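The build / split / run / read pattern shared by both scripts can be exercised against a harmless command. In this sketch, sys.executable stands in for qdel so the code runs anywhere without a scheduler, and communicate() replaces the polling loop for brevity; the quoting assumes a POSIX-style interpreter path, since shlex.split in its default mode treats backslashes as escapes.

```python
import shlex
import subprocess
import sys

# Build the command string incrementally, as the Cancel script does with qdel.
command = '"%s" -c "print(\'deleted job 12345\')"' % sys.executable

# Split the string into a list of strings that subprocess.Popen can read.
arg_list = shlex.split(command)

# Run the command and collect its output.
process = subprocess.Popen(arg_list, stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT, universal_newlines=True)
out, _ = process.communicate()          # waits for the command to finish
```

Swapping communicate() for the poll/sleep loop in the shipped scripts does not change the result; both simply wait for the child process to exit before reading its output.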