4.9. Example: Setting Up a Multi-Node Ansys RSM Cluster (ARC)

The example provided in this section contains detailed, step-by-step instructions for setting up a multi-node Ansys RSM Cluster (ARC), and creating a configuration in RSM that enables users to submit jobs to this cluster.


Note:  Multi-node ARC configuration requires system administrator or root permission and should only be performed by an IT administrator.


Scenario

Cluster Nodes

There are four machines available for scheduling and running jobs. Their names and roles are described below.

  • ARCMASTER: This is the machine to which jobs will be submitted from users' client machines for scheduling. In other words, it is the cluster submit host, or master node.

    At a minimum this machine has Workbench and RSM installed, as well as the RSM launcher service (see Installing and Configuring the RSM Launcher Service for Windows).

    We will install the Master service on this machine.

  • EXECHOST1, EXECHOST2 and EXECHOST3: These are high-capacity machines on which jobs will run. In other words, they are execution nodes. They have Workbench, RSM, and Ansys solvers installed.

    On EXECHOST1 and EXECHOST2 we will restrict the number of cores that can be used by ARC jobs on those machines, and place no restrictions on EXECHOST3 so that it can handle larger jobs.

    We will install the Node service on each of these machines, and associate them with the ARCMASTER machine to essentially create a cluster.

Note that if we wanted to use ARCMASTER to run jobs as well, we would simply need to install the Node service on that machine. For this example, we will assume that only EXECHOST1, EXECHOST2 and EXECHOST3 will be used for running jobs.


Note:  All nodes in an Ansys RSM Cluster must run on the same platform (Windows or Linux). Instructions for both platforms are provided in this example.


Cluster Queues

The ARC will already have a local cluster queue for submitting jobs to the local machine, and a default cluster queue that can submit jobs to any of the execution nodes.

We are going to create a custom cluster queue named high_mem that will be dedicated to running jobs on EXECHOST3 only, which is the execution node with unrestricted resource allocation. We will also set the maximum number of jobs that can be run on this queue to 100.

RSM Configuration

Once we have set up the ARC cluster, we will use the RSM Configuration application to create a configuration named ARC. We will import the ARC cluster queues (default, local and high_mem) into the configuration and create RSM queues that map to these cluster queues.

Finally, we will make the ARC configuration available to users so that the RSM queues defined in the configuration appear in client applications on their machines, enabling them to submit jobs to the RSM queues (which map to the ARC cluster queues). Jobs will be sent to ARCMASTER, where the Master service will dispatch jobs to the execution nodes.
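The queue-to-node dispatch described above can be sketched as a simple lookup. This is purely illustrative (it is not an ARC command); the queue and node names are the ones defined in this example.

```shell
# Illustrative only: the dispatch rules for the queues in this example.
# Not an ARC command; queue and node names come from the scenario above.
nodes_for_queue() {
  case "$1" in
    default)  echo "EXECHOST1 EXECHOST2 EXECHOST3" ;;  # any execution node
    high_mem) echo "EXECHOST3" ;;                      # unrestricted node only
    local)    echo "submit host" ;;                    # the local machine
    *)        echo "unknown queue" ;;
  esac
}

nodes_for_queue high_mem   # prints: EXECHOST3
```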

Below is an overview of the ARC setup that we will be creating:

Step 1: Configure Cluster Nodes and Queues

We will use the ARC Configuration application to connect our four machines together, install ARC services on cluster nodes, create a custom cluster queue, and cache credentials.

  1. Sign in to ARCMASTER (the master node) as an Administrator.

  2. On the master node, launch the ARC Configuration application as follows:

    • On Windows, select Start > Ansys 2024 R2 > ARC Configuration 2024 R2. You can also launch the application manually by double-clicking the following executable:

      [RSMInstall]\ARC\bin\arcConfigConsole.exe

    • On Linux, run the following script:

      [RSMInstall]/ARC/Config/tools/linux/arcconfigui

  3. On the Local Service Management page, set the Cluster Usage and Service Management options.

    Remember that we are not using the master node for job execution (but could if we wanted to).

  4. Click Start to start the Master Node Service on the current machine.

  5. Once the Master Node Service is started, right-click Execution Nodes in the tree and select Add an execution node.

  6. In the ARC Configuration Manager dialog box, type exechost1, then click Add.

    A connection is made to the EXECHOST1 machine, and it is added to the node list.

  7. In the Execution Nodes list, select exechost1.

  8. Click Start to start the Execution Node Service on EXECHOST1.

  9. In the properties panel, set Max Cores to 4 to limit the number of cores that can be used by cluster jobs on EXECHOST1.

  10. Click Apply.

  11. In the tree, right-click Execution Nodes and select Add an execution node.

  12. In the ARC Configuration Manager dialog box, type exechost2, then click Add.

  13. In the Execution Nodes list, select exechost2.

  14. Click Start to start the Execution Node Service on EXECHOST2.

  15. In the properties panel, set Max Cores to 4.

  16. Click Apply.

  17. In the tree, right-click Execution Nodes and select Add an execution node.

  18. In the ARC Configuration Manager dialog box, type exechost3, then click Add.

  19. In the Execution Nodes list, select exechost3.

  20. Click Start to start the Execution Node Service on EXECHOST3.

  21. In the properties panel, delete the value in the Max Cores field so that there are no restrictions on the number of cores that cluster jobs can use on EXECHOST3.

  22. Click Apply.

    Now that the cluster nodes have been established, we can create cluster queues. We will create a queue named high_mem that will run jobs on EXECHOST3 only. This is the only machine with unrestricted resource allocation, making it ideal for larger jobs. We will also set the maximum number of jobs that can run on the high_mem queue to 100.

  23. In the tree, right-click Queues and select Add a queue.

  24. In the ARC Configuration Manager dialog box, type high_mem, then click Add.

  25. In the Queues list, select the new high_mem queue.

  26. In the properties panel, enable the queue and set Max Concurrent Jobs to 100.

  27. Click Apply.

  28. In the tree, select the high_mem queue’s Access Control setting. Then, in the Node Assignments area, enable the exechost3 check box, and ensure that the other nodes are not checked.

    This specifies that jobs submitted to this queue will only run on EXECHOST3.

  29. Click Apply.

  30. In the tree, right-click Administrative Access and select Cache Password.

  31. Enter the password that you want RSM to use when the cluster is accessed with the current user account, then click Confirm.


    Note:  If we wanted RSM to use a different account to log in to the cluster, we would run the arccredentials command to cache the credentials for that account. For more information, see Caching Credentials for Cluster Job Submission (arccredentials).


  32. Close the ARC Configuration application.

At this point the Ansys RSM Cluster is set up and ready to receive job submissions from client machines.

Step 2: Create an ARC Configuration in RSM

In order for RSM client machines to be able to submit jobs to the ARC submit host (ARCMASTER), we must create a configuration in RSM that establishes communication between the client and submit host, specifies the file transfer method, and specifies RSM queues that map to the ARC cluster queues.

  1. Launch the RSM Configuration application by selecting Start > Ansys 2024 R2 > RSM Configuration 2024 R2.

  2. Right-click in the HPC Resources list and select Add HPC Resource.

  3. On the HPC Resource tab, specify a name for the configuration, select ARC for the HPC type, and specify the machine name of the cluster submit host.

  4. Click Apply, then select the File Management tab. Referring to Specifying File Management Properties, select the desired file transfer method, then specify the job execution working directory.

  5. Click Apply, then select the Queues tab.

  6. Import the cluster queues that are defined on the ARCMASTER machine (default, local and high_mem).

  7. Since we are accessing ARCMASTER for the first time, we may be asked to enter the credentials that RSM will use to access that machine.

  8. Enter a User Name and Password that can be used to access ARCMASTER, then click OK. The specified credentials will be cached, enabling any user using this configuration to submit jobs to ARCMASTER.

  9. For each ARC Queue in the list, you can specify a unique RSM Queue name if you want (by default, the RSM queue name matches the cluster queue name). RSM queues are what users see in client applications when they choose to submit jobs to RSM. You can also choose which queues you want to enable for users, and submit a test job to each RSM queue by clicking Submit in the Test column.

  10. Click Apply to complete the configuration.

Step 3: Make the Configuration Available to Users

Since we named our configuration ARC, a file named ARC.rsmcc has been created in the RSM configuration directory. This directory also contains an RSM queue definition file named queues.rsmq. You will need to make these files available to users who will be submitting jobs to the cluster via RSM. You can do this by making the RSM configuration directory a shared directory, and instructing users to point their own RSM configuration directory setting to this shared directory. Alternatively, users can copy the RSM configuration files that you have created to the appropriate directory on their local machines.

With either of these options, if users were to launch the RSM Configuration application on their machines, they would see the ARC configuration automatically added to their HPC Resources list. They could then start submitting jobs to the RSM queues that were defined in this configuration. RSM queues are linked to the ARC configuration defined in RSM, which enables jobs to be submitted to ARCMASTER for scheduling.

There are two options for making the ARC.rsmcc and queues.rsmq files available to users:

Option 1: Share the RSM configuration directory

This method ensures that all users have the most accurate and up-to-date configuration information, as files are centrally stored and managed.

  • If you changed the RSM configuration directory to a share-friendly folder before creating configurations (as described in Creating a Shareable RSM Configuration Directory), you can go ahead and share that folder. Make sure that the folder has read-only permission to prevent others from modifying your configurations.

  • If you did not change the RSM configuration directory before creating cluster configurations, your configurations are located in the default configuration directory, which is a user-specific directory that is not suitable for sharing.

    In this case, follow these steps:

    Windows

    1. Create a folder in a location that is not associated with a user account (for example, C:\some\folder).

    2. Add the following required sub-folders to the folder: ANSYS\v242\RSM.

      In this example, the resulting path will be C:\some\folder\ANSYS\v242\RSM. This location will serve as the new configuration directory.

    3. If the RSM service is currently running, stop it. As an administrator, run net stop RSMLauncherService242.

    4. Open a command prompt in the [RSMInstall]\bin directory.

    5. Issue the following command, replacing the path with the desired value:

      rsm.exe appsettings set JobManagement ConfigurationDirectory C:\some\folder\ANSYS\v242\RSM

      You can specify a local path if the directory is on the local machine, or a UNC path if the directory is a network share.

    6. Go to the default configuration directory:

      %APPDATA%\ANSYS\v242\RSM

      The path to this directory might be C:\users\%username%\appdata\Roaming\Ansys\v242\RSM, where %username% is the name of the RSM or system administrator.

    7. Copy the ARC.rsmcc and queues.rsmq files to the new RSM configuration directory (for example, C:\some\folder\ANSYS\v242\RSM).

    8. Restart the RSM service.

    Linux

    1. Create a folder in a location that is not associated with a user account (for example, /some/folder).

    2. Add the following required sub-folders to the folder: ANSYS/v242/RSM.

      In this example, the resulting path will be /some/folder/ANSYS/v242/RSM. This location will serve as the new configuration directory.

    3. If the RSM service is currently running, stop it using rsmlauncher stop.

      If the RSM service is running as a daemon, stop it using /etc/init.d/rsmlauncher242 stop.

    4. Run the rsmutils shell script located in the [RSMInstall]/Config/tools/linux directory. Issue the following command, replacing the path with the desired value:

      rsmutils appsettings set JobManagement ConfigurationDirectory /some/folder/ANSYS/v242/RSM

      You can specify a local path or a mounted file system depending on where the directory resides.

    5. Go to the default configuration directory:

      ~/.ansys/v242/RSM

      On Linux, ~ is the home directory of the account under which RSM is being run.

    6. Copy the ARC.rsmcc and queues.rsmq files to the new configuration directory (for example, /some/folder/ANSYS/v242/RSM).

    7. Restart the RSM service.
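The Linux steps above can be sketched as a shell session. Temporary sandbox directories stand in for /some/folder and ~/.ansys/v242/RSM so the sketch is self-contained, and the rsmutils call, which requires an RSM installation, is shown as a comment.

```shell
set -e

BASE=$(mktemp -d)                    # stands in for /some/folder
CONFIG_DIR="$BASE/ANSYS/v242/RSM"    # required sub-folder layout
mkdir -p "$CONFIG_DIR"

# Point RSM at the new directory (run from [RSMInstall]/Config/tools/linux):
#   rsmutils appsettings set JobManagement ConfigurationDirectory "$CONFIG_DIR"

# Copy the configuration files from the default directory; stand-in files
# are created here so the sketch runs without an RSM installation.
DEFAULT_DIR=$(mktemp -d)             # stands in for ~/.ansys/v242/RSM
touch "$DEFAULT_DIR/ARC.rsmcc" "$DEFAULT_DIR/queues.rsmq"
cp "$DEFAULT_DIR/ARC.rsmcc" "$DEFAULT_DIR/queues.rsmq" "$CONFIG_DIR/"

ls "$CONFIG_DIR"                     # lists ARC.rsmcc and queues.rsmq
```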

Once the configuration directory has been shared, users should set the configuration directory on their local machines to the path of the shared directory. For example, the share path might be something like \\machineName\Share\RSM for Windows users, or /clusternodemount/share/RSM for Linux users. To set this location, users follow the steps in Specifying the Location of the RSM Configuration Directory.


Note:  One potential drawback of this method is that users may not be able to access the shared configurations if the host goes offline or cannot be reached for some reason (for example, if a user is working off-site and does not have access to the network). In this case RSM automatically switches the configuration directory back to the default configuration directory on their local machines. This means that users will, at a minimum, be able to submit jobs to ARC clusters already installed on their local machines using the localhost configuration that is generated in the default configuration directory when RSM is installed.


Option 2: Have users copy configuration files to their local machines

If you are a user looking to access configurations that have been defined by your RSM or system administrator, you can do so by setting your configuration directory to the shared configuration directory that was set up by the administrator (see Option 1 above). Alternatively, you can copy the configuration files to the appropriate directory on your machine.

As a user, you will need to:

  1. Obtain the ARC.rsmcc and queues.rsmq files from the RSM or system administrator. If the administrator has put the files in a shared directory that you can access, you can retrieve them directly from there.

  2. On your local machine, copy the files into your default configuration directory.

    By default, the directory in which the configurations are stored resolves to the following location:

    Windows: %APPDATA%\ANSYS\v242\RSM

    The path to this directory might be C:\users\%username%\appdata\Roaming\Ansys\v242\RSM, where %username% is your user name.

    Linux: ~/.ansys/v242/RSM

    On Linux, ~ is the home directory of the account under which RSM is being run.


Note:  If any of the shared files that you are copying have the same name as files in your local configuration directory, you will need to rename your local files if you do not want them to be overwritten. For example, you may want to rename your localhost.rsmcc file to mylocalhost.rsmcc to distinguish it from the remote resource's localhost.rsmcc file, as its settings may be different.
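The rename-before-copy step described in the note can be sketched as follows. Temporary sandbox directories stand in for your local configuration directory and the administrator's share, and stand-in files are created so the sketch is self-contained.

```shell
set -e

LOCAL_DIR=$(mktemp -d)   # stands in for %APPDATA%\ANSYS\v242\RSM or ~/.ansys/v242/RSM
SHARE_DIR=$(mktemp -d)   # stands in for the administrator's shared directory

# An existing local file that clashes with a shared file of the same name.
touch "$LOCAL_DIR/localhost.rsmcc"
touch "$SHARE_DIR/ARC.rsmcc" "$SHARE_DIR/queues.rsmq" "$SHARE_DIR/localhost.rsmcc"

# Rename the clashing local file before copying so it is not overwritten.
if [ -f "$LOCAL_DIR/localhost.rsmcc" ] && [ -f "$SHARE_DIR/localhost.rsmcc" ]; then
  mv "$LOCAL_DIR/localhost.rsmcc" "$LOCAL_DIR/mylocalhost.rsmcc"
fi

cp "$SHARE_DIR"/* "$LOCAL_DIR/"
ls "$LOCAL_DIR"          # both the shared files and the renamed local file remain
```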

Alternatively, to avoid this issue altogether:

  1. Create a new folder on your local machine (for example, C:\SharedRSMConfig).

  2. Add the following required sub-folders to the folder: ANSYS\v242\RSM.

    In this example, the resulting configuration directory will be C:\SharedRSMConfig\ANSYS\v242\RSM.

  3. Use the RSM Utilities application to set the JobManagement ConfigurationDirectory setting to the new folder. See Specifying the Location of the RSM Configuration Directory.

  4. Copy the configurations from the network share to your new configuration directory (for example, C:\SharedRSMConfig\ANSYS\v242\RSM).