Large Scale DSO Deployment/Configuration
Linux Cluster Configuration
- Shared drive for projects: The cluster must provide a shared drive that hosts job inputs. The submitted project must be located on this shared drive (for example, in a subfolder of the user’s home directory), and the shared drive must be accessible using the same path on every node of the cluster.
- Temp directory configuration: The Temp directory must be either on local storage or on storage with equivalent speed characteristics; that is, the I/O rates of the storage should not be affected by network traffic.
- This storage is freed at the end of the analysis.
- The amount of required space depends on the number of engines per node and the total number of variations solved on that node.
- The amount of required space also depends on the project’s compression options. For example, if ‘Save Fields’ of a parametric setup is OFF, the space requirement is smaller by the amount of space that the field solution data would take up.
- Ansys Electromagnetics RSM environment: Supported scheduler environments need no extra configuration. An RSM environment requires the following additional steps:
- RSM must be running on all nodes of the cluster. The credentials of the RSM service must allow read/write access to the shared drive, because the remote engine processes are launched using the credentials of the RSM service.
- Registration of 'desktopjob.exe' with RSM service: The ‘desktopjob’ program must be registered with Ansys Electromagnetics RSM using 'desktopjob -regserver'. To confirm that the registration succeeded, check that the ‘desktopjob’ entry in the '<RSM-installation-folder>/AnsoftRSMService.cfg' file is valid.
Make sure the Temp directory on each host has sufficient space to hold the results database for the variations that are solved on it.
Note: Linux-specific critical note. Edit AnsoftRSMService.cfg and replace ‘desktopjob.bin’ with ‘desktopjob’.
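The Linux-specific edit described above could be scripted when many nodes need the same change. The following is a minimal sketch, assuming that AnsoftRSMService.cfg is a plain text file and that a simple text substitution is sufficient; the function names and the ".bak" backup suffix are hypothetical choices for illustration, not part of the product.

```python
import shutil
from pathlib import Path


def fix_desktopjob_entry(text):
    """Replace every 'desktopjob.bin' with 'desktopjob' in the given config text."""
    return text.replace("desktopjob.bin", "desktopjob")


def patch_config(cfg_path):
    """Back up the config file, then rewrite it only if a change is needed.

    Returns True if the file was modified, False if it was already correct.
    Assumption: the file is plain text readable with the default encoding.
    """
    path = Path(cfg_path)
    original = path.read_text()
    updated = fix_desktopjob_entry(original)
    if updated == original:
        return False
    # Keep a copy of the unmodified file next to the original (hypothetical suffix).
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(updated)
    return True
```

Calling `patch_config` with the full path to AnsoftRSMService.cfg in the RSM installation folder applies the change once; a second call returns False and leaves the file untouched. Inspect the resulting ‘desktopjob’ entry by hand afterwards, since the sketch does not validate it.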
Warning: In the RSM environment, Large Scale DSO can only be enabled for one product.
Tip: Troubleshooting (RSM environment only): The shared drive read/write requirement is a new constraint introduced by Large Scale DSO. If Regular DSO jobs run but Large Scale DSO jobs fail, one possible cause is that the RSM service does not have privileges to read and write to the project folder located on the shared drive.
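The shared drive and Temp directory requirements can be checked on each node before submitting a job. The following is a minimal sketch using only the Python standard library; the function names and the probe file name are hypothetical. The result is only meaningful when the script runs under the same account as the RSM service, because the remote engine processes use that account's credentials.

```python
import os
import shutil
import uuid


def can_read_write(directory):
    """Return True if a small file can be created, read back, and deleted in `directory`."""
    probe = os.path.join(directory, f".lsdso_probe_{uuid.uuid4().hex}")
    try:
        with open(probe, "w") as f:
            f.write("ok")
        with open(probe) as f:
            return f.read() == "ok"
    except OSError:
        # Covers a missing directory as well as insufficient permissions.
        return False
    finally:
        # Remove the probe file if it was created.
        if os.path.exists(probe):
            try:
                os.remove(probe)
            except OSError:
                pass


def free_gib(directory):
    """Return the free space of the file system that holds `directory`, in GiB."""
    return shutil.disk_usage(directory).free / 2**30
```

For example, `can_read_write(project_dir)` should return True on every node for the project folder on the shared drive, and `free_gib(temp_dir)` can be compared against the space expected for the variations solved on that node. How much space is enough depends on the project, so the sketch does not apply a threshold.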
Windows Cluster Configuration
All the above steps apply, except for steps that are stated as Linux-specific. Additional instructions:
- RSM and Ansys Electromagnetics products are installed either locally on each node of the cluster (local installation) or on a single shared drive available to all nodes of the cluster (network installation).
- Registration of 'desktopjob.exe' with RSM service:
- Network installation: desktopjob.exe is registered with the RSM service once, on any node of the cluster.
- Local installation: Since each node has its own RSM installation, desktopjob.exe must be registered with RSM on each node.
Warning: IMPORTANT! The RSM service must be started using the credentials of a non-system 'admin' account that has read/write permissions to the project's shared drive. If the RSM service runs as the 'system' user, Large Scale DSO jobs will fail.
Heterogeneous Cluster Configuration
Limitation: Heterogeneous clusters (with both Linux and Windows nodes) are currently not supported. This is due to the shared drive requirement.
Related Topics
Large Scale DSO for Parametric Analysis