Understanding Cluster Node States
You can view the state of cluster nodes in two different ways:
- In Ansys Gateway powered by AWS: Display the autoscaling cluster details and select a queue to view. The states shown here are based on specific states retrieved from Slurm.
- Using the command line: Issue Slurm commands like sinfo and scontrol to get information directly from Slurm. Node states in Slurm are more specific and provide a finer level of detail.
The node states reported in Ansys Gateway powered by AWS are not the same as the node states reported when using Slurm commands. Ansys Gateway powered by AWS uses a small set of general terms to describe node states, with each state being based on one or more node states reported by Slurm. For example, a node reported as Running in Ansys Gateway powered by AWS may be either IDLE, ALLOCATED, or MIXED in Slurm.
This topic describes the common node states reported for each method and helps you understand how Slurm node states correlate to the node states reported in Ansys Gateway powered by AWS.
Node States in Ansys Gateway powered by AWS
As described in Viewing the Details of an Autoscaling Cluster Queue, the state of individual cluster nodes is displayed in the queue details.
The node states reported in the queue details are based on states retrieved from the State property in the following Slurm command:
scontrol show node $node
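As a minimal sketch of reading that property from the command line, the helper below pulls the State value out of scontrol show node output. The function name node_state_field is hypothetical and is not part of Slurm or Ansys Gateway powered by AWS. Slurm may report several flags joined with + (for example, IDLE+CLOUD), and the helper returns the value exactly as Slurm prints it.

```shell
# Print the value of the State property from "scontrol show node" output read on stdin.
# node_state_field is a hypothetical helper name, not a Slurm command.
node_state_field() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^State=/) {     # match the State key itself, not keys that only end in "State="
        print substr($i, 7)     # drop the 6-character "State=" prefix
        exit
      }
    }
  }'
}

# Usage on a cluster (requires Slurm; $node holds a node name):
#   scontrol show node "$node" | node_state_field
```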
Tab in Queue Details | State in Ansys Gateway powered by AWS | Description | Slurm node state |
---|---|---|---|
Nodes | Starting | The node is being launched | POWERING_UP |
Nodes | Running | The node is ready to do work or is actively doing work | Either IDLE, ALLOCATED, or MIXED |
Nodes | Terminating | The node is being shut down | POWERING_DOWN |
Inactive nodes | Terminated | The node has been shut down | POWERED_DOWN |
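The correspondence described in this topic can be sketched as a simple lookup. This is an illustration only, not how Ansys Gateway powered by AWS is implemented. The function name gateway_state is hypothetical, it accepts a single Slurm state name, and how combined values such as IDLE+CLOUD are resolved is not covered here, so those fall through to Unknown.

```shell
# Translate one Slurm node state name into the general state shown in the queue details.
# gateway_state is a hypothetical helper; the mapping follows the table in this topic.
gateway_state() {
  case "$1" in
    POWERING_UP)          echo "Starting" ;;
    IDLE|ALLOCATED|MIXED) echo "Running" ;;
    POWERING_DOWN)        echo "Terminating" ;;
    POWERED_DOWN)         echo "Terminated" ;;
    *)                    echo "Unknown" ;;   # states not listed in the table, or combined values
  esac
}

# Example: gateway_state MIXED
```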
Node States in Slurm
When using Slurm commands like sinfo and scontrol, Slurm reports precise node states as described below.
State | Description |
---|---|
ALLOCATED | The node has been allocated to one or more jobs |
CLOUD | The node is a cloud node that is not currently running but can be brought up on demand |
IDLE | The node is not allocated to any jobs and is available for use |
MIXED | The node is in multiple states. For example, some of its CPUs may be ALLOCATED while others are IDLE. |
POWERED_DOWN | The node is powered down and not capable of running jobs |
POWERED_UP | The node is powered up and can run jobs |
POWERING_DOWN | The node is in the process of powering down and not capable of running jobs |
POWERING_UP | The node is in the process of powering up |
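To get a quick overview of these states across a cluster, the following sketch counts nodes per state from sinfo output. The helper name count_node_states is hypothetical. It assumes input lines of the form "node state", such as those produced by the sinfo options shown in the usage comment. Note that sinfo typically prints state names in lowercase, sometimes with a trailing flag character, so the labels may not match the uppercase names in the table exactly.

```shell
# Count nodes per state from "<node> <state>" lines read on stdin.
# count_node_states is a hypothetical helper name, not a Slurm command.
count_node_states() {
  # Tally the second field of each line, then print "<state> <count>" sorted by state name.
  awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a cluster (requires Slurm):
#   sinfo -N -h -o "%N %T" | count_node_states
# -N lists one line per node, -h omits the header, %N is the node name, %T is the state.
```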
For a complete list of Slurm node states, refer to the following in the Slurm documentation:
- scontrol: NODES - SPECIFICATIONS FOR SHOW COMMAND
- sinfo: NODE STATE CODES