Understanding Cluster Node States

You can view the state of cluster nodes in two different ways:

  • In Ansys Gateway powered by AWS: Display the autoscaling cluster details and select a queue to view. The states shown there are derived from specific states retrieved from Slurm.
  • Using the command line: Issue Slurm commands like sinfo and scontrol to get information directly from Slurm. Node states in Slurm are more specific and provide a finer level of detail.

The node states reported in Ansys Gateway powered by AWS are not the same as the node states reported by Slurm commands. Ansys Gateway powered by AWS uses a small set of general terms to describe node states, and each of these terms corresponds to one or more node states reported by Slurm. For example, a node reported as Running in Ansys Gateway powered by AWS may be IDLE, ALLOCATED, or MIXED in Slurm.

This topic describes the common node states reported for each method and helps you understand how Slurm node states correlate to the node states reported in Ansys Gateway powered by AWS.

Node States in Ansys Gateway powered by AWS

As described in Viewing the Details of an Autoscaling Cluster Queue, the state of individual cluster nodes is displayed in the queue details.

The node states reported in the queue details are based on states retrieved from the State property in the following Slurm command:

scontrol show node $node
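
When scripting around this command, the State property can be pulled out of the command output as text. The following is a minimal Python sketch under stated assumptions, not part of the product: the function name extract_state, the node name, and the abbreviated sample output are all illustrative, and the sketch assumes the output contains a State=<value> field.

```python
import re
from typing import Optional

# Illustrative sample only (hypothetical node name); real
# `scontrol show node` output contains many more fields.
SAMPLE_OUTPUT = """NodeName=queue1-dy-node-1 Arch=x86_64 CoresPerSocket=1
   State=IDLE+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=1
"""


def extract_state(scontrol_output: str) -> Optional[str]:
    """Return the value of the State property, or None if it is absent."""
    # \b keeps the pattern from matching inside a longer field name.
    match = re.search(r"\bState=(\S+)", scontrol_output)
    return match.group(1) if match else None


print(extract_state(SAMPLE_OUTPUT))
```

On a cluster where Slurm is installed, the same function could be given the captured output of the command instead of the sample string.
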

Table 1. Node states in Ansys Gateway powered by AWS

| Tab in Queue Details | State in Ansys Gateway powered by AWS | Description | Slurm node state |
| --- | --- | --- | --- |
| Nodes | Starting | The node is being launched | POWERING_UP |
| Nodes | Running | The node is ready to do work or is actively doing work | Either CLOUD+MIXED, CLOUD+IDLE, or CLOUD+ALLOCATED |
| Nodes | Terminating | The node is being shut down | POWERING_DOWN |
| Inactive nodes | Terminated | The node has been shut down | POWERED_DOWN |
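
The correspondence in Table 1 can be expressed as a small lookup. The sketch below is one possible reading of the table, not the product's actual logic: the function name gateway_state is hypothetical, the Slurm state is assumed to be a string of parts joined by "+", and the rule that power-related parts take precedence over the base state is an assumption that the table does not state.

```python
from typing import Optional

# Slurm base states that Table 1 groups under "Running".
RUNNING_STATES = {"IDLE", "MIXED", "ALLOCATED"}


def gateway_state(slurm_state: str) -> Optional[str]:
    """Map a Slurm node state string to the state named in Table 1."""
    tokens = set(slurm_state.upper().split("+"))
    # Assumption: power-related parts win over the base state.
    if "POWERING_UP" in tokens:
        return "Starting"
    if "POWERING_DOWN" in tokens:
        return "Terminating"
    if "POWERED_DOWN" in tokens:
        return "Terminated"
    # Table 1 lists the Running states as CLOUD combined with a base state.
    if "CLOUD" in tokens and tokens & RUNNING_STATES:
        return "Running"
    return None  # Not covered by Table 1


for state in ("CLOUD+MIXED", "POWERING_UP", "POWERED_DOWN"):
    print(state, "->", gateway_state(state))
```

States that Table 1 does not list return None in this sketch, so a caller can decide how to display them.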

Node States in Slurm

When using Slurm commands like sinfo and scontrol, Slurm reports precise node states as described below.

Table 2. Common node states in Slurm

| State | Description |
| --- | --- |
| ALLOCATED | The node has been allocated to one or more jobs |
| CLOUD | The node is a cloud node that is not currently running but can be brought up on demand |
| IDLE | The node is not allocated to any jobs and is available for use |
| MIXED | The node is in multiple states. For example, some of its CPUs may be ALLOCATED while others are IDLE. |
| POWERED_DOWN | The node is powered down and not capable of running jobs |
| POWERED_UP | The node is powered up and can run jobs |
| POWERING_DOWN | The node is in the process of powering down and not capable of running jobs |
| POWERING_UP | The node is in the process of powering up |
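
Because Slurm can report several of these states together (for example CLOUD+IDLE), it can help to look up each part separately. The following Python sketch does that using the descriptions from Table 2. The function name describe is hypothetical, and the sketch assumes the parts of a state are joined by "+".

```python
from typing import List

# Descriptions taken from Table 2.
DESCRIPTIONS = {
    "ALLOCATED": "The node has been allocated to one or more jobs",
    "CLOUD": "The node is a cloud node that is not currently running "
             "but can be brought up on demand",
    "IDLE": "The node is not allocated to any jobs and is available for use",
    "MIXED": "The node is in multiple states",
    "POWERED_DOWN": "The node is powered down and not capable of running jobs",
    "POWERED_UP": "The node is powered up and can run jobs",
    "POWERING_DOWN": "The node is in the process of powering down "
                     "and not capable of running jobs",
    "POWERING_UP": "The node is in the process of powering up",
}


def describe(slurm_state: str) -> List[str]:
    """Return one 'STATE: description' line per part of a Slurm state."""
    lines = []
    for token in slurm_state.upper().split("+"):
        text = DESCRIPTIONS.get(token, "not listed in Table 2")
        lines.append(f"{token}: {text}")
    return lines


for line in describe("CLOUD+IDLE"):
    print(line)
```

States outside Table 2 are reported as not listed rather than guessed at.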

For a complete list of Slurm node states, refer to the following in the Slurm documentation: