2.4. Monitoring/Manipulating Jobs

2.4.1. Monitoring the Progress of a Job

The squeue command reports the status of jobs in the Slurm queue. When no job ID is specified, squeue displays all jobs; squeue -j job_ID restricts the output to a specific job. For a running job, the NODELIST(REASON) column lists the nodes on which the job is executing. For a pending job, the same column instead shows, in parentheses, the reason the job has not started yet, such as (Resources) or (Priority).

> squeue -j 524
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               524  partname fluent-t     user  R      45:23      2 compute[00-01]
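The default output is laid out for people to read. A script that needs to react to a job's state is easier to write against a bare value. The following is a minimal sketch, assuming a Bash shell. The function name job_state is hypothetical. --noheader, --jobs and --format=%t (the compact state code) are standard squeue options. The sketch assumes a single, non-array job, because an array job can produce several output lines.

```shell
#!/usr/bin/env bash
# job_state is a hypothetical helper name, not part of Slurm.
# It prints only the compact state code (for example PD or R) of one job.
job_state() {
    local job_id="$1"
    # --noheader   drop the column titles
    # --jobs=ID    report only this job (the long form avoids parsing ambiguity)
    # --format=%t  print just the compact job state
    # Error messages (e.g. for an unknown job ID) are discarded, and any
    # padding whitespace is stripped so the result can be compared directly.
    squeue --noheader --jobs="$job_id" --format=%t 2>/dev/null \
        | tr -d '[:space:]'
}
```

For example, [ "$(job_state 524)" = PD ] && echo "still waiting" prints a message only while job 524 is pending. To list just your own jobs in the default layout, squeue -u "$USER" is also a standard option.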

squeue reports several job states, listed under the ST column. The two you will see most often are PD (pending) and R (running). PD indicates that the job is waiting in the queue: the scheduler has not yet found a node or set of nodes with the resources the job requested. Once the scheduler has allocated suitable resources and started the job, its state changes to R.
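A script that must block until a job finishes can poll squeue and wait for the job to leave the PD and R states. The following is a minimal sketch, assuming Bash. The function wait_for_job and its 60-second default interval are hypothetical choices. Besides PD and R, it treats CF (configuring) and CG (completing) as still active. Other states, such as S (suspended), would need to be added if they occur on your system. Once a finished job is purged from the controller, squeue no longer lists it, and the loop treats that as the end of the job. Frequent polling adds load on the scheduler, so keep the interval generous.

```shell
#!/usr/bin/env bash
# wait_for_job is a hypothetical helper name, not part of Slurm.
# Usage: wait_for_job JOB_ID [SECONDS_BETWEEN_CHECKS]
wait_for_job() {
    local job_id="$1"
    local interval="${2:-60}"   # assumed default: check once a minute
    local state
    while true; do
        # Compact state code with whitespace removed; empty once squeue
        # no longer reports the job (for example, after it is purged).
        state=$(squeue --noheader --jobs="$job_id" --format=%t 2>/dev/null \
                | tr -d '[:space:]')
        case "$state" in
            PD|R|CF|CG)
                # Pending, running, configuring or completing: check again later.
                sleep "$interval"
                ;;
            *)
                # Any other state (e.g. CD, F, CA), or no output: stop waiting.
                echo "job $job_id: ${state:-no longer listed}"
                return 0
                ;;
        esac
    done
}
```

If you submit the job yourself, sbatch --wait achieves a similar effect without polling: sbatch does not exit until the submitted job terminates.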

2.4.2. Removing a Job from the Queue

The scancel <job_ID> command cancels the job indicated by <job_ID>. A pending job is removed from the queue before it starts; a running job is terminated.
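scancel also accepts filters in place of a job ID. The following is a minimal sketch, assuming Bash; the function name cancel_my_pending_jobs is hypothetical. The function first lists your pending jobs with squeue (--user, --states and --format=%i, which prints the job ID) so you can see what will be removed. It then runs scancel --user="$USER" --state=PENDING, which leaves running jobs untouched. A job that is submitted or starts between the two commands may make the printed list differ slightly from what is actually cancelled.

```shell
#!/usr/bin/env bash
# cancel_my_pending_jobs is a hypothetical helper name, not part of Slurm.
# It cancels every job of the current user that has not started yet.
cancel_my_pending_jobs() {
    local ids
    # IDs of your own pending jobs, one per line (empty if there are none).
    ids=$(squeue --noheader --user="$USER" --states=PENDING --format=%i)
    if [ -z "$ids" ]; then
        echo "no pending jobs to cancel"
        return 0
    fi
    # Show the IDs on one line before cancelling.
    echo "cancelling: ${ids//$'\n'/ }"
    # Let scancel apply the same filters itself; running jobs are not affected.
    scancel --user="$USER" --state=PENDING
}
```

To cancel a few specific jobs instead, scancel also accepts several job IDs in one call, for example scancel 524 525.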