# Slurm Usage Guide

## Introduction

Slurm is an open-source job scheduling system for Linux clusters, most frequently used for high-performance computing (HPC) applications. This guide covers some of the basics for getting started with Slurm as a user. For more information, the Slurm documentation (https://slurm.schedmd.com/) is a good place to start.

After Slurm is deployed on a cluster, a slurmd daemon runs on each compute node. Users do not log directly into the compute nodes to do their work. Instead, they execute Slurm commands (e.g. `srun`, `sinfo`, `scancel`, `scontrol`) from the login node. These commands communicate with the slurmd daemons on each host to perform work.
## Simple Commands

### Cluster state with sinfo

To "see" the cluster, ssh to apex-login.cmkl.ac.th and run the `sinfo` command:
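The output below is illustrative; actual node names will vary:

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*       up 7-00:00:00      6   idle node[001-006]
```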
Slurm nodes are grouped into partitions. In this case, the cluster has a single partition named `batch`, which is also the default partition, as indicated by the `*` symbol. There are 6 nodes available on this system, all in an `idle` state. The `TIMELIMIT` of 7-00:00:00 indicates that the maximum execution time for a job is 7 days. If a node is busy, its state will change from `idle` to `alloc`:
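For example, with one node running a job (again, node names are illustrative):

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*       up 7-00:00:00      1  alloc node001
batch*       up 7-00:00:00      5   idle node[002-006]
```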
Nodes in a `mixed` state indicate that some of the node's CPUs are allocated while others are available. Nodes in a `drain` state are not available for job scheduling, usually because they are under maintenance or are allocated to a Kubernetes scheduler in a hybrid-cluster environment.
The `sinfo` command can output much more information about the cluster. Check out the sinfo documentation (https://slurm.schedmd.com/sinfo.html) for more.
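For instance, these standard options give per-node detail and the reasons nodes are drained:

```
$ sinfo -N -l    # long, node-oriented listing: one line per node
$ sinfo -R       # list drained/down nodes along with the reason
```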
### Running a job with srun

To run a job, use the `srun` command:
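For example (the node name returned is illustrative):

```
$ srun hostname
node001
```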
What happened here? With the `srun` command we instructed Slurm to find the first available node and run `hostname` on it. The result was returned to our command prompt. It is just as easy to use `srun` to run a Python script or a container instead.
Most of the time, scheduling an entire system is not necessary, and it is better to request only a certain portion of the GPUs:
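For example, `--gres=gpu:2` requests two GPUs, and `nvidia-smi -L` simply lists the GPUs the job was granted:

```
$ srun --gres=gpu:2 nvidia-smi -L
```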
Or, conversely, sometimes it's necessary to run on multiple CPUs:
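For example (the application name is a placeholder):

```
$ srun --cpus-per-task=16 ./my_threaded_app
```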
### Running an interactive job

Especially when developing and experimenting, it's helpful to run an interactive job, which requests a resource and provides a command prompt as an interface to it:
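A typical way to do this is to request a pseudo-terminal running `bash` (the prompt and node name shown are illustrative):

```
$ srun --pty /bin/bash
user@node001:~$ hostname
node001
user@node001:~$ exit
```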
During interactive mode, the resource is being reserved for use until the prompt is exited (as shown above). Commands can be run in succession.
Note: before starting an interactive session with `srun`, it may be helpful to create a session on the login node with a tool like `tmux` or `screen`. This will prevent a user from losing interactive jobs if there is a network outage or the terminal is closed.
## More Advanced Use

### Run a batch job

While the `srun` command blocks any other execution in the terminal, `sbatch` can be used to queue a job for execution once resources are available in the cluster. A batch job also lets you queue up several jobs that run as nodes become available. It is therefore good practice to encapsulate everything that needs to be run into a script and then execute it with `sbatch` rather than `srun`:
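A minimal sketch (the script name, directives, and job ID are illustrative):

```
$ cat test.sh
#!/bin/bash
#SBATCH --job-name=test        # name shown in the queue
#SBATCH --ntasks=1             # number of tasks to launch
#SBATCH --time=00:10:00        # wall-clock limit for this job

hostname

$ sbatch test.sh
Submitted batch job 12345
```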
You can observe your output in `slurm-xxxxx.out`, replacing `xxxxx` with your job ID.
### Observing running jobs with squeue

To see which jobs are running in the cluster, use the `squeue` command:
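Illustrative output:

```
$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  12345     batch     test username  R       0:03      1 node001
```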
To see just the running jobs for a particular user `USERNAME`:
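```
$ squeue -u USERNAME -t RUNNING
```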
### Cancel a job with scancel

To cancel a job, use the `squeue` command to look up the JOBID and the `scancel` command to cancel it:
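For example (the job ID is illustrative):

```
$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  12345     batch     test username  R       5:21      1 node001
$ scancel 12345
```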
### Resource & Time Limits

The following command launches an interactive task (`bash`) for a job named `gputest` that requires 2 GPUs, 20 CPUs, and 20 GB of memory. It also extends the job time limit (`--time`) to 1 day from the default of 1 hour. The command uses a container image from the local registry (note the `#` used to separate the container registry server from the image path), and it mounts your home directory to `/userspace` instead of the default `/root` directory.
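A sketch of such a command, assuming Pyxis is installed; the registry server and image path are placeholders:

```
srun --job-name=gputest \
     --gpus=2 \
     --cpus-per-task=20 \
     --mem=20G \
     --time=1-00:00:00 \
     --container-image=registry.example.org#group/image:tag \
     --no-container-mount-home \
     --container-mounts=$HOME:/userspace \
     --pty bash
```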
### Pyxis Container

A Slurm task can be wrapped in container technology using enroot, Singularity, or Pyxis. The following commands demonstrate the usage of Pyxis, which integrates natively with Slurm.
A regular `srun` command will run the given command, `grep`, on a bare-metal compute node:
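For example (the queried file is arbitrary; the output reflects the node's own OS):

```
$ srun grep PRETTY_NAME /etc/os-release
```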
You can easily add the `--container-image` flag to run the given command, `grep`, in a container on the compute node instead:
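For example, using a stock image from Docker Hub (the tag is illustrative); the output now reflects the container's OS rather than the node's:

```
$ srun --container-image=ubuntu:22.04 grep PRETTY_NAME /etc/os-release
```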
You can use pre-built images from NVIDIA NGC, CMKL's local registry, or your own Docker images from Docker Hub.
In general, Pyxis will mount your home directory to `/root` inside the container. However, you can avoid that behavior by using `--no-container-mount-home` and then selectively mount specific directories using `--container-mounts`.
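For example (host paths are placeholders):

```
srun --container-image=ubuntu:22.04 \
     --no-container-mount-home \
     --container-mounts=/path/to/data:/data \
     ls /data
```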
Additionally, the `--container-workdir` flag will set the container's entry directory, for example to `/work1`.
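For instance (again with placeholder paths):

```
srun --container-image=ubuntu:22.04 \
     --container-mounts=/path/to/work1:/work1 \
     --container-workdir=/work1 \
     pwd
```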
You can also mount more than one directory, using a comma (`,`) as a separator.
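For example:

```
srun --container-image=ubuntu:22.04 \
     --container-mounts=/path/to/data:/data,/path/to/work1:/work1 \
     ls /
```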
More detail can be found at https://github.com/NVIDIA/pyxis.
### Multi-GPU training

In the following command, Slurm will request `<NGPU>` GPUs on each of the `<NNODE>` nodes, so the total number of GPUs requested is `<NGPU>` * `<NNODE>`.

On each node, the process `train.py` will be initiated `<NTASKS>` times. This `<NTASKS>` should be equal to `<NGPU>`.
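A sketch of such a command, assuming a Pyxis container image and a training script that derives its rank and world size from Slurm's environment variables (all angle-bracket values are placeholders):

```
srun --nodes=<NNODE> \
     --ntasks-per-node=<NTASKS> \
     --gpus-per-node=<NGPU> \
     --container-image=<IMAGE> \
     python train.py
```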
## Additional Resources

Acknowledgement: The content of this chapter has been adapted from the original DeepOps documentation.