Slurm Scheduler

NGC Container

  1. Make sure that enroot uses HTTPS to connect to nvcr.io: in /etc/enroot/enroot.conf, ENROOT_ALLOW_HTTP should be no (the default). Setting it to y makes enroot use plain HTTP instead of HTTPS.

  2. Ensure that your ~/.config/enroot/.credentials file is populated with your NGC API key (generate the key on NGC)

    # NVIDIA GPU CLOUD (both endpoints are required)
    machine nvcr.io login $oauthtoken password <Your API Key from NGC>
    machine authn.nvidia.com login $oauthtoken password <Your API Key from NGC>
  3. Example srun command

    srun --container-image='nvcr.io#nvidia/pytorch:21.04-py3' --gres=gpu:1 --pty nvidia-smi -L
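Before submitting a job, it can help to confirm that the credentials file from step 2 has an entry for both NGC endpoints. The following is a minimal sketch, not part of enroot or Slurm: the function name check_ngc_credentials is hypothetical, and it assumes the one-entry-per-line layout shown above. It also treats a password that still starts with "<" as an unfilled placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with enroot or Slurm): verify that an
# enroot credentials file has a complete entry for each NGC endpoint.
# Assumes one entry per line, in the form:
#   machine <host> login <user> password <key>
check_ngc_credentials() {
    local creds_file="$1"
    local host
    local missing=0

    if [ ! -r "$creds_file" ]; then
        echo "cannot read credentials file: $creds_file" >&2
        return 2
    fi

    for host in nvcr.io authn.nvidia.com; do
        # A line matches when its fields follow the layout above and the
        # password field is non-empty and is not a "<...>" placeholder.
        if ! awk -v host="$host" '
            $1 == "machine" && $2 == host && $3 == "login" &&
            $5 == "password" && $6 != "" && $6 !~ /^</ { found = 1 }
            END { exit !found }
        ' "$creds_file"; then
            echo "missing or incomplete entry for $host" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

Call the function with the path to your credentials file. It returns 0 when both entries are present, 1 when an entry is missing or still a placeholder, and 2 when the file cannot be read. It only inspects the file's layout and does not contact NGC, so a passing check does not prove that the key itself is valid.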

See also: https://github.com/NVIDIA/deepops/blob/master/docs/slurm-cluster/slurm-usage.md

Todo: Slurm and MIG support
https://gitlab.com/nvidia/hpc/slurm-mig-discovery