To use GPUs on Mana, you need to specify one additional option beyond those of a basic srun (interactive) or sbatch (batch) job. By default, the scheduler does not allocate GPUs to a job when the nodes are configured to be shared. This prevents other users on the same node from using resources they did not request.

The additional option needed to request GPUs is --gres. Each gres value takes the form name[[:type]:count]. Only the name is required, but specifying at least the count is a good idea so that you are explicit about how many of that resource you need. For example, all of the following are valid gres values:

gpu
gpu:1
gpu:NV-K40:1
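
The name[[:type]:count] form can be checked mechanically before submitting a job. The following is a minimal sketch only: is_valid_gres is a hypothetical helper, not part of Slurm, and it tests the shape of the string as described above, not whether Mana actually offers that GPU type.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): succeeds when the argument
# follows the name[[:type]:count] shape, i.e. name, name:count,
# or name:type:count.
is_valid_gres() {
    pattern='^[A-Za-z][A-Za-z0-9_-]*(:([A-Za-z][A-Za-z0-9_-]*:)?[0-9]+)?$'
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Try the three values listed above, plus one malformed value.
for value in gpu gpu:1 gpu:NV-K40:1 gpu:NV-K40: ; do
    if is_valid_gres "$value"; then
        echo "$value: valid"
    else
        echo "$value: invalid"
    fi
done
```

The scheduler remains the final authority; a value that passes this check can still be rejected if the type does not exist in the partition you request.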

gres also accepts the special value help, which prints a list of all valid gres values.

Because gres is used specifically to select GPUs on Mana, this option is only valid with the gpu, kill-shared, and kill-exclusive partitions, or with private partitions belonging to labs that have purchased GPU nodes.

Below we will cover how one would use this option in an interactive job and batch job.

Available Gres Values

Examples

Interactive job

Interactive job asking for any type of GPU

[user@login001 ~]$ srun -p gpu --gres=gpu:1 --mem=64G -c 10 -t 120 --pty /bin/bash

The above example requests a single GPU of any type from the gpu partition. This means you could end up with an Nvidia Tesla K40, a Quadro RTX 5000, or a Tesla V100.

Interactive job asking for an Nvidia Tesla V100

[user@login001 ~]$ srun -p gpu --gres=gpu:NV-V100-SXM2:1 --mem=64G -c 10 -t 120 --pty /bin/bash

The above example requests a single GPU that is an Nvidia Tesla V100.

Batch script

Batch job asking for any type of GPU

#!/bin/bash
#SBATCH --job-name=GPU-example
#SBATCH --partition=gpu
#SBATCH --time=3-00:00:00
#SBATCH --cpus-per-task=10
#SBATCH --mem=62000 
#SBATCH --gres=gpu:1 
#SBATCH --error=gpu-test-%A.err ## %A - filled with jobid
#SBATCH --output=gpu-test-%A.out ## %A - filled with jobid
## Useful for remote notification
##SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE,TIME_LIMIT_80
##SBATCH --mail-user=user@test.org

The above example requests a single GPU of any type from the gpu partition. This means you could end up with an Nvidia Tesla K40, a Quadro RTX 5000, or a Tesla V100.

Batch job asking for an Nvidia Tesla V100

#!/bin/bash
#SBATCH --job-name=GPU-example
#SBATCH --partition=gpu
#SBATCH --time=3-00:00:00
#SBATCH --cpus-per-task=10
#SBATCH --mem=62000
#SBATCH --gres=gpu:NV-V100-SXM2:1
#SBATCH --error=gpu-test-%A.err ## %A - filled with jobid
#SBATCH --output=gpu-test-%A.out ## %A - filled with jobid
## Useful for remote notification
##SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE,TIME_LIMIT_80
##SBATCH --mail-user=user@test.org

The above example requests a single GPU that is an Nvidia Tesla V100.
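
The batch scripts above contain only #SBATCH directives, so it can help to start the job body with a quick check of what was actually allocated. The following is a sketch under two assumptions: that Slurm on Mana exports CUDA_VISIBLE_DEVICES as a comma-separated list for jobs that request GPUs through --gres (a common configuration, but not confirmed here), and that nvidia-smi is installed on the GPU nodes. count_allocated_gpus is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch of a job-body check, meant to follow the #SBATCH lines.
# Assumption: Slurm sets CUDA_VISIBLE_DEVICES (e.g. "0" or "0,1")
# for jobs that requested GPUs with --gres.
count_allocated_gpus() {
    devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        # Variable unset or empty: no GPU is visible to the job.
        echo 0
        return
    fi
    # Count the comma-separated entries, one per line.
    echo "$devices" | tr ',' '\n' | grep -c .
}

echo "GPUs visible to this job: $(count_allocated_gpus)"

# Print device details only when the Nvidia tools are available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
fi
```

If the reported count is 0 inside a job that asked for --gres=gpu:1, the request most likely did not reach the scheduler as intended, for example because the --gres line was commented out.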

Selecting from a subset of GPU types instead of any type

The examples above show how to use gres to select GPUs, but they do not cover the case where you want to choose from only certain types of GPU, for example only Quadro RTX and Tesla V100 GPUs. To accomplish this, you need to use both the --gres option and the --constraint option.
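
As a rough illustration of combining the two options, the sketch below builds an srun command that asks for one GPU of any type and then narrows the eligible nodes with a constraint. In Slurm, separating features with | inside a constraint means "any one of these". The feature names featureA and featureB are placeholders, not real Mana values, and build_gpu_request is a hypothetical helper; the actual feature names can be listed with sinfo, as noted in the comments.

```shell
#!/bin/bash
# Placeholder feature names: replace them with the features that sinfo
# reports for the GPU nodes, for example:
#   sinfo -p gpu -o "%N %G %f"    (%N = nodes, %G = gres, %f = features)
build_gpu_request() {
    # Join all arguments with "|" so that any one of the features may match.
    features=$(IFS='|'; echo "$*")
    # Quote the constraint so the shell does not treat "|" as a pipe.
    echo "--gres=gpu:1 --constraint=\"${features}\""
}

request=$(build_gpu_request featureA featureB)

# Print the full command for review instead of running it.
echo "srun -p gpu ${request} --mem=64G -c 10 -t 120 --pty /bin/bash"
```

In a batch script, the same request would be written as two directives, #SBATCH --gres=gpu:1 and #SBATCH --constraint="featureA|featureB", again with the placeholders replaced by real feature names.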
