Job Submission/Management

SLURM Workload Manager

Slurm is the software that manages jobs that are submitted to the compute nodes. Information about how to submit and manage jobs is listed below.

BisonNet Partitions (Queues)

There are four job submission partitions available to all users, based on the expected run time of a job. Each partition has a per-user limit on the number of cores which can be used at one time.

short (the default partition) - Max time = 1 day, Max # of CPU cores = 40 
medium - Max time = 7 days, Max # of CPU cores = 20 
long - Max time = 30 days, Max # of CPU cores = 15 
lowpriority - Max time = 30 days, Max # of CPU cores = 40

There is a separate parition for Graphics Processing Unit (GPU) access.

gpu - Max time = 7 days, Max # of GPUs = 4, Max # of CPU cores = 4

Additionally, a few people/departments have purchased compute nodes for their own use so there are some additional partitions/queues that are limited to those specific groups.

Job Scheduling

When all cluster cores are occupied, pending jobs are prioritized using a fairshare algorithm which incorporates job age, job size, and user resource consumption history. Pending jobs may preempt (suspend) a running job based on partition priority. Short jobs may preempt medium and long jobs, while medium jobs may preempt long jobs. All other jobs may preempt lowpriority jobs.

SLURM Commands

Submitting jobs

The following example script demonstrates how to specify some important parameters for a job. It is necessary to specify number of cores (if running multithreaded jobs), the number of tasks (if running MPI capable jobs), memory (if job is expected to use more than 8GB), and GPUs (if using Graphics Processing Units), or your job may fail or not run as expected. Please note, however, that you are not required to use all of these options and should only use them if necessary.

This script performs a simple task — it generates of file of random numbers and then sorts it.

#!/bin/bash 
#SBATCH --partition short # partition (queue) 
#SBATCH --nodes 1 # (leave at 1 unless using multi-node specific code) 
#SBATCH --ntasks 1 # number of tasks (change if using MPI capable software)
#SBATCH --cpus-per-task 1 # CPU cores per task (change if using multithreaded software)
#SBATCH --mem=8192 # total memory in MB
#SBATCH --job-name="myjob" # job name 
#SBATCH -o slurm.%N.%j.stdout.txt # STDOUT 
#SBATCH -e slurm.%N.%j.stderr.txt # STDERR 
#SBATCH --mail-user=username@bucknell.edu # address to email 
#SBATCH --mail-type=ALL # mail events (NONE, BEGIN, END, FAIL, ALL) 
for i in {1..100000}; do 
  echo $RANDOM >> SomeRandomNumbers.txt 
done 
sort -n SomeRandomNumbers.txt

If you require one or more GPUs, add a line similar to the following and change the partition:

#SBATCH --partition gpu # gpu partition
#SBATCH --gres=gpu:1 # number of GPUs

Now you can submit your job with the command:

sbatch myscript.sh

If you want to test your job and find out when your job is estimated to run use (note this does not actually submit the job):

sbatch --test-only myscript.sh

Interactive Jobs

You may run an interactive session on a compute node with a command such as:

srun --nodes 1 --cpus-per-task 2 --partition short --pty /bin/bash

or, for short:

srun -N 1 -c 2 -p short --pty /bin/bash

The --pty flag indicates an interactive terminal, and the /bin/bash denotes the shell to be run. Other options (such as --ntasks) may be specified as in a submission script. Note that you should change –cpus-per-task for multithreaded software and instead use –ntasks for MPI capable software (if you have questions about your software/application, contact bisonnet@bucknell.edu). Please note that interactive jobs are subject to the same partition-based time/core limits as batch jobs.

Information on jobs

List all current jobs for a user:

squeue -u <username>

List all running jobs for a user:

squeue -u <username> -t RUNNING

List all pending jobs for a user:

squeue -u <username> -t PENDING

List all current jobs in the short partition for a user:

squeue -u <username> -p short

List detailed information for a job (useful for troubleshooting):

scontrol show jobid -dd <jobid>

Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.

To get statistics on completed jobs by jobID:

sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

To view the same information for all jobs of a user:

sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed

Controlling jobs

To cancel one job:

scancel <jobid>

To cancel all the jobs for a user:

scancel -u <username>

To cancel all the pending jobs for a user:

scancel -t PENDING -u <username>

To cancel one or more jobs by name:

scancel --name myJobName

To pause a particular job:

scontrol hold <jobid>

To resume a particular job:

scontrol resume <jobid>

To requeue (cancel and rerun) a particular job:

scontrol requeue <jobid>

Checking Job Efficiency

Requesting more resources (e.g. CPU cores or memory) than required for your job prevents other people from using those resources. Some software, for example, is not capable of taking advantage of multiple cores so you should only use one core. You can check the efficiency of a job after it completes using the seff command. When running this command, take note of the CPU Efficiency and Memory Efficiency fields. If the percentage used is very low, consider reducing your resources requests (i.e. decrease the CPU cores or memory requests in your batch script).

seff <jobid>

Further Information

More detailed information on Slurm commands/options can be found at: https://slurm.schedmd.com/