Slurm is an open-source system for managing and scheduling jobs on Linux clusters. It is fault tolerant and highly scalable, making it suitable for clusters of all sizes.
When Slurm is deployed on a cluster, it can perform these tasks:
- Allocate exclusive or non-exclusive access to compute nodes for users for some duration of time
- Provide a framework for starting, executing, and monitoring work (typically parallel jobs) on the allocated nodes
- Arbitrate contention for resources by managing a queue of pending work
Include #SBATCH directives in your batch scripts to tell Slurm what resources to allocate. For example, this directive requests 128 CPUs for each task:
#SBATCH --cpus-per-task=128
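Most #SBATCH directives can also be given as command-line options to sbatch, where they override the values inside the script. A quick sketch, assuming the script is saved as job.sh (a hypothetical name):

# Override the in-script CPU request at submission time
sbatch --cpus-per-task=64 job.sh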
Commonly used #SBATCH directives (the values below are illustrative; partition names are site-specific):
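#SBATCH -J myjob                # Job name (long form: --job-name)
#SBATCH -N 2                    # Number of nodes (--nodes)
#SBATCH -n 8                    # Total number of tasks (--ntasks)
#SBATCH --ntasks-per-node=4     # Tasks to launch on each node
#SBATCH --cpus-per-task=16      # CPUs allocated to each task
#SBATCH -t 01:30:00             # Wall-time limit, hh:mm:ss (--time)
#SBATCH -p compute              # Partition to submit to (--partition)
#SBATCH --mem=64G               # Memory per node
#SBATCH --exclusive             # Do not permit jobs to share nodes
#SBATCH --output=out_%j.out     # Standard output file (%j expands to the job ID)
#SBATCH --error=error_%j.err    # Standard error file
#SBATCH --array=1-4             # Submit a job array of 4 tasks
#SBATCH --mail-type=END,FAIL    # Events that trigger email notification
#SBATCH --mail-user=you@example.com  # Address for notifications

A basic single-job template combining several of these directives: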
#!/bin/bash
##----------------------------------------------------------
#SBATCH -J Job1                 # Job name
#SBATCH -N 1                    # Total number of nodes requested
#SBATCH --exclusive             # Do not permit jobs to share nodes
#SBATCH --error=error_%j.err    # Error file
#SBATCH --output=out_%j.out     # Standard output file
##----------------------------------------------------------
## insert required modules here
##----------------------------------------------------------
# Directory to store output
cd /mnt/hyperion/insert path to desired directory
# Executable
srun /insert path to executable here
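To run the template, submit it with sbatch and monitor it with squeue. A minimal sketch, assuming the script is saved as job.sh (a hypothetical name) and Slurm assigns job ID 12345:

sbatch job.sh          # Prints "Submitted batch job 12345"
squeue -u $USER        # Check its state (PD = pending, R = running)
cat out_12345.out      # Inspect standard output once the job completes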
To run several jobs at once, each on its own node, use a job array:
#!/bin/bash
##----------------------------------------------------------
## Testing Event 33 with 1 Node Each
## Kyra M. Bryant
## March 27th, 2022
##----------------------------------------------------------
#SBATCH -J Historical_Event     # Job name
#SBATCH -N 1                    # Total number of nodes requested per array task
#SBATCH --exclusive             # Do not permit jobs to share nodes
#SBATCH --error=error_%a.err    # Error file (%a expands to the array task ID)
#SBATCH --output=out_%a.out     # Output file
#SBATCH --array=1-4             # 4 jobs total
##----------------------------------------------------------
parfile_array[1]=parfile_1.par  # Job 1
parfile_array[2]=parfile_2.par  # Job 2
parfile_array[3]=parfile_3.par  # Job 3
parfile_array[4]=parfile_4.par  # Job 4
cd /mnt/hyperion/data/fathom/teaching/historical/event_33
srun /executable path ${parfile_array[$SLURM_ARRAY_TASK_ID]}
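Each array task runs the same script with a different value of $SLURM_ARRAY_TASK_ID, so every task picks up its own parameter file. A sketch of submitting and managing the array, assuming the script is saved as array_job.sh (a hypothetical name) and Slurm assigns job ID 12345:

sbatch array_job.sh    # Submits array tasks 12345_1 through 12345_4
squeue -u $USER        # Pending tasks appear collapsed as 12345_[1-4]
scancel 12345_3        # Cancel only array task 3, leaving the rest running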
Commonly used Slurm commands (the job ID 12345 and script name job.sh below are illustrative):
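sbatch job.sh            # Submit a batch script; prints the assigned job ID
squeue -u $USER          # List your pending and running jobs
scancel 12345            # Cancel job 12345
sinfo                    # Show partition and node availability
sacct -j 12345           # Show accounting data for a completed job
scontrol show job 12345  # Show detailed information about a pending or running job
srun --pty bash          # Request an interactive shell on a compute node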