1. What is SLURM?
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained.
For more information, please visit https://slurm.schedmd.com/overview.html
2. Basic SLURM Commands
- sbatch
  - Submit a batch script to SLURM
- squeue
  - View the queue
- scancel
  - Cancel a SLURM job
    $ scancel [YOUR_JOBID]
- sinfo
  - See the state of the system
- smap
  - Graphically view information about SLURM jobs
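The commands above are usually combined into a short submit, inspect, cancel workflow. The following is a minimal sketch of that workflow, not an official recipe: `submit_job` is a hypothetical helper name and `basic.sh` is a placeholder script name. It relies on the `--parsable` option of sbatch, which prints only the job ID (optionally followed by `;cluster_name`).

```shell
#!/bin/bash
# submit_job is a hypothetical helper, not a SLURM command.
# It submits the given batch script and prints only the job ID.
submit_job() {
    local out
    # --parsable makes sbatch print "jobid" or "jobid;cluster_name".
    out=$(sbatch --parsable "$1") || return 1
    # Drop the optional ";cluster_name" suffix.
    echo "${out%%;*}"
}

# Example usage on a machine where SLURM is installed
# (basic.sh is a placeholder for your own batch script):
#   jobid=$(submit_job basic.sh)
#   squeue -j "$jobid"     # view this job in the queue
#   squeue -u "$USER"      # view all of your jobs
#   sinfo                  # see the state of the system
#   scancel "$jobid"       # cancel the job
```

Capturing the job ID once avoids copying it by hand from the sbatch output into later squeue and scancel calls.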
For more information about SLURM commands, please visit the website below.
3. How to make a SLURM batch script?
‘Job submission file’ is the official SLURM name for the file you use to submit your program and request resources from the job scheduler. In this document, we will call it a ‘batch script’ or simply a ‘script’.
Basic example
This example asks for 1 task, a run time of no more than 1 minute, and a memory limit of 1 GB. If there is any problem with your job, the log file (in this case, ‘basic.log’) contains information to help troubleshoot the issue.
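A batch script matching that description might look like the sketch below. The job name and the echoed message are assumptions added for illustration; the resource options (`--ntasks`, `--time`, `--mem`, `--output`) are standard sbatch options set to the values described above.

```shell
#!/bin/bash
# Hypothetical job name; choose your own.
#SBATCH --job-name=basic
# Ask for 1 task.
#SBATCH --ntasks=1
# Run for no more than 1 minute (HH:MM:SS).
#SBATCH --time=00:01:00
# Limit memory to 1 GB.
#SBATCH --mem=1G
# Write the job's output to basic.log.
#SBATCH --output=basic.log

# Placeholder workload; replace it with your own program.
# SLURM_JOB_ID is set by SLURM when the script runs as a job.
message="Job ${SLURM_JOB_ID:-unknown} started"
echo "$message"
```

The `#SBATCH` lines are ordinary comments to bash, so the same file can also be run directly for a quick check before it is submitted.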
You can also use GPU nodes with the ‘--gres’ option. Here is an example.
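A minimal sketch of a GPU batch script, assuming the cluster exposes its GPUs under the generic resource name `gpu`. The job name, log file name, and the request for a single GPU are illustrative assumptions, not site settings.

```shell
#!/bin/bash
# Hypothetical job name and log file; adjust to your own job.
#SBATCH --job-name=gpu_example
#SBATCH --output=gpu_example.log
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
# Request 1 GPU through the generic resource (gres) option.
#SBATCH --gres=gpu:1

# SLURM typically sets CUDA_VISIBLE_DEVICES for GPU jobs; fall back to
# "none" so the script also runs outside of a GPU allocation.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: $gpus"
```

Printing the visible devices at the start of the job gives a quick record in the log file of which GPUs were actually allocated.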
The job can then be submitted with sbatch.
Because we only have 2 GPU machines, this option cannot be set to more than 2.
For the convenience of users, we provide a SLURM job configurator page on our website. Please visit the SLURM job configurator and make your own SLURM batch script!