Intro
This documentation will go over the basics of using the Chili Pepper cluster. Please go through this documentation step-by-step. If you run into problems, contact the server administrator via email or use the Q&A channel in the Slack group.
Step 1 - Understanding the Cluster Structure
The Chili Pepper cluster has a NAS which contains user home directories and other shared files. All users and groups are consistent across all nodes. The prefix for the user directory is /mnt/nas/users/. For example, the home directory for the user dummyuser will be /mnt/nas/users/dummyuser/. The home directory is the recommended place for users to store scripts, data and configuration files for using the Chili Pepper cluster.
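For example, after logging in you can confirm that your home directory lives on the NAS (dummyuser is a placeholder for your own username):

```shell
# Your home directory should resolve to the NAS path described above.
echo "$HOME"                    # e.g. /mnt/nas/users/dummyuser
ls /mnt/nas/users/dummyuser/    # list the contents of your home directory
```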
Step 2 - Data Transfer
With the cluster access information (SSH) provided by the administrator, you can send and receive files to and from the cluster with scp. For example, the user dummyuser can send various local files to the cluster in the following fashion. Note that for security purposes the default port for SSH is not 22; the administrator will give you the actual port along with your access information.
Sending a local file (some_file.txt) to the remote home directory
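A sketch of the command (chili.example.com and port 2222 are placeholders; use the host and port the administrator gave you):

```shell
# -P selects the non-default SSH port
scp -P 2222 some_file.txt dummyuser@chili.example.com:/mnt/nas/users/dummyuser/
```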
Sending a local directory (some_files/) to the remote home directory
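Again as a sketch, with chili.example.com and port 2222 standing in for the host and port from the administrator:

```shell
# -r copies the directory and its contents recursively
scp -P 2222 -r some_files/ dummyuser@chili.example.com:/mnt/nas/users/dummyuser/
```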
Receiving a remote file (some_file.txt) in the home directory to the current local directory
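A sketch (host chili.example.com and port 2222 are placeholders for the values the administrator provided):

```shell
# The trailing "." copies the remote file into the current local directory
scp -P 2222 dummyuser@chili.example.com:/mnt/nas/users/dummyuser/some_file.txt .
```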
Receiving a remote directory (/mnt/nas/users/dummyuser/some_files) in the home directory to the current local directory
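A sketch, with the same placeholder host and port (substitute the values from the administrator):

```shell
# -r fetches the remote directory recursively into the current local directory
scp -P 2222 -r dummyuser@chili.example.com:/mnt/nas/users/dummyuser/some_files .
```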
Various other options for scp exist; more information on this topic can be found in this article. Users can also use GitHub or GitLab to upload and download source code to the cluster. This will be handled in a separate article.
Step 3 - Writing an SBATCH script for SLURM
The SLURM batch scripting tool is available from the homepage. Let's look at the sample script (/mnt/nas/users/dummyuser/test_script.sh) created by using the tool. The first half (lines 1 ~ 11) of the script consists of directives and parameters for the SLURM job. Each user can set the number of nodes, the maximum time for the job to occupy those nodes, and the location of the output log file. There are more options available for submitting a job; additional resources for SBATCH arguments can be found here.
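As a sketch reconstructed from the description below, the script might look like the following, with lines numbered to match the text. Only line 11 and lines 15~20 are implied by the description; the job name and resource values on lines 2~8 are assumptions.

```shell
#!/bin/bash
#SBATCH --job-name=test_script
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --partition=batch
#SBATCH --time=01:00:00
# The directives above request resources for the job.
# Line 11 sends the job's stdout/stderr to the log file.
#SBATCH --output=/mnt/nas/users/dummyuser/conda.log

# Bash commands executed on the allocated node:

ENV_NAME=myenv
ENV_PATH=/mnt/nas/users/dummyuser/.conda/envs/$ENV_NAME

conda env remove -p "$ENV_PATH"
conda create -p "$ENV_PATH" -y python=3.8 pandas numpy scikit-learn
source activate "$ENV_PATH" && pip freeze
```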
From line 15 to the end of the script are the actual bash commands for the node to execute.
- Lines 15~16 create two local variables (ENV_NAME and ENV_PATH). In the above script a conda environment named myenv will be created under /mnt/nas/users/dummyuser/.conda/envs/myenv.
- Line 18 removes the environment at ENV_PATH if it is present.
- Line 19 creates a conda environment at ENV_PATH non-interactively thanks to the -y (--yes) flag. This environment will have a Python interpreter of version 3.8 along with the listed packages (pandas, numpy and scikit-learn).
- Line 20 activates the conda environment at ENV_PATH and then runs pip freeze, which writes to stdout. Note that stdout is saved to the log file set on line 11 (/mnt/nas/users/dummyuser/conda.log).
To run an actual data science job, all you have to do is change the list of required packages for your environment and replace the pip freeze with python your_script_to_run.py.
Step 4 - Submitting the script
Submitting the script from above is very simple.
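A sketch of the submission command, assuming the sample script path from Step 3:

```shell
sbatch /mnt/nas/users/dummyuser/test_script.sh
# sbatch replies with the assigned job ID, e.g. "Submitted batch job 23"
```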
You can check the current job queue with the following command.
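For example (dummyuser is a placeholder for your own username):

```shell
# Show the whole job queue
squeue
# Or restrict the listing to your own jobs
squeue -u dummyuser
```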
When you want to cancel the job you have submitted, get the JOBID from the squeue command and use the scancel command in the following fashion. Suppose the JOBID is 23.
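With that JOBID, the cancellation looks like this:

```shell
# Cancel the job whose JOBID (from squeue) is 23
scancel 23
```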
Note that ordinary users cannot cancel jobs that belong to other users, but the administrator can.