How to monitor a pipeline job on AWS
This guide describes how to monitor a pipeline job running on the AWS DP HPC cluster using the SLURM CLI.
Prerequisites
An account on the AWS DP HPC cluster
A SLURM job currently running on the cluster
Steps
Use standard SLURM tools to monitor the submitted job while it is running.
Identify the job ID from squeue or the output of sbatch:
squeue -u "$USER"
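When a script needs the job ID, it can be taken from the sbatch confirmation line instead of being read off the queue listing by eye. The following is a minimal sketch, assuming sbatch prints its usual "Submitted batch job <id>" message. extract_job_id and run_pipeline.sh are hypothetical names used only for illustration.

```shell
# extract_job_id: hypothetical helper, not part of SLURM.
# Prints the numeric job ID found in an sbatch confirmation message,
# or prints nothing if the message does not have the expected form.
extract_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Example usage (run_pipeline.sh is a placeholder script name):
#   job_id=$(extract_job_id "$(sbatch run_pipeline.sh)")
#   echo "Monitoring job $job_id"
```

sbatch also accepts a --parsable option that prints only the job ID (followed by a semicolon and the cluster name on multi-cluster setups), which avoids the text parsing altogether.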
To continuously monitor the queue status of your own jobs, use watch:
watch -n 1 squeue -u "$USER"
Inspect detailed job state and allocated resources:
scontrol show job <job_id>
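The scontrol output is a long list of Key=Value pairs, so it can help to pull out a single field. The sketch below assumes that layout and that the value of interest contains no spaces. job_field is a hypothetical helper name, not a SLURM command.

```shell
# job_field: hypothetical helper, not part of SLURM.
# Reads `scontrol show job` text on stdin and prints the value of the
# first Key=Value token whose key matches the argument.
# Assumption: the value contains no whitespace (true for fields such as
# JobState or NumCPUs, not for free-text fields such as Command).
job_field() {
    awk -v k="$1" '{
        for (i = 1; i <= NF; i++) {
            if (index($i, k "=") == 1) {
                print substr($i, length(k) + 2)
                exit
            }
        }
    }'
}

# Example usage:
#   scontrol show job "$job_id" | job_field JobState
```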
Check accounting information (state, start/end time, exit code):
sacct -j <job_id> --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed,Start,End
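To block until the job has finished, the accounting state can be polled in a loop. This is a sketch under stated assumptions, not a definitive implementation: is_terminal_state and wait_for_job are hypothetical helpers, the list of terminal states is a common subset and may not be exhaustive for your cluster, and the 30-second default interval is arbitrary.

```shell
# is_terminal_state: hypothetical helper. Returns 0 if the given sacct
# state means the job will not run any further. The list is an assumed
# subset of SLURM states; PREEMPTED is left out because a preempted job
# may be requeued.
is_terminal_state() {
    case "$1" in
        COMPLETED|FAILED|CANCELLED*|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL|BOOT_FAIL|DEADLINE)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# wait_for_job: hypothetical helper. Polls sacct every $2 seconds
# (default 30) and prints the final state once the job has ended.
wait_for_job() {
    local job_id="$1" interval="${2:-30}" state
    while true; do
        # -X: allocation only (no job steps), -n: no header, -P: unpadded output
        state=$(sacct -j "$job_id" -X -n -P --format=State | head -n 1)
        if is_terminal_state "$state"; then
            printf '%s\n' "$state"
            return 0
        fi
        sleep "$interval"
    done
}

# Example usage:
#   final_state=$(wait_for_job "$job_id")
#   echo "Job $job_id ended with state: $final_state"
```

The CANCELLED* pattern is there because sacct can report the state as "CANCELLED by <uid>".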
Follow the SLURM output log in real time (you might need to wait until the status changes from configuring (CF) to running (R) before the log file is created):
tail -f slurm-<job_id>.out
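Because the log file may not exist until the job starts running, tail -f can fail if it is started too early. One possible approach is to wait for the file first, as in the sketch below. wait_for_file is a hypothetical helper, and the 300-second default timeout is an arbitrary assumption.

```shell
# wait_for_file: hypothetical helper. Waits up to $2 seconds (default 300)
# for the file at $1 to appear. Returns 0 once it exists, 1 on timeout.
wait_for_file() {
    local path="$1" timeout="${2:-300}" waited=0
    until [ -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Example usage (assumes the default slurm-<job_id>.out filename):
#   log="slurm-${job_id}.out"
#   wait_for_file "$log" && tail -f "$log"
```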
The output log filename is configurable via #SBATCH --output (or sbatch --output). Check the relevant pipeline’s SLURM script for the exact pattern, for example #SBATCH --output=slurm-%x-%j.out.
If you need to cancel the job, use the scancel command with the job ID:
scancel <job_id>
Or cancel all of your own jobs:
scancel -u "$USER"
Finishing up
Once the SLURM job has finished, the data products will be available in the output directory. Check your pipeline’s documentation for the expected output location.