How to monitor a pipeline job on AWS

This guide describes how to monitor a pipeline job running on the AWS DP HPC cluster using the SLURM CLI.

Prerequisites

An account on the AWS DP HPC cluster
A SLURM job currently running on the cluster
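Once you are logged in to the cluster, you can check that the SLURM client commands used in this guide are on your PATH. This is a minimal sketch. `check_slurm_tools` is a hypothetical helper and is not part of the cluster's tooling.

```shell
# Hypothetical helper: report any SLURM client command used in this guide
# that cannot be found. Returns 0 if all are available, 1 otherwise.
check_slurm_tools() {
    local missing=0 cmd
    for cmd in sbatch squeue scontrol sacct scancel; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, on a cluster login node:
#   check_slurm_tools && echo "SLURM client tools found"
```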

Steps

Use standard SLURM tools to monitor the submitted job while it is running.

Identify the job ID from squeue or the output of sbatch:
squeue -u "$USER"
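If you submit the job yourself, you can capture its ID at submission time instead of looking it up later. By default sbatch prints a line such as `Submitted batch job 12345`. With `--parsable` it prints only the ID, followed by `;<cluster>` on multi-cluster setups. The sketch below accepts either form. `extract_job_id` is a hypothetical helper, and `pipeline.sbatch` is a placeholder for your pipeline's script.

```shell
# Hypothetical helper: read sbatch output on stdin and print the job ID.
# Accepts the default "Submitted batch job <id>" message as well as the
# --parsable formats "<id>" and "<id>;<cluster>".
extract_job_id() {
    awk '/^Submitted batch job [0-9]+/ { print $4; next }
         /^[0-9]+(;.*)?$/ { sub(/;.*/, ""); print }'
}

# Usage (pipeline.sbatch is a placeholder):
#   JOB_ID=$(sbatch --parsable pipeline.sbatch | extract_job_id)
#   echo "Submitted job $JOB_ID"
```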
To monitor the queue status of your own jobs continuously, use watch, here refreshing every second:
watch -n 1 squeue -u "$USER"
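squeue can also refresh by itself with `--iterate` and can print a custom column layout with `--format`. The sketch below wraps both options in a hypothetical `my_jobs` function. The 10-second default interval and the chosen columns are assumptions. Polling less often than every second also puts less load on the SLURM controller.

```shell
# Hypothetical helper: list your own jobs every N seconds (default 10).
# Columns: job ID, job name, state, elapsed time, and the allocated
# nodes (or, for pending jobs, the reason the job is waiting).
my_jobs() {
    local interval="${1:-10}"
    squeue -u "$USER" --iterate="$interval" \
        --format="%.10i %.25j %.10T %.10M %R"
}

# Usage: my_jobs       # refresh every 10 s, stop with Ctrl-C
#        my_jobs 30    # refresh every 30 s
```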
Inspect detailed job state and allocated resources:
scontrol show job <job_id>
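scontrol prints the job record as space-separated `Key=Value` pairs. You can therefore extract a single field, such as JobState or StdOut (the job's standard output path), with standard text tools. The sketch below uses a hypothetical `job_field` helper. It assumes the value contains no spaces, which is not true of every field.

```shell
# Hypothetical helper: read "scontrol show job" output on stdin and print
# the value of the first Key=Value field whose key matches $1.
# Splits on spaces, so a value containing spaces is returned truncated.
job_field() {
    tr ' ' '\n' | grep "^$1=" | head -n 1 | cut -d= -f2-
}

# Usage:
#   scontrol show job <job_id> | job_field JobState
#   scontrol show job <job_id> | job_field StdOut
```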

Check accounting information (state, start/end time, exit code):

sacct -j <job_id> --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed,Start,End
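In a script, sacct output is easier to parse with three options: `--parsable2` (`-P`, which separates fields with `|`), `--noheader` (`-n`), and `--allocations` (`-X`, which reports the job allocation only, without a line per job step). The sketch below uses a hypothetical `job_summary` helper. ExitCode is reported as `<exit status>:<signal>`.

```shell
# Hypothetical helper: print "<state> <exit code>" for the allocation of
# one job, using sacct's pipe-separated, header-free output.
job_summary() {
    sacct -j "$1" -X -n -P --format=State,ExitCode |
        awk -F'|' 'NR == 1 { print $1, $2 }'
}

# Usage:
#   job_summary <job_id>
```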

Follow the SLURM output log in real time. The log file is only created once the job starts, so you might need to wait until its status changes from configuring (CF) to running (R):
tail -f slurm-<job_id>.out
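If you run `tail -f` before the log file exists, it exits with an error instead of waiting. With GNU coreutils, `tail -F` retries until the file appears. Otherwise, a small loop can wait for the file first. This is a minimal sketch. `wait_for_file`, its 5-second poll interval, and its optional timeout are hypothetical choices.

```shell
# Hypothetical helper: wait until a file exists, checking every 5 seconds.
# Optional second argument: timeout in seconds (default 0 = wait forever).
# Returns 0 once the file exists, or 1 if the timeout is reached first.
wait_for_file() {
    local file="$1" timeout="${2:-0}" waited=0
    while [ ! -e "$file" ]; do
        if [ "$timeout" -gt 0 ] && [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Usage:
#   wait_for_file "slurm-<job_id>.out" && tail -f "slurm-<job_id>.out"
```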
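By default, sbatch writes the output log to slurm-<job_id>.out in the job's working directory, which is normally the directory you submitted from. The filename is configurable via #SBATCH --output (or sbatch --output). Check the relevant pipeline’s SLURM script for the exact pattern, for example #SBATCH --output=slurm-%x-%j.out, where %x expands to the job name and %j to the job ID.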
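The sketch below works out the log filename from such a pattern, using a hypothetical `expand_log_pattern` helper. It handles only plain %x, %j and %% (a literal %). It does not handle the other specifiers sbatch supports, such as %A and %a for array jobs, and leaves them unchanged.

```shell
# Hypothetical helper: expand the %x (job name), %j (job ID) and %%
# specifiers in an sbatch --output pattern. Other specifiers are left
# unchanged.
expand_log_pattern() {
    awk -v p="$1" -v name="$2" -v id="$3" 'BEGIN {
        out = ""
        while ((i = index(p, "%")) > 0) {
            out = out substr(p, 1, i - 1)   # text before the specifier
            c = substr(p, i + 1, 1)         # specifier character
            if (c == "x") out = out name
            else if (c == "j") out = out id
            else if (c == "%") out = out "%"
            else out = out "%" c
            p = substr(p, i + 2)            # continue after the specifier
        }
        print out p
    }'
}

# Usage (my-pipeline is a placeholder job name):
#   tail -f "$(expand_log_pattern 'slurm-%x-%j.out' my-pipeline <job_id>)"
```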
If you need to cancel the job, use the scancel command with the job ID:
scancel <job_id>
Or cancel all of your own jobs, including any that are not part of the pipeline:
scancel -u "$USER"
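To cancel only some of your jobs, scancel accepts filters such as `--name` and `--state`, for example `scancel -u "$USER" --state=PENDING`. The sketch below handles jobs with a given name. It lists the matching job IDs with squeue, asks once for confirmation, and then cancels exactly those IDs. `cancel_by_name` is a hypothetical helper.

```shell
# Hypothetical helper: list your jobs with the given name, ask once for
# confirmation, then cancel exactly the listed job IDs.
cancel_by_name() {
    local name="$1" ids answer
    ids=$(squeue -u "$USER" --name="$name" --noheader --format="%i")
    if [ -z "$ids" ]; then
        echo "No jobs named '$name' found." >&2
        return 1
    fi
    echo "Jobs named '$name':"
    printf '%s\n' "$ids"
    printf 'Cancel these jobs? [y/N] '
    read -r answer
    case "$answer" in
        [Yy]*)
            # $ids is unquoted on purpose: one argument per job ID.
            scancel $ids ;;
        *)
            echo "Aborted." >&2
            return 1 ;;
    esac
}

# Usage (my-pipeline is a placeholder job name):
#   cancel_by_name my-pipeline
```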

Finishing up

Once the SLURM job has finished, the data products will be available in the output directory. Check your pipeline’s documentation for the expected output location.
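Before you look for the data products, you may want to confirm that the job finished successfully rather than failing, timing out, or being cancelled. This minimal sketch assumes that accounting (sacct) is enabled, as in the steps above, and that only the final state COMPLETED counts as success. `wait_for_job` and its 60-second default poll interval are hypothetical.

```shell
# Hypothetical helper: poll sacct until the job leaves the pending/running
# states, then print its final state. Returns 0 only if it is COMPLETED.
# An empty state (job not found by sacct) also ends the loop as a failure.
# Optional second argument: poll interval in seconds (default 60).
wait_for_job() {
    local job_id="$1" interval="${2:-60}" state
    while :; do
        state=$(sacct -j "$job_id" -X -n -P --format=State | head -n 1)
        case "$state" in
            PENDING|RUNNING|SUSPENDED|REQUEUED|RESIZING)
                sleep "$interval" ;;
            *)
                break ;;
        esac
    done
    echo "Job $job_id finished with state: $state"
    [ "$state" = "COMPLETED" ]
}

# Usage:
#   wait_for_job <job_id>
```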
