How to monitor a pipeline job on AWS

This guide describes how to monitor a pipeline job running on the AWS DP HPC cluster using the SLURM CLI.

Prerequisites

  • An account on the AWS DP HPC cluster

  • A SLURM job submitted to the cluster (queued, configuring, or running)

Steps

Use standard SLURM tools to monitor the submitted job while it is running.

  1. Find the job ID in the JOBID column of squeue, or in the "Submitted batch job <job_id>" line that sbatch prints on submission:

    squeue -u "$USER"
    

    To continuously monitor the queue status of your own jobs, use watch (here refreshing every second):

    watch -n 1 squeue -u "$USER"
    
  2. Inspect detailed job state and allocated resources:

    scontrol show job <job_id>
    
  3. Check accounting information (state, start/end time, exit code):

    sacct -j <job_id> --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed,Start,End
    
  4. Follow the SLURM output log in real time. The log file is created only once the job starts, so you might need to wait until the state changes from configuring (CF) to running (R):

    tail -f slurm-<job_id>.out
    

    The output log filename is configurable via #SBATCH --output (or sbatch --output). Check the relevant pipeline’s SLURM script for the exact pattern, for example #SBATCH --output=slurm-%x-%j.out, where %x expands to the job name and %j to the job ID.

  5. If you need to cancel the job, use the scancel command with the job ID:

    scancel <job_id>
    

    Alternatively, cancel all of your own jobs:

    scancel -u "$USER"
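
As a rough sketch of how steps 1 and 4 could be scripted, the two helpers below extract the job ID from the "Submitted batch job <job_id>" message that sbatch prints, and read the StdOut= field from the output of scontrol show job to locate the log file. The function names and the pipeline.sh script are hypothetical, and some SLURM versions may show the unexpanded filename pattern in StdOut, so treat the result as a hint rather than a guarantee.

```shell
#!/usr/bin/env bash

# Read sbatch's default message on stdin and print the job ID,
# e.g. "Submitted batch job 123456" -> 123456.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Read `scontrol show job <job_id>` output on stdin and print the value
# of the StdOut= field, which is the path of the job's output log.
log_path_from_scontrol() {
    sed -n 's/^[[:space:]]*StdOut=//p'
}

# Example usage (pipeline.sh is a hypothetical submission script):
#   job_id=$(sbatch pipeline.sh | job_id_from_sbatch)
#   log=$(scontrol show job "$job_id" | log_path_from_scontrol)
#   tail -f "$log"
```

Both helpers only parse text from standard input, so they can be tried on saved command output without touching the scheduler.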
    

Finishing up

Once the SLURM job has finished, the data products will be available in the output directory. Check your pipeline’s documentation for the expected output location.
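
To check for completion from a script instead of watching the queue by hand, one option is to poll sacct until the job reaches a final state. The following is a minimal sketch under stated assumptions: the function names, the 30-second default interval, and the set of states treated as final are illustrative choices, not part of the pipeline. Jobs that are requeued after a node failure may need different handling.

```shell
#!/usr/bin/env bash

# Succeed (exit status 0) if the given SLURM state is treated as final here.
# sacct can report cancellation as "CANCELLED by <uid>", hence the wildcard.
is_final_state() {
    case "$1" in
        COMPLETED|FAILED|TIMEOUT|OUT_OF_MEMORY|DEADLINE|BOOT_FAIL|NODE_FAIL|CANCELLED*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Poll sacct until the job reaches a final state, then print that state.
# Arguments: job ID, and an optional polling interval in seconds (default 30).
wait_for_job() {
    local job_id=$1 interval=${2:-30} state
    while true; do
        # -X: allocation line only, -n: no header, -P: unpadded parsable output
        state=$(sacct -j "$job_id" -X -n -P --format=State | head -n 1)
        if is_final_state "$state"; then
            echo "$state"
            return 0
        fi
        sleep "$interval"
    done
}

# Example usage:
#   final=$(wait_for_job <job_id>)
#   [ "$final" = "COMPLETED" ] && echo "Outputs should be ready"
```

An empty sacct result (for example, before the job appears in accounting) is not treated as final, so the loop simply keeps polling.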