How to run on the AWS DP HPC cluster using SLURM
This page describes how to run the Continuum Imaging Pipeline (CIMG) on one or more nodes on the AWS DP HPC cluster using SLURM.
If you want to run inside a container on your local machine instead, see the quickstart guide.
If you want to monitor the job using the Prefect UI, see the Prefect instructions.
The scripts have been tested on the AWS DP HPC cluster.
Prerequisites
An account on the AWS DP HPC cluster
This repository cloned to a directory on the AWS DP HPC cluster
Steps
1. Submit a SLURM job to run the pipeline
Log in to the DP HPC head node.
Change directory to the repository root folder, OR set the REPO_DIR environment variable:
export REPO_DIR=~/path/to/repo/ska-sdp-cimg
Edit the SLURM script scripts/prod/aws-run-cimg.sbatch if needed (paths, job parameters, number of nodes, etc.).
If you require a specific version of the ska-sdp-spack environment:
export SPACK_TAG="2026.03.2"
Submit the SLURM job:
The script sets up the compute environment using spack and runs the pipeline on a single compute node by default.
sbatch scripts/prod/aws-run-cimg.sbatch
Output:
Submitted batch job <job_id>
To run on multiple nodes, override the SLURM directives when submitting the job. The number of tasks (--ntasks) must equal the total number of nodes (--nodes).
sbatch --nodes=3 --ntasks=3 --cpus-per-task=96 scripts/prod/aws-run-cimg.sbatch
Check job status:
squeue
sacct
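The submission and status steps above can be sketched as a few shell helpers. This is a minimal illustration, not part of the repository: the function names build_sbatch_cmd, parse_job_id and show_job_status are hypothetical, and the default of 96 CPUs per task is simply taken from the multi-node example above.

```shell
# Hypothetical helpers; none of these functions exist in the repository.

# Build the sbatch command with --ntasks forced equal to --nodes.
# $1 = number of nodes, $2 = CPUs per task (defaults to 96, as in the example above)
build_sbatch_cmd() {
    case "$1" in
        ''|*[!0-9]*|0*)
            echo "nodes must be a positive integer" >&2
            return 1
            ;;
    esac
    echo "sbatch --nodes=$1 --ntasks=$1 --cpus-per-task=${2:-96} scripts/prod/aws-run-cimg.sbatch"
}

# Pull the numeric job ID out of sbatch's "Submitted batch job <job_id>" line.
parse_job_id() {
    echo "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Show queue state and accounting records for one job (needs the SLURM client tools).
show_job_status() {
    squeue -j "$1"
    sacct -j "$1" --format=JobID,JobName,State,Elapsed,ExitCode
}

# Print the command for a three-node run without submitting anything
build_sbatch_cmd 3

# On the head node, a submission could then look like:
#   output=$(eval "$(build_sbatch_cmd 3)")
#   show_job_status "$(parse_job_id "$output")"
```

Deriving --ntasks from the node count in one place avoids the two values drifting apart when the command is edited by hand.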
2. Finishing up
Once the SLURM job has finished, the data products will be available in the latest time-stamped
output directory under $PWD/runs for inspection.
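One way to locate that directory from the shell is sketched below. It assumes the most recently modified directory under runs is the latest run; the helper name latest_run_dir is hypothetical.

```shell
# Print the most recently modified directory under the runs folder.
# $1 = runs directory (defaults to $PWD/runs); latest_run_dir is a hypothetical helper.
latest_run_dir() {
    # -t sorts newest first, -d lists the directories themselves rather than their contents
    ls -td "${1:-$PWD/runs}"/*/ 2>/dev/null | head -n 1
}

# Prints nothing if no run directories exist yet
latest_run_dir
```

For example, `ls "$(latest_run_dir)"` lists the data products of the latest run.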
Logs are written to the file paths specified in the SLURM script. These include:
slurm-<job_name>-<job_id>.out: standard output and error from the job (including output from WSClean, which is run by the image task in tasks.image.py)
versions-<job_name>-<job_id>.txt: a list of versions of key software used in the job, including spack modules and python packages
runs/<job_name>-<job_id>-<timestamp>/monitor-<job_name>-<job_id>/: directory containing per-node benchmarking traces and logs when running on multiple nodes.
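As a small illustration of the naming patterns above, the sketch below prints the log file names to look for, given a job name and job ID. The helper job_log_names is hypothetical, and "cimg" and 12345 are placeholder values; the directory holding the files depends on the paths set in the SLURM script.

```shell
# Print the log file names expected for a job, following the naming patterns above.
# $1 = job name, $2 = job ID; job_log_names is a hypothetical helper.
job_log_names() {
    echo "slurm-$1-$2.out"
    echo "versions-$1-$2.txt"
}

# "cimg" and 12345 are placeholder values, not real job details
job_log_names cimg 12345
```

From the directory that holds the logs, `tail -n 50 "$(job_log_names cimg 12345 | head -n 1)"` would then show the end of the job's standard output and error.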