How to run on the AWS DP HPC cluster using SLURM and the Prefect UI

This page describes how to run the Continuum Imaging Pipeline (CIMG) on one or more nodes on the AWS DP HPC cluster using SLURM with the Prefect server running on the headnode.

If you want to run the job without the Prefect UI, see the Instructions to run the pipeline using only SLURM.

Running CIMG this way involves three main steps, followed by a short clean-up:

  1. Set up a Prefect server on the headnode

  2. Submit a SLURM job

  3. Monitor the flow in the Prefect UI

Prerequisites

  • An account on the AWS DP HPC cluster

  • This repository cloned to a directory on the AWS DP HPC cluster

Steps

1. Set up a Prefect server on the headnode

If there is already a Prefect server running on the headnode, you can skip this step.

  1. Log into the DP HPC headnode.

  2. Start a tmux session (or attach to an existing one if you already have one set up):

    # Create a new session named "prefect" ...
    tmux new -s prefect
    # ... or attach to it if it already exists
    tmux attach -t prefect
    
  3. Change to the project root directory.

  4. Run the shell script that starts the Prefect server:

    ./scripts/dev/prefect/aws-prefect-start.sh
    

    If the default port is already in use, set the environment variable PREFECT_PORT to an alternative port before running the script:

    export PREFECT_PORT=12345
    ./scripts/dev/prefect/aws-prefect-start.sh
    

    The terminal should show a Prefect startup message, followed by instructions for setting up the SSH tunnel that gives your local machine access to the Prefect UI. Note these instructions down; you will need them in step 3.

    Example output:

    To tunnel from your laptop (example):
    aws-vault exec dp-hpc -- \
        ssh -N -4 -L 127.0.0.1:14200:127.0.0.1:46200 <your-username>@<headnode-address>
    
    Then open: http://127.0.0.1:14200 in your browser to view the Prefect dashboard.
    

    The Prefect server will continue running in the tmux session until you stop it, even if you log out of the headnode.

  5. Detach from tmux (CTRL-B D) to leave the server running.
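
The sub-steps above can be supported by a few small shell helpers. This is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes tmux is installed on the headnode. `tmux new-session -A` attaches to the named session if it already exists and creates it otherwise, so it covers both variants of sub-step 2. The port check only rejects values that cannot work (non-numeric, or below 1024, which normally requires root to bind); it does not test whether the port is free.

```shell
# Sketch only: these helper names are hypothetical, not part of the repository.

# Return 0 if $1 is an integer in the unprivileged port range 1024-65535.
valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

# Validate PREFECT_PORT if it is set; an unset variable is accepted.
check_prefect_port() {
    if [ -n "${PREFECT_PORT:-}" ] && ! valid_port "$PREFECT_PORT"; then
        echo "PREFECT_PORT must be an integer between 1024 and 65535" >&2
        return 1
    fi
}

# Create the "prefect" tmux session, or attach to it if it already exists.
# Inside the session, run ./scripts/dev/prefect/aws-prefect-start.sh
# from the project root.
open_prefect_session() {
    tmux new-session -A -s prefect
}
```

A typical use would be `export PREFECT_PORT=12345`, then `check_prefect_port && open_prefect_session`.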

2. Submit a SLURM job to run the pipeline

Make sure the Prefect server is running before submitting the job. Note that this setup does not support running multiple jobs simultaneously.

  1. Log into the headnode.

  2. Set the repository path:

    export REPO_DIR=~/path/to/repo/ska-sdp-cimg
    
  3. If you started the Prefect server on a custom port, set the same port here:

    export PREFECT_PORT=12345
    
  4. Edit the SLURM script scripts/dev/aws-run-cimg-spack-deployed.sbatch if needed (paths, job parameters, number of nodes, etc.).

  5. Submit the job:

    sbatch scripts/dev/aws-run-cimg-spack-deployed.sbatch
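
Steps 2 to 5 can be combined into one guarded submission. The following is a minimal sketch under stated assumptions, not part of the repository: the function name `submit_cimg` is hypothetical, and it assumes the relative `sbatch` path above is meant to be resolved from `$REPO_DIR`. `sbatch --parsable` prints only the job ID (followed by `;cluster` on multi-cluster setups), which makes it easy to pass on to `squeue`.

```shell
# Sketch only: submit_cimg is a hypothetical helper, not part of the repository.
submit_cimg() {
    local script="scripts/dev/aws-run-cimg-spack-deployed.sbatch"

    # Fail early if REPO_DIR is unset or does not contain the batch script.
    if [ -z "${REPO_DIR:-}" ] || [ ! -f "$REPO_DIR/$script" ]; then
        echo "REPO_DIR is unset or $script was not found under it" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$REPO_DIR" || exit 1
        job_id="$(sbatch --parsable "$script")" || exit 1
        job_id="${job_id%%;*}"   # strip a ";cluster" suffix if present
        echo "Submitted job $job_id"
        squeue -j "$job_id"      # show the job's current queue state
    )
}
```

After exporting `REPO_DIR` (and `PREFECT_PORT`, if you use a custom port), calling `submit_cimg` submits the job and prints its queue entry.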
    

3. Monitor the flow in the Prefect UI

Use the SSH tunnel command printed by the Prefect server startup script. If you did not note it down, you can find it in the Prefect log file in the project root directory (example filename: prefect-server-20260331-101739.log).
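
To recover the tunnel instructions from the log, a sketch along these lines may help. It assumes that log files follow the `prefect-server-YYYYMMDD-HHMMSS.log` pattern of the example filename, so the name that sorts last is the newest, and that the log contains the same "To tunnel from your laptop" text as the terminal output. Both function names are hypothetical.

```shell
# Sketch only: hypothetical helpers, not part of the repository.

# Print the newest prefect-server-*.log in directory $1 (default: current dir).
# Relies on the timestamp in the filename sorting chronologically.
latest_prefect_log() {
    local dir="${1:-.}" latest="" f
    for f in "$dir"/prefect-server-*.log; do
        # If nothing matches, the glob stays literal and this test fails.
        [ -e "$f" ] && latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Print the tunnel header line and the two command lines that follow it.
show_tunnel_instructions() {
    local log
    log="$(latest_prefect_log "${1:-.}")" || return 1
    grep -A 2 'To tunnel' "$log"
}
```

Run `show_tunnel_instructions` from the project root on the headnode, or pass the directory that holds the log files as the first argument.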

  1. Run the SSH tunnel command on your local machine:

    aws-vault exec dp-hpc -- \
        ssh -N -4 -L 127.0.0.1:14200:127.0.0.1:46200 <your-username>@<headnode-address>
    
  2. Open the Prefect UI in your browser (the URL may differ if you set a custom port):

    http://127.0.0.1:14200
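
With a custom port, the forwarding argument changes accordingly. The sketch below prints the tunnel command for a given pair of ports. `tunnel_cmd` is a hypothetical helper; the `dp-hpc` profile and the `-N -4 -L` options are copied from the example output in step 1. In `-L`, the first port is the one opened on your local machine and the second is the port on the headnode; the sketch assumes that a custom `PREFECT_PORT` is the value to use for the second.

```shell
# Sketch only: tunnel_cmd is a hypothetical helper, not part of the repository.

# Print the SSH tunnel command for a local port, a headnode port,
# a username, and the headnode address.
tunnel_cmd() {
    local local_port="$1" remote_port="$2" user="$3" host="$4"
    printf 'aws-vault exec dp-hpc -- ssh -N -4 -L 127.0.0.1:%s:127.0.0.1:%s %s@%s\n' \
        "$local_port" "$remote_port" "$user" "$host"
}

# Example with placeholder values for the username and headnode address.
tunnel_cmd 14200 12345 alice headnode.example
```

The browser URL follows the local port, so with the values above the UI would still be at http://127.0.0.1:14200.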

4. Finishing up

  • Outputs are in $PWD/runs

  • Stop the Prefect server when done:

    • Reattach to the tmux session (tmux attach -t prefect)

    • Press CTRL-C to stop the server

    • Exit the tmux session

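  • Close the SSH tunnel on your local machine (CTRL-C in the terminal running it; with ssh -N there is no remote shell, so CTRL-D has no effect)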
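
The server can also be stopped without reattaching. This is a sketch with a hypothetical function name, assuming the server runs in the foreground of a tmux session named `prefect` as set up in step 1. `tmux send-keys ... C-c` delivers the same CTRL-C you would press by hand, and `tmux kill-session` closes the session afterwards. The five-second pause is an arbitrary allowance for the server to shut down.

```shell
# Sketch only: stop_prefect_session is a hypothetical helper.
stop_prefect_session() {
    local session="${1:-prefect}"

    # Nothing to do if tmux is missing or the session does not exist.
    command -v tmux >/dev/null 2>&1 || return 1
    tmux has-session -t "$session" 2>/dev/null || return 1

    tmux send-keys -t "$session" C-c   # interrupt the Prefect server
    sleep 5                            # arbitrary pause for shutdown
    tmux kill-session -t "$session"    # same effect as exiting the session
}
```

Run `stop_prefect_session` on the headnode; pass a different session name as the first argument if you did not call the session `prefect`.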