Running on SLURM

We maintain a tested and approved SLURM batch (sbatch) script for running the pipeline on both the CSD3 cluster and the DAS6 cluster. These scripts can be found in the examples directory. The scripts use Dask distribution for the calibration stage (DP3), and MPI distribution for the imaging stage (WSClean). A copy of the sbatch script for CSD3 is included below.

Feel free to adapt it to your needs.

Information on general structure

In order to isolate the code that is common to both clusters, both sbatch scripts call several other scripts. This reduces code duplication between the two scripts and has as additional advantage that the two individual sbatch scripts focus more on settings that differ between clusters. In general, the scripts have been made as self-documenting as possible and consist of the following stages.

First, the scripts provide the recommended set of SLURM parameters, followed by directory paths, and required arguments and other settings for the pipeline. Next, they start the poetry environment which contains all pipeline dependencies. Doing this in the sbatch script avoids having to add the necessary commands for starting the poetry environment into your ~/.bashrc file. In the latter case, it would not be possible to consecutively run repos that require different dependencies.

Once all settings are defined and the poetry environment is loaded, the Dask scheduler and workers are started. The scheduler will run on the head node, whereas each node will run a number of Dask workers. That is, on both the head node and the worker nodes. The total number of nodes here depends on the SLURM environment requested. Starting this up is done in a separate script, start_dask_scheduler_workers. Be aware that this is a recursive script. It runs to the full extend only on the head node, whereafter it calls itself recursively on all worker nodes. That way, we can use a single ssh command to load the poetry environment prior to starting the Dask workers on each node. We start the scheduler and workers in this manner so that the workload is evenly distributed over all the requested nodes and to have more control over this process. Once all Dask workers are requested, a separate script is called to connect to the scheduler and wait for all workers to become available to the scheduler. We noticed that in some situations where commencing the pipeline is sufficiently fast and DP3 is called when not all workers are available yet, the work does not distribute properly over nodes.

The start_dask_scheduler_workers script also supports starting “dool”, which tracks resource usage while the pipeline is running. The resource usage tracking is disabled by default. The script only starts “dool” if is available in your $PATH or in the poetry environment for the pipeline. If enabled, dool generates one csv file for each node, which will contain about one megabyte per hour. The –dool-delay option of start_dask_scheduler_workers allows using a custom delay value, and thereby reduce the csv file size. An internal SKAO confluence page <https://confluence.skatelescope.org/display/SE/System%27s+resource+tracing+with+dool> contains a tool for converting dool reports in csv format into png images with graphs that display resource usage over time.

Also note that anything below an exec is not ran as per the behaviour of exec.

#!/bin/bash
# Example SLURM batch script for running the pipeline at CSD3.
# Adjust requested resources and job variables as necessary.
#
# SLURM settings:
#SBATCH --job-name=LowSelfCal # Default is name of script.
#SBATCH --partition=icelake   # Use 'icelake' nodes.
#SBATCH --nodes=3             # Use any three nodes.
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive           # Gain exclusive access to the nodes / avoid sharing.
#SBATCH -t 24:00:00           # Set job timeout.

##### Pipeline directory settings.
# Installation directories of helper libraries and tools.
SPACK_ROOT=$HOME/rds/hpc-work/spack/opt/spack/linux-rocky8-icelake/gcc-13.2.0
OPENMPI_ROOT=$SPACK_ROOT/openmpi-5.0.2-hj6jvdb4s6irf6rrnvfzgeahfofhszaz
EVERYBEAM_ROOT=$SPACK_ROOT/everybeam-0.5.5-hlm4zwzjwxa3wnwiqvg265hvef2cxi3v

# Location of the pipeline. It should have a poetry virtual environment.
# https://developer.skao.int/projects/ska-sdp-wflow-selfcal/en/latest/README.html#installation
# describes how to create a poetry virtual environment for the pipeline.
PIPELINE_HOME=$HOME/schaap/ska-sdp-wflow-selfcal

# Working directory. We recommend using your 'hpc-work' directory for CSD3.
WORK_DIR=$HOME/rds/hpc-work/ast-1489/avg

# Command to execute DP3 and WSClean. (WSClean is started using mpirun.)
SCRATCH_DIR=$HOME/scratch
DP3_PATH=$SCRATCH_DIR/schaap/dp3/build/DP3
MPIRUN="$OPENMPI_ROOT/bin/mpirun --bind-to none -x OPENBLAS_NUM_THREADS -npernode 1"
WSCLEAN_CMD="$MPIRUN $SCRATCH_DIR/schaap/wsclean/build/wsclean-mp"

##### Pipeline arguments.
# Input measurement set.
INPUT_MS=$HOME/rds/skao/hpcsalv1/data/midbands_averaged.ms

# Extra arguments for the pipeline.
PIPELINE_ARGS+=" --ignore_version_errors True"
PIPELINE_ARGS+=" --resume_from_operation image_1 --run_single_operation True"
# Copy one of the example yml files in the config/ directory, adjust
# settings as needed (like "numthreads" for DP3), and update the path below.
PIPELINE_ARGS+=" --config /path/to/config_file.yml"

# Distribution settings.
DASK_WORKERS_PER_NODE=3
DASK_PORT=8786

# Logging tag is incorporated in the Dask logs too
LOGGING_TAG="${SLURM_JOB_ID:-$$}"

# -----------------------------------------------------------------------------
# Anything below should not be edited
# -----------------------------------------------------------------------------

# Set up CSD3 specific environment.
module purge
module load rhel8/slurm
module load openmpi

export EVERYBEAM_DATADIR=$EVERYBEAM_ROOT/share/everybeam

# Make sure the local bin directory is in your PATH by adding the following
# line to your .bashrc file ($HOME/.bashrc), otherwise the system cannot find
# the poetry command:
# export PATH="$HOME/.local/bin:$PATH"

# Load poetry environment
ENV_DIR=$(cd $PIPELINE_HOME; poetry env info --path)
source $ENV_DIR/bin/activate

##### Start Dask scheduler on head node and workers on all nodes.
# Adding a "--dool" option enables tracking resource usage using dool.
$PIPELINE_HOME/examples/start_dask_scheduler_workers.sh \
  --workers-per-node $DASK_WORKERS_PER_NODE \
  --logging-tag $LOGGING_TAG \
  --logs-dir $WORK_DIR/logs \
  --port $DASK_PORT

##### Start pipeline.
# "exec" so that SIGTERM propagates to the pipeline executable
exec ska-sdp-wflow-selfcal \
  --dp3_path $DP3_PATH \
  --wsclean_cmd "$WSCLEAN_CMD" \
  --input_ms "$INPUT_MS" --work_dir "$WORK_DIR" \
  --logging_tag $LOGGING_TAG \
  --dask_scheduler `hostname`:$DASK_PORT \
  $PIPELINE_ARGS