Example self-calibration on MeerKAT data

Setup and Parameters

Dataset

We ran version 0.8.0 of the self-calibration pipeline as a SLURM job on the CSD3 cluster.

The input data were a publicly available dataset observed with MeerKAT as part of the MIGHTEE survey, targeting an off-Galactic plane field. Details are available in: MIGHTEE: total intensity radio continuum imaging and the COSMOS / XMM-LSS Early Science fields

The field is called E-CDFS 2.5, see Figure C1 in the paper above. We used a smaller-sized version of the dataset, where we extracted 512 channels covering the band 950 - 1378 MHz, and the first 4 hours of data. The total uncompressed size of the dataset on disk is 91 GB.

Configuration

We chose a pixel scale of 1.1 arcseconds and a total image size of 10,240 pixels, following the methods described in the paper above. This amounts to a field of view of 3.13 degrees. We did not provide any external sky model to bootstrap calibration; we instead let the pipeline create one by letting it run an initial imaging stage.

We performed two direction-dependent self-calibration cycles, one with just scalar phase calibration and another with scalar phase + amplitude calibration.

For the purposes of direction-dependent calibration, the field was split into 8 calibration patches / facets using the kmeans strategy, i.e. the facets were defined as the Voronoi cells around the 8 K-Means centroids of the flux-weighted source locations.

YAML Configuration File
###############################################################################
# MeerKAT CDFS2.5 field configuration file
###############################################################################

# Custom anchor definitions to avoid repetition below
# NOTE: reducing number of facets for faster run time
.tesselation: &tesselation_defaults
  method: kmeans
  num_patches: 8

.ddecal: &ddecal_defaults
  solve.mode: scalarphase
  # oxkat's default is 5 minutes, but we use ~1 minute instead otherwise the
  # DDECal jobs run out of memory (sampling interval is 8 seconds).
  # In terms of image quality, this does not detectably change anything
  solve.solint: 8
  # Matches oxkat's default
  solve.nchan: 128
  solve.propagatesolutions: true
  solve.propagateconvergedonly: true
  solve.solveralgorithm: hybrid

.wsclean: &wsclean_defaults
  apply-facet-beam: true
  multiscale: true
  niter: 10_000_000
  mgain: 0.8
  auto-mask: 5.0
  auto-threshold: 1.0
  weight: ["briggs", +0.50]
  multiscale-max-scales: 4
  parallel-deconvolution: 2048
  # Must limit the number of major cycles, otherwise we get stuck in a
  # bizarre infinite loop
  # https://skao.slack.com/archives/C017Z6E99S4/p1707930259844579
  nmiter: 8
  mem: 80
  parallel-gridding: 8

###############################################################################
# CONFIG STARTS HERE
###############################################################################

# Maximum fractional bandwidth that can be imaged as a single sub-band.
max_fractional_bandwidth: 0.05

# Custom parameters for the initial imaging stage used to infer an initial sky
# model for self-calibration.
# Only runs if no sky model is provided to the pipeline.
initial_imaging:
  # Override default WSClean parameters
  wsclean: {}

# Definition of self-calibration cycles
selfcal_cycles:
  # Cycle 1
  - tesselation: *tesselation_defaults
    ddecal: *ddecal_defaults
    wsclean:
      <<: *wsclean_defaults
      auto-mask: 30.0

  # Cycle 2
  - tesselation: *tesselation_defaults
    ddecal:
      <<: *ddecal_defaults
      solve.mode: scalar
    wsclean:
      <<: *wsclean_defaults
      auto-mask: 5.0

SLURM Script

This is the SLURM script used to submit the job on CSD3. The pipeline command line is at the bottom.

SLURM batch script
#!/bin/bash
#SBATCH --job-name=MSCP_MKT
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=76
#SBATCH --time=14:00:00
#SBATCH --partition=icelake
#SBATCH --account=SKA-SDHP-SL2-CPU
#SBATCH --signal=B:TERM@180

##### User defined variables
BASE_JOB_DIR=/rds/project/rds-S7lLL7eOZIg/hpcmore1/PI22/meerkat_cdfs25_docs_example

##### Job variables that MUST be defined
INPUT_MS=$BASE_JOB_DIR/MKT_CDFS25_4hours_950_1378MHz.ms
SINGULARITY_IMAGE=/rds/project/rds-S7lLL7eOZIg/singularity-images/dp3-wsclean-mpi-icelake.sif
BASE_OUTDIR=$BASE_JOB_DIR/output
CONFIG=$BASE_JOB_DIR/config.yml
NPIX=10240
PIXEL_SCALE_ASEC=1.1
DASK_WORKERS_PER_NODE=38
DASK_PORT=8786

# -----------------------------------------------------------------------------
# Anything below should not be edited
# -----------------------------------------------------------------------------

##### Setup modules
# Source bashrc to make module command available
source ~/.bashrc

# Load required base modules for icelake, and then OpenMPI 4.x
module purge
module load rhel8/default-icl
module load openmpi-4.0.5-gcc-8.4.1-l7ihwk3

set -x

##### Create output directories
PIPELINE_DIR=$BASE_OUTDIR/pipeline
DASK_LOGS_DIR=$BASE_OUTDIR/dask_logs
SYSMON_DIR=$BASE_OUTDIR/sysmon

mkdir -p $BASE_OUTDIR
mkdir -p $PIPELINE_DIR
mkdir -p $DASK_LOGS_DIR
mkdir -p $SYSMON_DIR

##### Fetch list of nodes
NODES=($(scontrol show hostnames))
HEAD_NODE="$(hostname)"

# Join array into space-separated string
NODES_SPACE_SEPARATED="${NODES[*]}"

echo "Allocated nodes: $NODES_SPACE_SEPARATED"
echo "Head node: $HEAD_NODE"

##### Start system monitor on ALL nodes
for node in "${NODES[@]}"; do
    outfile=$SYSMON_DIR/"system_usage_$node.jsonl"
    ssh $node mid-selfcal-system-monitor >$outfile &
    echo "Started system monitor on $node"
done

##### Start dask scheduler on head node
DASK_SCHEDULER_ADDR=$HEAD_NODE:$DASK_PORT

dask scheduler --port ${DASK_PORT} >$DASK_LOGS_DIR/scheduler_$HEAD_NODE.log 2>&1 &
echo "Started dask scheduler on $DASK_SCHEDULER_ADDR"

##### Start dask workers on all nodes via ssh (even on head node, it's fine)
for node in "${NODES[@]}"; do
    logfile=$DASK_LOGS_DIR/worker_$node.log
    # NOTE: the Mid pipeline expects the custom resource "subprocess_slots"
    # and it must be equal to 1.
    # Implicit assumption: Python environment for pipeline is loaded upon login
    ssh $node dask worker $DASK_SCHEDULER_ADDR --name $node \
        --nworkers $DASK_WORKERS_PER_NODE --resources subprocess_slots=1 \
        >$logfile 2>&1 &
    echo "Started dask worker on $node"
done

##### Start pipeline
# "exec" so that SIGTERM propagates to the pipeline executable
exec mid-selfcal-dd --dask-scheduler $DASK_SCHEDULER_ADDR \
    --mpi-hosts $NODES_SPACE_SEPARATED \
    --singularity-image $SINGULARITY_IMAGE \
    --config $CONFIG \
    --num-pixels $NPIX --pixel-scale $PIXEL_SCALE_ASEC \
    --outdir $PIPELINE_DIR \
    $INPUT_MS

Clean Image

Plotting with DS9

At the start of each self-calibration cycle, the pipeline saves both the skymodel and the calibration patch (facet) parameters to DS9 region files, so that they can be overlaid when loading the output images in DS9.

In the examples below we used the command:

ds9 -region final_patches.reg <FITS_FILE>

Result

Below is the clean image returned, plotted with a linear color scale strongly clipped around zero, as the image has a large dynamic range. The calibration patches are overlaid:

_images/dd_final_clean.png

Run times

Running on a single CSD3 icelake node and using 76 logical cores, the overall run time was approximately 12 hours. Calibration was split into 249 independent time intervals and parallelized using 38 dask workers.

Here’s a very abridged version of the pipeline logs providing a breakdown of the run times for each program:

[INFO - 2024-05-11 10:03:46,029] Running version: 0.8.0+16.g4f8d32f
[INFO - 2024-05-11 10:03:47,486] Input MeasurementSet: MeasurementSet(path=/rds/project/rds-S7lLL7eOZIg/hpcmore1/MeerKAT/MKT_CDFS25_4h
ours_950_1378MHz.ms, fbot=950.36 MHz, ftop=1378.36 MHz, channels=512, phase_centre=03h28m12.63s -27d35m15.00s)


[INFO - 2024-05-11 10:03:47,488] Running initial imaging stage
[INFO - 2024-05-11 10:03:47,489] Running wsclean
...
[DEBUG - 2024-05-11 13:16:43,884] Inversion: 01:02:13.553159, prediction: 00:54:55.023797, deconvolution: 01:12:15.796589
[INFO - 2024-05-11 13:17:35,872] wsclean finished in 11628.38 seconds
[INFO - 2024-05-11 13:17:40,930] Initial image peak S/N = 8409.6
[INFO - 2024-05-11 13:17:40,931] Initial image S/N threshold = 290.0
[INFO - 2024-05-11 13:17:40,980] Initial image components detected: 12181
[INFO - 2024-05-11 13:17:41,710] Initial image components kept: 601
[INFO - 2024-05-11 13:17:41,714] Input SkyModel: 601 sources


[INFO - 2024-05-11 13:17:41,722] Starting self-calibration cycle 1/2
[INFO - 2024-05-11 13:17:42,122] Running DDECal stage
[INFO - 2024-05-11 13:17:42,559] Submitted 249 DDECal tasks
...
[INFO - 2024-05-11 14:08:13,279] DDECal stage complete
[INFO - 2024-05-11 14:08:13,290] Running wsclean
...
[DEBUG - 2024-05-11 17:52:04,587] Inversion: 01:06:44.562637, prediction: 01:38:47.977776, deconvolution: 00:36:08.727644
[INFO - 2024-05-11 17:54:01,339] wsclean finished in 13548.05 seconds
[INFO - 2024-05-11 17:54:05,180] RMS residual: 5.94 uJy
[INFO - 2024-05-11 17:54:06,391] Robust RMS residual: 4.00 uJy


[INFO - 2024-05-11 17:54:06,391] Starting self-calibration cycle 2/2
[INFO - 2024-05-11 17:54:07,743] Running DDECal stage
[INFO - 2024-05-11 17:54:08,339] Submitted 249 DDECal tasks
...
[INFO - 2024-05-11 18:55:32,695] DDECal stage complete
[INFO - 2024-05-11 18:55:32,709] Running wsclean
...
[DEBUG - 2024-05-11 22:45:57,177] Inversion: 01:11:28.930248, prediction: 01:38:48.716264, deconvolution: 00:38:06.815976
[INFO - 2024-05-11 22:47:51,678] wsclean finished in 13938.97 seconds
[INFO - 2024-05-11 22:48:03,108] RMS residual: 4.60 uJy
[INFO - 2024-05-11 22:48:04,606] Robust RMS residual: 3.85 uJy
[INFO - 2024-05-11 22:48:04,727] Pipeline run: SUCCESS