Example DD self-calibration on AA2 Mid

Setup and Parameters

Dataset

We ran version 0.5.1 of the DD selfcal pipeline as a SLURM job on the CSD3 cluster, using the AA2 Mid simulated dataset produced by team HiPPo in PI18. We used a reduced version of the dataset with mild corrupting gains, 64 frequency channels, a 4-hour observing time and only the innermost 62 antennas (this excludes the two “outlier” antennas and brings the maximum baseline length down to 34.2 km). The total uncompressed size of the dataset on disk is 72 GB.
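
As a quick consistency check, the dataset dimensions above can be tied to the visibility count that DP3 reports in the log reproduced in the Run times section. The sketch below is illustrative only: the number of time slots (18948) is taken from that log, and we assume that only cross-correlation baselines are stored and that the time slots are evenly spread over the 4 hours.

```python
# Visibilities per correlation implied by the dataset description.
n_antennas = 62        # innermost antennas kept in the reduced dataset
n_channels = 64        # frequency channels
n_time_slots = 18948   # taken from the DP3 log of this run

# Assumption: cross-correlations only, no autocorrelations
n_baselines = n_antennas * (n_antennas - 1) // 2

n_visibilities = n_time_slots * n_baselines * n_channels

# Assumption: time slots evenly spaced over the 4-hour observation
integration_time_s = 4 * 3600 / n_time_slots

print(f"baselines:        {n_baselines}")
print(f"visibilities:     {n_visibilities}")
print(f"integration time: {integration_time_s:.2f} s")
```

Under these assumptions the product matches the 2293162752 visibilities per correlation printed by DP3.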

Input parameters

To bootstrap calibration, we used the AA2 Mid sky model file provided in the data directory of the repository. It contains the parameters of the 100 brightest sources in the field, with their apparent (i.e. primary-beam attenuated) fluxes, which is what the pipeline expects.

We chose a pixel scale of 0.3 arcseconds, which corresponds to about 6 pixels across the Airy disk FWHM at 950 MHz for a maximum baseline length of 34.2 km. We requested an image of 24,000 x 24,000 pixels, for a total field of view of 2 degrees.
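
The arithmetic behind these two numbers can be sketched as follows. This is a back-of-the-envelope check rather than anything the pipeline computes; the factor 1.03 is the approximate FWHM of the Airy pattern of a filled circular aperture in units of lambda / D, with D taken to be the maximum baseline length, and is an assumption on our part.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0              # m/s
RAD_TO_ARCSEC = math.degrees(1.0) * 3600.0  # arcseconds per radian

frequency_hz = 950e6
max_baseline_m = 34.2e3
pixel_scale_arcsec = 0.3
num_pixels = 24_000

wavelength_m = SPEED_OF_LIGHT / frequency_hz

# Assumption: Airy pattern FWHM ~ 1.03 * lambda / D
fwhm_arcsec = 1.03 * wavelength_m / max_baseline_m * RAD_TO_ARCSEC
pixels_across_fwhm = fwhm_arcsec / pixel_scale_arcsec

field_of_view_deg = num_pixels * pixel_scale_arcsec / 3600.0

print(f"FWHM:               {fwhm_arcsec:.2f} arcsec")
print(f"pixels across FWHM: {pixels_across_fwhm:.1f}")
print(f"field of view:      {field_of_view_deg:.1f} deg")
```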

Configuration file

We performed a single self-calibration cycle with just a scalar phase + amplitude calibration. Note that the solution interval in frequency is one channel, as the simulated corrupting gains injected into the data are independent in every channel.

The imaging field was split into 9 calibration patches / facets using the voronoi_brightest strategy, i.e. the facets were defined as the Voronoi cells around the 9 brightest sources.
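
To illustrate the idea, here is a minimal sketch of a "Voronoi cells around the brightest sources" assignment. It is not the pipeline's implementation: the function name and the toy source list are hypothetical, and it works in flat-sky coordinates with Euclidean distances, whereas a real implementation would have to account for spherical geometry.

```python
import math


def assign_to_brightest(sources, num_patches):
    """Assign each source to the cell of the nearest of the brightest sources.

    `sources` is a list of (x, y, flux) tuples in flat-sky coordinates.
    Returns one patch index per source; patch k is seeded by the k-th
    brightest source.
    """
    # Seeds: indices of the brightest sources, in decreasing flux order
    order = sorted(range(len(sources)), key=lambda i: sources[i][2], reverse=True)
    seeds = order[:num_patches]

    patch_of = []
    for x, y, _ in sources:
        distances = [
            math.hypot(x - sources[s][0], y - sources[s][1]) for s in seeds
        ]
        # A point belongs to the Voronoi cell of its nearest seed
        patch_of.append(distances.index(min(distances)))
    return patch_of


# Hypothetical toy field: (x, y, flux in mJy)
toy_sources = [
    (0.0, 0.0, 500.0),
    (1.0, 0.0, 60.0),
    (0.1, 0.1, 5.0),
    (0.9, -0.1, 3.0),
    (0.4, 0.0, 1.0),
]
patches = assign_to_brightest(toy_sources, num_patches=2)
print(patches)
```

Summing the fluxes of the sources that share a patch index gives per-patch totals of the kind the pipeline prints in its log.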

We used the default weighting mode set by the pipeline, which is Briggs weighting with a robustness parameter of +0.5.

selfcal_cycles:
  # Cycle 1
  - tesselation:
      method: voronoi_brightest
      num_patches: 9

    ddecal:
      solve.mode: scalar
      solve.solint: 30
      solve.nchan: 1
      solve.propagatesolutions: true
      solve.propagateconvergedonly: true
      solve.solveralgorithm: hybrid

    imaging:
      flags: []
      options:
        "-niter": 50_000
        "-mgain": 0.7

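To get a feel for what these ddecal settings imply, the sketch below counts the scalar gain solutions for this run. It is only an estimate: the number of time slots comes from the DP3 log in the Run times section, solve.solint is interpreted as a number of time slots and solve.nchan as a number of channels, and we assume that a trailing partial interval gets its own solution and that time slots are evenly spaced over the 4 hours.

```python
import math

n_time_slots = 18948   # from the DP3 log of this run
n_channels = 64
n_antennas = 62
n_directions = 9       # num_patches in the configuration

solint = 30            # solve.solint, in time slots
nchan = 1              # solve.nchan, in channels

# Assumption: a trailing partial interval still gets its own solution
n_time_intervals = math.ceil(n_time_slots / solint)
n_freq_intervals = math.ceil(n_channels / nchan)

# Scalar mode: one complex gain per antenna, direction and solution cell
n_gains = n_time_intervals * n_freq_intervals * n_directions * n_antennas

# Assumption: time slots evenly spaced over the 4-hour observation
solint_seconds = solint * 4 * 3600 / n_time_slots

print(f"time intervals:    {n_time_intervals}")
print(f"complex gains:     {n_gains}")
print(f"solution interval: {solint_seconds:.1f} s")
```
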
SLURM job script

For this example run we used 76 logical CPU cores on a single icelake node of the CSD3 computing facility:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=76
#SBATCH --time=24:00:00
#SBATCH --partition=icelake
#SBATCH --signal=B:TERM@600
#SBATCH --account=SKA-SDHP-SL2-CPU

export BASE_DIR=...

# NOTE: glob patterns must be enclosed in double quotes
export MS=$BASE_DIR/MS_AA2-Mid_rev4_corrupted_62_stations_4.0h_0064ch
export SIF=$BASE_DIR/dp3-wsclean-mpi.sif
export OUTDIR=$BASE_DIR/output
export CONFIG=$BASE_DIR/config_9facets_voronoi_brightest.yml
export SKYMODEL=$BASE_DIR/aa2_mid_pb_attenuated_top100.skymodel

export NPIX=24000
export SCALE=0.3

# Load environment
source ~/.bashrc
conda activate selfcal

# Make outdir if necessary
mkdir -p $OUTDIR

# "exec" so that SIGTERM propagates to the pipeline executable
exec mid-selfcal-dd \
    --base-outdir $OUTDIR \
    --config $CONFIG \
    --singularity-image $SIF \
    --num-pixels $NPIX \
    --pixel-scale $SCALE \
    --sky-model $SKYMODEL \
    $MS

Images produced

Plotting with DS9

At the start of each self-calibration cycle, the pipeline saves both the skymodel and the calibration patch (facet) parameters to DS9 region files, so that they can be overlaid when loading the output images in DS9.

In the examples below we used the command:

ds9 -scale mode zscale -cmap cool -region patches01.reg -region sources01.reg <FITS_FILE>

Result

Below is the final clean image returned by the pipeline. The calibration patches and the top 100 brightest sources from the original sky model are overlaid:

_images/dd_final_clean.jpg

Run times

Running on a single CSD3 icelake node and using 76 logical cores, the overall run time was approximately 6 hours.

Here’s an abridged version of the pipeline logs providing a breakdown of the run times for each program:

[INFO - 2023-11-10 08:39:34,287 - mid-selfcal] Running version: 0.5.1
[DEBUG - 2023-11-10 08:39:34,880 - mid-selfcal] Imaging centre: 00h00m00.00s -45d00m00.00s
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_00 encloses   10 sources and a total flux of  534.11 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_01 encloses    9 sources and a total flux of   60.76 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_02 encloses    5 sources and a total flux of   49.39 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_03 encloses    4 sources and a total flux of   44.62 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_04 encloses    9 sources and a total flux of   54.19 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_05 encloses   15 sources and a total flux of   80.61 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_06 encloses    3 sources and a total flux of   23.47 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_07 encloses   33 sources and a total flux of  118.72 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_08 encloses   11 sources and a total flux of   40.92 mJy

...

[DEBUG - 2023-11-10 08:39:41,481 - mid-selfcal.DP3] Processing 18948 time slots ...
[DEBUG - 2023-11-10 08:39:41,481 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:27,270 - mid-selfcal.DP3] 0%....10....20....30....40....50....60....70....80....90....100%
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3] Finishing processing ...
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3] NaN/infinite data flagged in reader
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3] ===================================
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] Percentage of flagged visibilities detected per correlation:
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]   [0,0,0,0] out of 2293162752 visibilities   [0%, 0%, 0%, 0%]
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 0 missing time slots were inserted
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] Total DP3 time    5272.56 real      106553 user     1386.73 system
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]     2.2% (  116  s) MSReader
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]    97.7% ( 5153  s) DDECal solve.
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]            20.8% ( 1072  s) of it spent in predict
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]            63.4% ( 3267  s) of it spent in estimating gains and computing residuals
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]             0.0% ( 1201 ms) of it spent in writing gain solutions to disk
[INFO - 2023-11-10 10:07:36,302 - mid-selfcal] Running wsclean
[DEBUG - 2023-11-10 10:07:41,692 - mid-selfcal.wsclean] WSClean version 3.4 (2023-10-11)

...

[DEBUG - 2023-11-10 14:46:44,140 - mid-selfcal.wsclean] Inversion: 02:34:01.872550, prediction: 01:48:18.134060, deconvolution: 00:04:46.120810
[DEBUG - 2023-11-10 14:46:44,141 - mid-selfcal.wsclean] Cleaning up temporary files...
[INFO - 2023-11-10 14:46:44,760 - mid-selfcal] wsclean finished in 16748.43 seconds
[INFO - 2023-11-10 14:46:46,342 - mid-selfcal] Pipeline run: SUCCESS
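
The per-stage wall-clock times can be recovered from the log timestamps alone. The following sketch parses three of the lines above; the regular expression and helper name are ours, written against the log format shown here rather than taken from the pipeline.

```python
import re
from datetime import datetime

LOG_LINES = [
    "[INFO - 2023-11-10 08:39:34,287 - mid-selfcal] Running version: 0.5.1",
    "[INFO - 2023-11-10 10:07:36,302 - mid-selfcal] Running wsclean",
    "[INFO - 2023-11-10 14:46:46,342 - mid-selfcal] Pipeline run: SUCCESS",
]

# Matches "[LEVEL - YYYY-MM-DD HH:MM:SS,mmm - " at the start of a line
TIMESTAMP = re.compile(r"\[\w+ - (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - ")


def parse_time(line):
    """Extract the timestamp of a log line as a datetime object."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


start, wsclean_start, end = (parse_time(line) for line in LOG_LINES)

calibration_hours = (wsclean_start - start).total_seconds() / 3600
imaging_hours = (end - wsclean_start).total_seconds() / 3600
total_hours = (end - start).total_seconds() / 3600

print(f"calibration (DP3): {calibration_hours:.2f} h")
print(f"imaging (wsclean): {imaging_hours:.2f} h")
print(f"total:             {total_hours:.2f} h")
```

For this run, roughly a quarter of the 6 hours is spent in calibration and the rest in imaging, consistent with the totals that DP3 and wsclean report themselves.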