Example DD self-calibration on AA2 Mid
Setup and Parameters
Dataset
We ran version 0.5.1 of the DD selfcal pipeline as a SLURM job on the CSD3 cluster, using the AA2 Mid simulated dataset produced by team HiPPo in PI18. We used a reduced version of the dataset that includes mild corrupting gains and has 64 frequency channels, a 4-hour observing time and only the innermost 62 antennas (this excludes the two “outlier” antennas and brings the maximum baseline length down to 34.2 km). The total uncompressed size of the dataset on disk is 72 GB.
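As a quick consistency check on these dimensions, the sketch below counts the visibilities per correlation. It assumes that only cross-correlations are stored, and it takes the number of time slots (18948) from the DP3 log reproduced in the Run times section; the variable names are illustrative only.

```python
# Dataset dimensions quoted in the text
n_ant = 62       # innermost antennas kept in the reduced dataset
n_chan = 64      # frequency channels

# Taken from the DP3 log line "Processing 18948 time slots ..."
n_times = 18948

# Assumption: cross-correlations only, no autocorrelations
n_baselines = n_ant * (n_ant - 1) // 2

# Visibilities per correlation product
n_vis = n_baselines * n_times * n_chan

print(f"baselines: {n_baselines}")
print(f"visibilities per correlation: {n_vis}")
```

Under these assumptions the count agrees with the "2293162752 visibilities" figure reported by DP3 in the logs.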
Input parameters
To bootstrap calibration, we used the AA2 Mid sky model file provided in the data directory of the repository, which contains the parameters of the 100 brightest sources in the field with their apparent (i.e. primary-beam attenuated) fluxes, which is what the pipeline expects.
We chose a pixel scale of 0.3 arcseconds, which corresponds to about 6 pixels across the Airy disk FWHM at 950 MHz for a maximum baseline length of 34.2 km. We requested a 24,000 x 24,000 pixel image, for a total field of view of 2 degrees.
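The arithmetic behind these choices can be sketched as follows. This is a rough check, assuming the FWHM is approximated by lambda / B_max (the exact Airy-pattern prefactor is close to 1 and is ignored here); the function name is hypothetical.

```python
import math

C = 299_792_458.0                            # speed of light in m/s
ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0  # arcseconds in one radian


def resolution_arcsec(freq_hz, max_baseline_m):
    """Approximate angular resolution lambda / B_max, in arcseconds."""
    wavelength_m = C / freq_hz
    return wavelength_m / max_baseline_m * ARCSEC_PER_RAD


pixel_scale = 0.3   # arcseconds per pixel
num_pixels = 24_000

res = resolution_arcsec(950e6, 34.2e3)
pixels_per_beam = res / pixel_scale
fov_deg = num_pixels * pixel_scale / 3600.0

print(f"resolution: {res:.2f} arcsec")
print(f"pixels across the beam: {pixels_per_beam:.1f}")
print(f"field of view: {fov_deg:.1f} deg")
```

With these inputs the resolution comes out at roughly 1.9 arcseconds, i.e. a little over 6 pixels across the beam, and the image spans 2 degrees.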
Configuration file
We performed a single self-calibration cycle with just a scalar phase + amplitude calibration. Note that the solution interval in frequency is one channel, as the simulated corrupting gains injected into the data are independent in every channel.
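To give a feel for what the solution intervals mean in practice, the sketch below estimates the number of gain solutions for the settings `solve.solint: 30` and `solve.nchan: 1` used in the configuration file. It assumes that `solve.solint` is counted in time slots, that the 4-hour observation is uniformly sampled, and that one scalar gain is solved per antenna, direction and solution interval; the number of time slots is taken from the DP3 log in the Run times section.

```python
import math

n_times = 18948     # time slots, from the DP3 log
n_chan = 64         # frequency channels in the dataset
n_ant = 62          # antennas
n_patches = 9       # calibration directions (facets)
solint = 30         # solve.solint, assumed to be in time slots
nchan = 1           # solve.nchan, in channels
obs_seconds = 4 * 3600

# Number of solution intervals along each axis (last one may be partial)
n_time_intervals = math.ceil(n_times / solint)
n_freq_intervals = math.ceil(n_chan / nchan)

# Duration of one solution interval, assuming uniform time sampling
interval_seconds = solint * obs_seconds / n_times

# One scalar complex gain per interval, direction and antenna
n_gains = n_time_intervals * n_freq_intervals * n_patches * n_ant

print(f"time intervals: {n_time_intervals}, each about {interval_seconds:.1f} s")
print(f"frequency intervals: {n_freq_intervals}")
print(f"scalar gains solved for: {n_gains}")
```

Under these assumptions each solution interval spans a little under 23 seconds, and there is one independent solution per channel.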
The imaging field was split into 9 calibration patches (facets) using the voronoi_brightest strategy, i.e. the facets were defined as the Voronoi cells around the 9 brightest sources.
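The idea can be illustrated with a minimal sketch: pick the brightest sources as seeds, then assign every source to its nearest seed. This is not the pipeline's implementation; it assumes flat 2D coordinates and randomly generated toy sources, and the function name `voronoi_brightest` is reused here purely for illustration.

```python
import math
import random


def voronoi_brightest(sources, num_patches):
    """Assign each source to the Voronoi cell of the nearest bright seed.

    sources: list of (x, y, flux) tuples in a flat coordinate system.
    Returns (seeds, labels): indices of the seed sources, and for each
    source the index of the patch it falls in.
    """
    # Indices of sources sorted by decreasing flux; the first few are seeds
    order = sorted(range(len(sources)), key=lambda i: sources[i][2], reverse=True)
    seeds = order[:num_patches]

    labels = []
    for x, y, _ in sources:
        # Distance from this source to every seed
        dists = [math.hypot(x - sources[s][0], y - sources[s][1]) for s in seeds]
        labels.append(dists.index(min(dists)))
    return seeds, labels


# Toy sky model: 100 random sources (hypothetical data, arbitrary units)
rng = random.Random(0)
toy_sources = [
    (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.expovariate(1.0))
    for _ in range(100)
]

seeds, labels = voronoi_brightest(toy_sources, num_patches=9)
for patch in range(len(seeds)):
    fluxes = [src[2] for src, lab in zip(toy_sources, labels) if lab == patch]
    print(f"patch_{patch:02d} encloses {len(fluxes)} sources "
          f"and a total flux of {sum(fluxes):.2f}")
```

A source belongs to a given Voronoi cell exactly when that cell's seed is its nearest seed, so the nearest-seed assignment is enough to count sources and fluxes per patch without constructing the cell polygons.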
We used the default weighting mode set by the pipeline, which is Briggs weighting with a robustness parameter of +0.5.
selfcal_cycles:
  # Cycle 1
  - tesselation:
      method: voronoi_brightest
      num_patches: 9
    ddecal:
      solve.mode: scalar
      solve.solint: 30
      solve.nchan: 1
      solve.propagatesolutions: true
      solve.propagateconvergedonly: true
      solve.solveralgorithm: hybrid
    imaging:
      flags: []
      options:
        "-niter": 50_000
        "-mgain": 0.7
SLURM job script
For this example run we used 76 logical CPU cores on a single icelake node of the CSD3 computing facility:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=76
#SBATCH --time=24:00:00
#SBATCH --partition=icelake
#SBATCH --signal=B:TERM@600
#SBATCH --account=SKA-SDHP-SL2-CPU
export BASE_DIR=...
# NOTE: glob patterns must be enclosed in double quotes
export MS=$BASE_DIR/MS_AA2-Mid_rev4_corrupted_62_stations_4.0h_0064ch
export SIF=$BASE_DIR/dp3-wsclean-mpi.sif
export OUTDIR=$BASE_DIR/output
export CONFIG=$BASE_DIR/config_9facets_voronoi_brightest.yml
export SKYMODEL=$BASE_DIR/aa2_mid_pb_attenuated_top100.skymodel
export NPIX=24000
export SCALE=0.3
# Load environment
source ~/.bashrc
conda activate selfcal
# Make outdir if necessary
mkdir -p $OUTDIR
# "exec" so that SIGTERM propagates to the pipeline executable
exec mid-selfcal-dd --base-outdir $OUTDIR --config $CONFIG --singularity-image $SIF --num-pixels $NPIX --pixel-scale $SCALE --sky-model $SKYMODEL $MS
Images produced
Plotting with DS9
At the start of each self-calibration cycle, the pipeline saves both the skymodel and the calibration patch (facet) parameters to DS9 region files, so that they can be overlaid when loading the output images in DS9.
In the examples below we used the command:
ds9 -scale mode zscale -cmap cool -region patches01.reg -region sources01.reg <FITS_FILE>
Result
Below is the clean image returned by the pipeline, with the calibration patches and the 100 brightest sources from the original sky model overlaid:
Run times
Running on a single CSD3 icelake node and using 76 logical cores, the overall run time was approximately 6 hours.
Here’s an abridged version of the pipeline logs providing a breakdown of the run times for each program:
[INFO - 2023-11-10 08:39:34,287 - mid-selfcal] Running version: 0.5.1
[DEBUG - 2023-11-10 08:39:34,880 - mid-selfcal] Imaging centre: 00h00m00.00s -45d00m00.00s
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_00 encloses 10 sources and a total flux of 534.11 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_01 encloses 9 sources and a total flux of 60.76 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_02 encloses 5 sources and a total flux of 49.39 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_03 encloses 4 sources and a total flux of 44.62 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_04 encloses 9 sources and a total flux of 54.19 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_05 encloses 15 sources and a total flux of 80.61 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_06 encloses 3 sources and a total flux of 23.47 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_07 encloses 33 sources and a total flux of 118.72 mJy
[DEBUG - 2023-11-10 08:39:35,082 - mid-selfcal] patch_08 encloses 11 sources and a total flux of 40.92 mJy
...
[DEBUG - 2023-11-10 08:39:41,481 - mid-selfcal.DP3] Processing 18948 time slots ...
[DEBUG - 2023-11-10 08:39:41,481 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:27,270 - mid-selfcal.DP3] 0%....10....20....30....40....50....60....70....80....90....100%
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3] Finishing processing ...
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3] NaN/infinite data flagged in reader
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3] ===================================
[DEBUG - 2023-11-10 10:07:33,069 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] Percentage of flagged visibilities detected per correlation:
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] [0,0,0,0] out of 2293162752 visibilities [0%, 0%, 0%, 0%]
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 0 missing time slots were inserted
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3]
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] Total DP3 time 5272.56 real 106553 user 1386.73 system
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 2.2% ( 116 s) MSReader
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 97.7% ( 5153 s) DDECal solve.
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 20.8% ( 1072 s) of it spent in predict
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 63.4% ( 3267 s) of it spent in estimating gains and computing residuals
[DEBUG - 2023-11-10 10:07:33,070 - mid-selfcal.DP3] 0.0% ( 1201 ms) of it spent in writing gain solutions to disk
[INFO - 2023-11-10 10:07:36,302 - mid-selfcal] Running wsclean
[DEBUG - 2023-11-10 10:07:41,692 - mid-selfcal.wsclean] WSClean version 3.4 (2023-10-11)
...
[DEBUG - 2023-11-10 14:46:44,140 - mid-selfcal.wsclean] Inversion: 02:34:01.872550, prediction: 01:48:18.134060, deconvolution: 00:04:46.120810
[DEBUG - 2023-11-10 14:46:44,141 - mid-selfcal.wsclean] Cleaning up temporary files...
[INFO - 2023-11-10 14:46:44,760 - mid-selfcal] wsclean finished in 16748.43 seconds
[INFO - 2023-11-10 14:46:46,342 - mid-selfcal] Pipeline run: SUCCESS
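The per-stage breakdown can be recovered from the log timestamps alone. The sketch below assumes the log line layout shown above (`[LEVEL - timestamp - logger] message`) and uses three lines copied from the excerpt; the regular expression and helper name are illustrative, not part of the pipeline.

```python
import re
from datetime import datetime

# Matches e.g. "[INFO - 2023-11-10 08:39:34,287 - mid-selfcal] ..."
LOG_LINE = re.compile(
    r"^\[(\w+) - (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) - ([\w.\-]+)\]"
)


def parse_timestamp(line):
    """Return the timestamp of a log line as a datetime, or None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
    return stamp.replace(microsecond=int(match.group(3)) * 1000)


lines = [
    "[INFO - 2023-11-10 08:39:34,287 - mid-selfcal] Running version: 0.5.1",
    "[INFO - 2023-11-10 10:07:36,302 - mid-selfcal] Running wsclean",
    "[INFO - 2023-11-10 14:46:46,342 - mid-selfcal] Pipeline run: SUCCESS",
]
start, wsclean_start, end = (parse_timestamp(line) for line in lines)

# Everything before "Running wsclean" is setup plus DP3 calibration
calibration_s = (wsclean_start - start).total_seconds()
imaging_s = (end - wsclean_start).total_seconds()
total_s = (end - start).total_seconds()

print(f"calibration: {calibration_s:.0f} s ({100 * calibration_s / total_s:.0f}%)")
print(f"imaging:     {imaging_s:.0f} s ({100 * imaging_s / total_s:.0f}%)")
print(f"total:       {total_s / 3600:.2f} h")
```

For this run, roughly a quarter of the wall-clock time goes to DP3 calibration and three quarters to wsclean imaging, for a total of just over 6 hours.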