SKA SDP self-calibration Workflow

This repository defines a self-calibration pipeline that relies on DP3 and WSClean. It is under development and should not be used for science purposes.

Current documentation is available on the project's Read the Docs page.

Getting started

This project is based on an SKA project template.

Project status

This project is currently a proof of concept. The intention is to replicate the functional behavior of the Rapthor pipeline, with SKA-specific adjustments and an SKA-specific execution environment. There are currently known functional differences from Rapthor, so for the best image quality we currently advise trying Rapthor.

Usage

Requirements

To run this pipeline you will need the following software installed:

  • DYSCO https://github.com/aroffringa/dysco

  • DP3 https://git.astron.nl/RD/DP3

  • WSClean https://wsclean.readthedocs.io/en/latest/installation.html

You can then install the pipeline module from this repository by running ‘pip install -e .’ in the repository's root directory.
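Before running the pipeline, it can help to confirm that the DP3 and WSClean executables are reachable. The sketch below is not part of this repository: `find_missing_executables` is a hypothetical helper, and it assumes the executables are installed on `PATH` under the names `DP3` and `wsclean` (adjust the names if your installation differs). DYSCO is a storage manager library rather than an executable, so it is not checked here.

```python
import shutil


def find_missing_executables(names):
    """Return the subset of `names` that cannot be resolved on PATH.

    Hypothetical helper for illustration; not part of ska-sdp-wflow-selfcal.
    """
    # shutil.which returns None when no matching executable is found
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; change them to match your installation
    missing = find_missing_executables(["DP3", "wsclean"])
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("DP3 and WSClean found on PATH")
```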

Running the pipeline

To run the pipeline, you can use the following command (adjust the paths to the DP3 and WSClean executables and to your input, working directory and configuration file):

python3 src/ska_sdp_wflow_selfcal/pipeline/main.py \
--dp3_path /path/to/DP3  \
--wsclean_cmd /path/to/wsclean \
--input_ms /path/to/input/measurementset.ms \
--work_dir /path/to/desired/working_dir \
--config /path/to/config/file.yml \
--logging_tag $SLURM_JOB_ID \
--resume_from_operation calibrate_1 \
--run_single_operation False
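The same invocation can also be assembled from Python, for example when wrapping the pipeline in another script. This is a minimal sketch, not part of the repository: `build_selfcal_command` is a hypothetical helper that uses only the flags from the command above, and the paths are the same placeholders. Because the arguments are passed as a list rather than through a shell, a multi-word `--wsclean_cmd` value stays a single argument without extra quoting.

```python
import shlex


def build_selfcal_command(dp3_path, wsclean_cmd, input_ms, work_dir, config,
                          logging_tag=None, resume_from_operation=None,
                          run_single_operation=False):
    """Assemble the argument list for the self-calibration pipeline.

    Hypothetical helper; the flags mirror the example command in this README.
    """
    cmd = [
        "python3", "src/ska_sdp_wflow_selfcal/pipeline/main.py",
        "--dp3_path", dp3_path,
        # One list element, so an MPI command line needs no extra shell quoting
        "--wsclean_cmd", wsclean_cmd,
        "--input_ms", input_ms,
        "--work_dir", work_dir,
        "--config", config,
    ]
    if logging_tag is not None:
        cmd += ["--logging_tag", str(logging_tag)]
    if resume_from_operation is not None:
        # str(False) / str(True) gives the "False" / "True" spelling used above
        cmd += ["--resume_from_operation", resume_from_operation,
                "--run_single_operation", str(run_single_operation)]
    return cmd


if __name__ == "__main__":
    cmd = build_selfcal_command(
        "/path/to/DP3", "/path/to/wsclean",
        "/path/to/input/measurementset.ms", "/path/to/desired/working_dir",
        "/path/to/config/file.yml", resume_from_operation="calibrate_1",
    )
    # Shell-quoted form, handy for logging or copy-pasting
    print(shlex.join(cmd))
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```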

Running on a SLURM cluster

The examples directory contains example SBATCH scripts for running the pipeline on a SLURM cluster. More detailed documentation about these scripts is available on the project's Read the Docs page.

Pipeline parameters / command line arguments

--dp3_path

This parameter supplies the path to the DP3 executable. The pipeline runs DP3 in the calibrate, predict and image operations. When the --dask_scheduler option is used, the pipeline runs multiple DP3 processes in parallel.

--wsclean_cmd

This parameter specifies the command line for running WSClean. On a single machine, it typically contains the path to the "wsclean" executable. When running on multiple machines using MPI, it should contain a command like "mpirun -npernode 1 --bind-to none /path/to/wsclean-mp", including the quotes, so that the shell passes the whole command as a single argument.
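Whatever receives such a quoted command string has to split it back into an argument list before it can execute WSClean. The sketch below shows one way to do that with Python's `shlex` module; `build_wsclean_argv` is a hypothetical helper, and it assumes POSIX shell splitting rules rather than reflecting the pipeline's actual source. The `-name` option in the example is WSClean's output-name prefix, used here only as sample arguments.

```python
import shlex


def build_wsclean_argv(wsclean_cmd, wsclean_args):
    """Combine a --wsclean_cmd string with WSClean arguments into one argv list.

    Hypothetical helper: the command string is split with POSIX shell rules,
    then the WSClean arguments are appended after it.
    """
    return shlex.split(wsclean_cmd) + list(wsclean_args)


if __name__ == "__main__":
    # Single machine: just the executable
    print(build_wsclean_argv("/path/to/wsclean", ["-name", "image_1"]))
    # Multiple machines: MPI launcher and its options, then the MPI-enabled WSClean
    print(build_wsclean_argv(
        "mpirun -npernode 1 --bind-to none /path/to/wsclean-mp",
        ["-name", "image_1"],
    ))
```

In the MPI case the launcher options end up before the WSClean executable and the WSClean options after it, which is why the whole launcher command has to be given as one quoted value.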

--logging_tag

This parameter sets a tag that is included in the filenames of the log files. The main workflow log file is wflow-selfcal.<tag>.log. Other log files are wsclean.<tag>.log, dask.<tag>.log and the DP3 log files: the pipeline generates a separate DP3 log file for each DP3 run, whose name contains a description of the pipeline part in addition to the tag. By default, the logging tag is the process ID of the workflow script.
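When launching the pipeline from a wrapper script, you may want to choose the tag yourself, for example the SLURM job ID as in the example command. The sketch below uses a hypothetical helper, `choose_logging_tag`, that prefers `SLURM_JOB_ID` and otherwise falls back to a process ID. Note the assumption: the fallback is the process ID of the wrapper, which differs from the pipeline's own default (the process ID of the workflow script itself).

```python
import os


def choose_logging_tag(environ=None):
    """Pick a value for --logging_tag.

    Hypothetical helper: use the SLURM job ID when running under SLURM,
    otherwise fall back to the current (wrapper) process ID.
    """
    environ = os.environ if environ is None else environ
    return environ.get("SLURM_JOB_ID", str(os.getpid()))


if __name__ == "__main__":
    print("--logging_tag", choose_logging_tag())
```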

--resume_from_operation

This parameter allows restarting the pipeline from a given operation. Choose among calibrate_1, predict_1, image_1, calibrate_2, image_2, calibrate_3, predict_3, image_3, calibrate_4, predict_4, image_4. Please note that resuming is only guaranteed to work if all the other parameters remain the same.

--run_single_operation

When set to True, this parameter makes the pipeline run only the single operation given with --resume_from_operation.
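The interplay between --resume_from_operation and --run_single_operation can be sketched as a simple slice over the operation list. This sketch assumes that the operations run in the order listed under --resume_from_operation (note that there is no predict_2); `operations_to_run` is a hypothetical helper, and treating --run_single_operation without a resume operation as an error is this sketch's own choice, not documented pipeline behavior.

```python
# Operation order as listed under --resume_from_operation (there is no predict_2)
OPERATIONS = [
    "calibrate_1", "predict_1", "image_1",
    "calibrate_2", "image_2",
    "calibrate_3", "predict_3", "image_3",
    "calibrate_4", "predict_4", "image_4",
]


def operations_to_run(resume_from_operation=None, run_single_operation=False):
    """Return the operations that would run for the given options.

    Hypothetical helper that assumes the operations run in the order above.
    """
    if resume_from_operation is None:
        if run_single_operation:
            raise ValueError("run_single_operation requires resume_from_operation")
        return list(OPERATIONS)
    if resume_from_operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {resume_from_operation!r}")
    start = OPERATIONS.index(resume_from_operation)
    # A single operation is a one-element slice; resuming keeps everything after it
    return OPERATIONS[start:start + 1] if run_single_operation else OPERATIONS[start:]


if __name__ == "__main__":
    print(operations_to_run("calibrate_3"))
    print(operations_to_run("image_2", run_single_operation=True))
```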

--dask_scheduler

This parameter provides the address of a Dask scheduler that is connected to several Dask workers. Providing this parameter enables distributed execution of the calibration (DP3) operations, which then run using multiple processes and/or nodes. The Dask scheduler and workers must be running before the pipeline is started. For an example of how to start Dask, see the example SBATCH scripts for CSD3 and DAS-6 in the ‘examples’ folder. If this argument is not provided, the pipeline runs all DP3 processes sequentially, without Dask. Note that DP3 processes are still multi-threaded, so running them sequentially does not disable parallelization altogether.
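The difference between the sequential and the Dask-distributed modes can be illustrated with a small sketch. It is not the pipeline's implementation: `run_dp3_commands` is a hypothetical helper, each command is assumed to be an argument list such as ["/path/to/DP3", "/path/to/parset"], and the distributed branch uses the standard `dask.distributed.Client` API against an already running scheduler. The demo uses trivial Python processes as stand-ins so that it runs without DP3.

```python
import subprocess
import sys


def run_dp3_commands(commands, dask_scheduler=None):
    """Run DP3 command lines sequentially, or in parallel via a Dask scheduler.

    Hypothetical sketch. Returns the exit codes in the same order as `commands`;
    callers should check for non-zero codes.
    """
    if dask_scheduler is None:
        # Sequential: one process at a time (DP3 itself may still be multi-threaded)
        return [subprocess.run(cmd).returncode for cmd in commands]

    # Distributed: the scheduler and workers must already be running
    from dask.distributed import Client

    with Client(dask_scheduler) as client:
        # pure=False stops Dask from merging identical commands into one cached task
        futures = [client.submit(subprocess.run, cmd, pure=False) for cmd in commands]
        return [result.returncode for result in client.gather(futures)]


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without DP3 installed
    demo = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
    print(run_dp3_commands(demo))  # exit codes: [0, 0, 0]
```

Passing a scheduler address such as "tcp://host:8786" as the second argument switches to the distributed branch; the function signature stays the same, mirroring how the pipeline only changes behavior when --dask_scheduler is given.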

--config

This parameter specifies the path to a YAML configuration file with the settings for each self-calibration cycle, including the arguments for DP3 and WSClean. See the examples in the ‘config’ folder for more details.

Contribute

If you want to contribute to this project, please consult the section below.

The system used for development needs to have Python 3 and pip installed.

Installation

You can install this module via pip, using ‘pip install ska-sdp-wflow-selfcal’.

If you want to install it via git, follow the instructions below.

To clone and work with this repository, you need to have Poetry installed. You can get it with:

curl -sSL https://install.python-poetry.org | python3 -

Clone the repository with its submodules:

git clone --recursive git@gitlab.com:ska-telescope/sdp/science-pipeline-workflows/ska-sdp-wflow-selfcal.git
cd ska-sdp-wflow-selfcal
git submodule init
git submodule update

Enter the Poetry virtual environment and build the project:

poetry shell
poetry build && poetry install

You can now use the make targets provided by the submodule:

make python-build

You can also format the code with make python-format and check the linting with make python-lint.

Build the docs

This project also has a Read the Docs page. To build the documentation locally, first make sure that all the required packages are installed:

poetry install --with docs

And then:

poetry run sphinx-build -b html docs/src docs/build

Alternatively, go to the docs folder and run:

make html