SKA SDP self-calibration Workflow
This repository defines a self-calibration pipeline. It relies on DP3 and WSClean. The repository is under development and should not be used for science purposes.
Current documentation can be viewed here.
Getting started
This project is defined on the basis of this SKA template.
Project status
This project is currently a proof of concept. The intention is to replicate the functional behavior of the Rapthor pipeline, with SKA-specific adjustments and an SKA-specific execution environment. There are known functional differences between the two. For the best-quality images, we currently advise using Rapthor.
Usage
Requirements
To run this pipeline you will need the following software installed:
DYSCO https://github.com/aroffringa/dysco
DP3 https://git.astron.nl/RD/DP3
WSClean https://wsclean.readthedocs.io/en/latest/installation.html
You can then install the pipeline module from this repository by running ‘pip install -e .’ from the main directory.
Running the pipeline
To run the pipeline you can use the following command (adjusting the paths to DP3 and WSClean executables):
python3 src/ska_sdp_wflow_selfcal/pipeline/main.py \
--dp3_path /path/to/DP3 \
--wsclean_cmd /path/to/wsclean \
--input_ms /path/to/input/measurementset.ms \
--work_dir /path/to/desired/working_dir \
--config /path/to/config/file.yml \
--logging_tag $SLURM_JOB_ID \
--resume_from_operation calibrate_1 \
--run_single_operation False
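The same invocation can be assembled from a script, for example when launching several runs with different settings. The following is a minimal sketch, not part of the pipeline: `build_selfcal_command` and `run_selfcal` are hypothetical helper names, and the only flags used are the ones shown in the command above.

```python
import subprocess

# Path of the pipeline entry point, relative to the repository root
# (taken from the command shown above).
MAIN_SCRIPT = "src/ska_sdp_wflow_selfcal/pipeline/main.py"


def build_selfcal_command(options):
    """Turn a dict such as {"dp3_path": "/path/to/DP3"} into an argument list.

    Hypothetical helper: every key becomes a "--key" flag followed by its
    value converted to a string. The dict order is preserved.
    """
    cmd = ["python3", MAIN_SCRIPT]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_selfcal(options):
    """Run the pipeline and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_selfcal_command(options), check=True)
```

Because the arguments are passed as a list, a multi-word value such as an MPI command for `--wsclean_cmd` stays a single argument without any extra quoting.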
Running on a SLURM cluster
The examples directory contains example SBATCH scripts for running the pipeline on a SLURM cluster. More detailed documentation about these scripts can be found on this RTD page.
Pipeline parameters / command line arguments
--dp3_path
This parameter supplies the path to the DP3 application. The pipeline executes DP3 in the calibrate, predict and imaging operations. When using the --dask_scheduler option, the pipeline runs multiple DP3 processes in parallel.
--wsclean_cmd
This parameter specifies the command line for running WSClean. When using a single machine, it typically contains the path to the "wsclean" executable. When running on multiple machines using MPI, it should contain a command like "mpirun -npernode 1 --bind-to none /path/to/wsclean-mp", including the quotes.
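The quotes matter because the shell would otherwise split the MPI command into separate arguments. The sketch below illustrates this with Python's standard `shlex` module, which tokenizes a string the way a POSIX shell does. It only demonstrates shell tokenization and makes no claim about how the pipeline parses the value internally.

```python
import shlex

mpi_cmd = "mpirun -npernode 1 --bind-to none /path/to/wsclean-mp"

# With quotes, the whole MPI command is one value for --wsclean_cmd.
quoted = shlex.split(f'--wsclean_cmd "{mpi_cmd}"')

# Without quotes, every word becomes its own argument, so only "mpirun"
# would directly follow the --wsclean_cmd flag.
unquoted = shlex.split(f"--wsclean_cmd {mpi_cmd}")

print(len(quoted), quoted)
print(len(unquoted), unquoted)
```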
--logging_tag
This parameter sets a tag that is included in the filenames of the log files. The main workflow log file is wflow-selfcal.
--resume_from_operation
This parameter allows restarting the pipeline from a given operation. Choose among calibrate_1, predict_1, image_1, calibrate_2, image_2, calibrate_3, predict_3, image_3, calibrate_4, predict_4, image_4. Please note that resuming is only guaranteed to work if all the other parameters remain the same.
--run_single_operation
This parameter makes the pipeline run only a single operation: the one given with "--resume_from_operation".
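The interaction between these two parameters can be sketched as follows. This is an illustration rather than the pipeline's actual code: `operations_to_run` is a hypothetical function, and it assumes that the operations execute in the order in which they are listed above and that omitting the resume point runs everything.

```python
# Operation names as listed in this README; the order is assumed to be
# the execution order.
OPERATIONS = [
    "calibrate_1", "predict_1", "image_1",
    "calibrate_2", "image_2",
    "calibrate_3", "predict_3", "image_3",
    "calibrate_4", "predict_4", "image_4",
]


def operations_to_run(resume_from_operation=None, run_single_operation=False):
    """Return the operations selected by the two command line arguments."""
    if resume_from_operation is None:
        # Assumption: without a resume point, the full pipeline runs.
        return list(OPERATIONS)
    if resume_from_operation not in OPERATIONS:
        raise ValueError(f"Unknown operation: {resume_from_operation}")
    start = OPERATIONS.index(resume_from_operation)
    if run_single_operation:
        return [OPERATIONS[start]]
    return OPERATIONS[start:]


print(operations_to_run("image_3"))
print(operations_to_run("calibrate_2", run_single_operation=True))
```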
--dask_scheduler
This parameter provides the address of a Dask Scheduler that is connected to several Dask Workers. Providing this parameter enables distribution for calibration (DP3), whose operations then run using multiple processes and/or nodes. The Dask Scheduler and Workers must be running before the pipeline is started. For an example of how to start Dask, see the example sbatch scripts for CSD3 and DAS-6 in the ‘examples’ folder. If this argument is not provided, the pipeline runs all DP3 processes sequentially, without using Dask. Note that DP3 processes are still multi-threaded, so running them sequentially does not disable parallelization altogether.
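The switch between sequential and distributed execution might look roughly like the sketch below. It is an assumption-laden illustration, not the pipeline's implementation: `run_dp3_tasks` and the `run_dp3` callable are hypothetical names, while `Client`, `Client.map` and `Client.gather` are part of the `dask.distributed` API.

```python
def run_dp3_tasks(run_dp3, task_args, dask_scheduler=None):
    """Run one task per entry in task_args and return the results in order.

    run_dp3 is a hypothetical callable that launches a single DP3 process.
    dask_scheduler is the address of an already running Dask Scheduler,
    or None to run the tasks one after another.
    """
    if dask_scheduler is None:
        # Sequential fallback: no Dask involved.
        return [run_dp3(args) for args in task_args]

    # Imported here so that the sequential path does not require Dask.
    from dask.distributed import Client

    client = Client(dask_scheduler)
    try:
        # Submit one task per argument; the workers execute them in parallel.
        futures = client.map(run_dp3, task_args)
        return client.gather(futures)
    finally:
        client.close()
```

In this sketch the scheduler is only contacted when an address is given, which mirrors the requirement that the Dask Scheduler and Workers are started before the pipeline.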
--config
This parameter specifies a YAML configuration file containing the settings for each self-calibration cycle, including the arguments for DP3 and WSClean. See the examples in the ‘config’ folder for more details.
Contribute
If you want to contribute to this project, please consult the section below.
The system used for development needs to have Python 3 and pip installed.
Installation
You can install this module via pip, using ‘pip install ska-sdp-wflow-selfcal’.
If you want to install it via git, follow the instructions below.
In order to clone and work with this repository, you need to have poetry installed. You can get it with:
curl -sSL https://install.python-poetry.org | python3 -
Clone the repository with its submodules
git clone --recursive git@gitlab.com:ska-telescope/sdp/science-pipeline-workflows/ska-sdp-wflow-selfcal.git
cd ska-sdp-wflow-selfcal
git submodule init
git submodule update
Enter poetry virtual environment and build the project
poetry shell
poetry build && poetry install
Now you can use the make instructions of the submodule:
make python-build
You can also format the code with make python-format and check the linting with make python-lint.
Install the docs
This project also has a Read the Docs page. To build the documentation, first make sure that all the required packages are installed:
poetry install --with docs
And then:
poetry run sphinx-build -b html docs/src docs/build
Or alternatively, go to the docs folder and run:
make html