# SKA SDP Instrumental Calibration Pipeline [![Documentation Status](https://readthedocs.org/projects/ska-telescope-ska-sdp-instrumental-calibration/badge/?version=latest)](https://ska-telescope-ska-sdp-instrumental-calibration.readthedocs.io/en/latest/?badge=latest) The **Instrumental Calibration Pipeline (INST)** is a cli application which is used to perform calibration operations on the SKA visibility data. It mainly provides Instrumental calibration pipeline and Taget calibration pipeline. This repository contains the functions to generate the initial calibration products during standard SKA batch processing. It includes processing functions to prepare, model and calibrate a visibility dataset, data handling functions for parallel processing, and high level workflow scripts and notebooks. If you wish to contribute to this repository, please refer [Developer Guide](https://developer.skao.int/projects/ska-sdp-instrumental-calibration/en/latest/DEVELOPMENT.html) ## Requirements for running the pipeline The INST pipeline is primarily dependent on these external astronomy related libraries: 1. [python-casacore](https://github.com/casacore/python-casacore) 2. [everybeam](https://git.astron.nl/RD/EveryBeam) Apart from above, the pipeline uses standard SKA processing functions in the 1. [sdp-func-python](https://developer.skao.int/projects/ska-sdp-func-python/en/) 2. [sdp-func](https://developer.skao.int/projects/ska-sdp-func/en/) (optional) and SKA standard data models in the [ska-sdp-datamodels](https://developer.skao.int/projects/ska-sdp-datamodels/en/) repository. All above dependencies are installed along with the pipeline using [standard installation steps](#installing-the-pipeline). For prediction of model visibilities, here are the pre-requisites: * The GLEAM extragalactic catalogue or a [OSKAR](https://ska-telescope.gitlab.io/sim/oskar/sky_model/sky_model.html#sky-model-file-fixed-format) csv file. This and other catalogues will soon be available via [global-sky-model](https://developer.skao.int/projects/ska-sdp-global-sky-model/en/), but at present a hard copy is needed for prediction of model visibilities. The gleamegc catalogue can be downloaded via FTP from [VizieR](https://cdsarc.cds.unistra.fr/viz-bin/cat/VIII/100). * A measurement set with appropriate metadata to initialise the everybeam beam models. An appropriate measurement set for basic tests can be downloaded using the [everybeam package](https://gitlab.com/ska-telescope/sdp/ska-sdp-func-everybeam/) script `download_ms.sh`, but one will also be made available in this package. * The [everybeam coeffs](https://gitlab.com/ska-telescope/sdp/ska-sdp-func-everybeam/-/tree/master/coeffs) directory is also needed to generate beam models. The directory path supplied to `predict_from_components` is used to set environment variable `EVERYBEAM_DATADIR`. ## Foreword: Dask distribution > This section is inspired from the [batch-preprocessing pipeline](https://developer.skao.int/projects/ska-sdp-batch-preprocess/en/latest/pipelines.html#foreword-dask-distribution) > This is only applicable with you run INST pipeline with a dask cluster (LocalCluster or SlurmCluster) The INST pipeline in the `load_data` stage, converts the MSv2 into a Zarr file, and stores it in the `cache_directory` path. During the testing, we have realised that its better to limit the number of parallel tasks that run during the conversion from MSv2 to Zarr, so that each task can get enought memory. The only reliable solution is to use [worker resources](https://distributed.dask.org/en/latest/resources.html#worker-resources). The instrumental calibration assumes that all workers define a resource called `process`; each worker may hold 1 or more `process` resources. Each task of the conversion is defined to use 1 `process` resource. Thus each worker will only run `process` number of tasks at any time (parallel/concurrent using its threadpool.) To define the reources when starting dask worker using the cli command: ```bash dask worker --resources "process=1" ``` Or in a LocalCluster: ```python cluster = LocalCluster(resources={'process': 1}) ``` > ⚠️ Warning: > If the process resource is not defined on any worker, the pipeline (or rather, the Dask scheduler) will hang indefinitely. ## Installing the pipeline ### In python environments It is always recommended to create a seperate python environment for the pipeline. For that, you can use `conda` or [uv](https://docs.astral.sh/uv/getting-started/installation/) ```bash # To create a virtual environment using uv # This will be created in the `.venv` directory uv venv --python 3.10 --seed #To activate the environment source .venv/bin/activate ``` #### Stable release from SKAO pip index (recommended) Run the following command to install the latest stable release (0.7.1) of the pipeline from SKAO python artifact repository: ```bash INST_VERSION=0.7.1 # if using uv, use `uv pip install ...` pip install --extra-index-url "https://artefact.skao.int/repository/pypi-internal/simple" "ska-sdp-instrumental-calibration[python-casacore,ska-sdp-func]==$INST_VERSION" ``` #### Latest pipeline from git Run the following command to install the latest pipeline from the `main` branch of the git repository ```bash INST_BRANCH=main # if using uv, use `uv pip install ...` pip install --extra-index-url "https://artefact.skao.int/repository/pypi-internal/simple" "ska-sdp-instrumental-calibration[python-casacore,ska-sdp-func]@git+https://gitlab.com/ska-telescope/sdp/science-pipeline-workflows/ska-sdp-instrumental-calibration.git@$INST_BRANCH" ``` ### As spack package The INST pipeline is available as a [spack](https://spack.readthedocs.io/en/v0.23.1/) package, in the [ska-sdp-spack](https://gitlab.com/ska-telescope/sdp/ska-sdp-spack) repository. Please follow the [README](https://gitlab.com/ska-telescope/sdp/ska-sdp-spack/-/blob/main/README.md) to setup spack on your machine. Then install the e2e pipeline using this command: ```bash INST_VERSION=0.7.1 spack install "py-ska-sdp-instrumental-calibration@$INST_VERSION" ``` Then load the spack package with this command ```bash spack load "py-ska-sdp-instrumental-calibration" ``` ### As OCI container We also provide a OCI (docker) image which is hosted on the SKA Docker artifact repository. To pull the docker image for the latest stable release, please run: ```bash INST_VERSION=0.7.1 docker pull "artefact.skao.int/ska-sdp-instrumental-calibration:$INST_VERSION" ``` The entrypoint of above image is set to the executable `ska-sdp-instrumental-calibration`. Run image with volume mounts to enable read write to storage. ```bash docker run [-v local:container] ...... ``` ## Using the CLI for Instrumental Calibration pipeline Once you install the pipeline, you should be able to access the pipeline cli with `ska-sdp-instrumental-calibration` command. Running `ska-sdp-instrumental-calibration --help` should show following output: ```bash usage: ska-sdp-instrumental-calibration [-h] {run,install-config,experimental} ... positional arguments: {run,install-config,experimental} run Run the pipeline install-config Installs the default config at --config-install-path experimental Allows reordering of stages via additional config section options: -h, --help show this help message and exit ``` ### Generating YAML config of the pipeline The INST pipeline expects a YAML config file as one of the inputs, which defines the stages and their parameters. The information about stages is present in the documentation Install the default config YAML of the pipeline to a specific directory using the `install-config` subcommand. ```bash ska-sdp-instrumental-calibration install-config --config-install-path path/to/dir ``` Parameters of the default configurations can be overridden ```bash ska-sdp-instrumental-calibration install-config --config-install-path path/to/dir \ --set parameters.bandpass_calibration.flagging true \ --set parameters.load_data.fchunk 64 ``` ### Running the pipeline Run the instrumental calibration pipeline using `run` subcommand. Example: ```bash ska-sdp-instrumental-calibration run \ --config /path/to/config \ --output /path/to/output/dir \ /path/to/ms ``` Please run `ska-sdp-instrumental-calibration run --help` to see all supported options of the `run` subcommand.\ ### Reordering stages in pipeline Run the instrumental calibration pipeline using `experimental` subcommand, to provide alternate stage order than the default order. Example: ```bash ska-sdp-instrumental-calibration experimental \ --config /path/to/config \ --output /path/to/output/dir \ /path/to/ms ``` The configuration is used to control both the execution order and any additional settings for each stage. The `experimental` subcommand allows reuse of the same stage multiple times. #### Experimental Configuration ```yaml global_parameters: experimental: pipeline: - load_data: {} - predict_vis: use_everybeam: true flux_limit: 2.0 fov: 5.0 - bandpass_calibration: {} - delay_calibration: {} - generate_channel_rm: run_solver_config: solver: normal_equations refant: 0 niter: 30 - delay_calibration: {} - export_gain_table: {} parameters: bandpass_calibration: plot_config: plot_table: true fixed_axis: false run_solver_config: solver: gain_substitution refant: 0 niter: 10 delay_calibration: oversample: 16 plot_config: plot_table: false fixed_axis: false export_gain_table: file_name: inst.gaintable export_format: h5parm export_metadata: false load_data: nchannels_per_chunk: 32 pipeline: {} ``` The pipeline defined under `global_parameters.experimental.pipeline` will be used to construct the execution pipeline. It will consist of the following stages in the order: (1) `load_data` (2) `predict_vis` (3) `bandpass_calibration` (4) `delay_calibration` (5) `generate_channel_rm` (6) `delay_calibration_1` and (7) `export_gain_table`. There is no stage specific validations done while constructing the execution order, hence the user should pay special attention to stage order. The pipeline would not function if `load_data` stage is not set as the first stage. **Stage Names**: The `ska-sdp-instrumental-calibration experimental` feature will update the stage names for duplicated stages as follows: the first occurrence of a stage name will remain unchanged, without a suffix. Subsequent duplicates will be renamed using the format `_x`, where `x` is the duplicate index starting from 1. For example, the second occurrence of `delay_calibration` will be renamed to `delay_calibration_1`. This numbering is automatically incremented for each duplicate, preserving the order as defined in the `global_parameters.experimental.pipeline` section. This approach ensures that each stage has a unique and identifiable name. **The stage configurations** have the following precedence (from highest to lowest): 1. `--set` cli parameter 2. Configuration provided under the `global_parameters.experimental.pipeline.` section 3. Configuration provided under `parameters.` section 4. The default configurations used for the stage definitions. While using the `--set` cli-option, please be mindful of the _suffix appended to the stage name_. Example: `ska-sdp-instrumental-calibration experimental ... --set parameters.delay_calibration_1.plot_config.plot_table true` Please note that the `pipeline` section is intentionally left blank and would be ignored for the `ska-sdp-instrumental-calibration experimental` feature, as the stage execution order is decided from `global_parameters.experimental.pipeline` section. ## Using the CLI for Target Calibration pipeline Instrumental Calibration Pipeline for Target has two CLIs `ska-sdp-instrumental-target-calibration` and `ska-sdp-instrumental-target-ionospheric` commands which performs complex gain and ionospheric delay correction on Target respectively. Running `ska-sdp-instrumental-target-calibration --help` should show following output: ```bash usage: ska-sdp-instrumental-target-calibration [-h] {run,install-config} ... positional arguments: {run,install-config} run Run the pipeline install-config Installs the default config at --config-install-path options: -h, --help show this help message and exit ``` Running `ska-sdp-instrumental-target-ionospheric --help` should show following output: ```bash usage: ska-sdp-instrumental-target-ionospheric [-h] {run,install-config} ... positional arguments: {run,install-config} run Run the pipeline install-config Installs the default config at --config-install-path options: -h, --help show this help message and exit ``` > Generating YAML config of the pipeline and running the pipeline is same as instrumental calibration pipeline. Target calibration pipeline doesn't provide experimental configuration. ## Calibation Strategy for Calibrator Refer this page for [calibration strategy](https://developer.skao.int/projects/ska-sdp-instrumental-calibration/en/latest/calibration_strategy.html)