SKA SDP Batch E2E CLI

Once installed (or loaded into the environment using spack / module load commands), this package provides the ska-sdp-batch-e2e-pipeline CLI. More information on each subcommand and its parameters is available through the standard --help option.
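
As a small illustration, help can be requested at the top level or for a single subcommand. The sketch below assumes only that the CLI is on PATH and that subcommands accept --help in the usual way. The guard and the cli_status variable are illustrative additions, not part of the package.

```shell
# Show top-level help, then help for the run subcommand.
# The guard reports a clear message when the package has not been loaded yet.
cli=ska-sdp-batch-e2e-pipeline
if command -v "$cli" >/dev/null 2>&1; then
    "$cli" --help
    "$cli" run --help
    cli_status=found
else
    echo "$cli is not on PATH; install it or load it via spack / module load" >&2
    cli_status=missing
fi
```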

This page explains the two workflows that ska-sdp-batch-e2e-pipeline is designed to handle, one for each of the following user scenarios:

A (human) user running the end-to-end pipeline on an HPC cluster
An SDP processing script running the end-to-end pipeline on an HPC cluster

For users running the end-to-end pipeline on an HPC cluster

For this scenario, use the run subcommand.

This command takes a custom YAML config as input, which contains all the information necessary to run the e2e pipeline, e.g. input visibilities (MSv2), sky models, and the configuration of each stage. Users can also enable or disable stages. Each parameter of this YAML file is described on this page. The command also takes --sdm-path as an argument, which should be the path where SDM products will be written.

Example:

ska-sdp-batch-e2e-pipeline run \
--config path/to/config.yml \
--sdm-path path/to/sdm-folder

The ska-sdp-batch-e2e-pipeline install-config subcommand writes the default configuration into a YAML file in the current working directory. This YAML file does not contain some required parameters (such as the input visibility path), so users are expected to fill in those values and pass the updated configuration to the run subcommand. An example configuration file (with all required values filled in) is available at configs/run.yml.
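
The install-config, edit, run sequence can be sketched as a small wrapper. This is a minimal sketch, not part of the package. CONFIG and SDM_PATH are placeholder variables and run_e2e is a hypothetical helper name. The name of the file written by install-config is not stated here, so CONFIG must be pointed at it by hand.

```shell
# Sketch of the install-config -> edit -> run workflow.
# CONFIG and SDM_PATH are placeholders; set CONFIG to the YAML file that
# install-config wrote, after filling in the required values.
CONFIG="${CONFIG:-path/to/config.yml}"
SDM_PATH="${SDM_PATH:-path/to/sdm-folder}"

run_e2e() {
    # Refuse to start if the config file does not exist.
    if [ ! -f "$CONFIG" ]; then
        echo "config not found: $CONFIG (run install-config and fill in the required values first)" >&2
        return 1
    fi
    ska-sdp-batch-e2e-pipeline run --config "$CONFIG" --sdm-path "$SDM_PATH"
}

# Step 1: write the default configuration into the current working directory:
#   ska-sdp-batch-e2e-pipeline install-config
# Step 2: edit the generated YAML and fill in the required values.
# Step 3: call run_e2e
```

Checking for the file up front gives a clearer message than letting the pipeline fail on a missing path.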

The run subcommand assumes that all the CLI executables of the sub-pipelines are already available in the PATH. To avoid managing the dependencies of several different pipelines, we recommend installing all of them with spack (refer to the ska-sdp-spack repository), or using the pre-installed metamodules on the SKA HPC cluster.
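
One way to confirm this assumption before launching a run is to check PATH up front. The helper below is a hypothetical sketch. check_executables is not provided by the package, and the executable names must be supplied by the caller, since the sub-pipeline CLI names are not listed here.

```shell
# Report which of the given executables are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_executables() {
    missing=0
    for exe in "$@"; do
        if ! command -v "$exe" >/dev/null 2>&1; then
            echo "missing from PATH: $exe" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (replace the placeholders with the real sub-pipeline executables):
#   check_executables <sub-pipeline-cli-1> <sub-pipeline-cli-2> || exit 1
```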

An example script which can be used to run this pipeline on AWS is available at scripts/prod/run.sh.

For an SDP processing script running the end-to-end pipeline on an HPC cluster

For this scenario, we expect the processing scripts to call the run-from-sdp subcommand. A regular user should (ideally) never use this subcommand, but it is available for testing, even on AWS.

The prerequisites of this user flow, and how it is executed via the continuum-imaging processing script, are described in this Confluence page.

SDM integration

In the run subcommand, an initialise_sdm stage has been introduced to copy sky model files into the SDM folder structure. Previously, each calibration source (instrumental_calibration and target_calibration) carried its own sky_model path. Now, each pipeline picks up its sky models internally using the field ID.

Currently, the SDM structure looks like this:

ska-data-product.yaml        # Data product descriptor
execution-block.yaml         # Execution block information (including processing block)
sky/                         # Sky model(s)
  target/                    # Field name from execution block
    sky_model.csv            # CSV representation
    bright_sources.csv       # Bright sources
  calibrator/
    sky_model.csv            # CSV representation
    (bright_sources.csv)     # Bright sources
telmodel/                    # Cached telescope model static data
  ...
logs/
  01-bpp                     # logs and QA files by respective pipeline
    ...
calibration/                 # Calibration solutions
  gains/                     # Purpose of the calibration
    [field-id]/              # If solved for a particular field
      ...
  pointing/
    bandpass/
      ...

Sub-pipelines either read from or write to the SDM. For example, the instrumental calibration pipeline writes its gaintable to the path bandpass/[field-id]/gaintable.h5parm, and the Batch-Preprocess pipeline reads this gaintable internally. This means that e2e no longer orchestrates any inputs between two pipelines apart from visibilities. To learn more about how the SDM works with an individual pipeline, please refer to that pipeline's documentation.

For the run-from-sdp workflow, the e2e pipeline clones the SDM folder from upstream and passes it to the sub-pipelines. As upstream will always have sky models laid out according to the SDM structure, the initialise_sdm stage is not required and is skipped.
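
When inspecting an SDM folder by hand, the sky/<field>/sky_model.csv layout shown in the structure above can be checked directly. The function below is a hypothetical helper for that purpose. It is not how the pipeline decides to skip initialise_sdm, and has_sky_model is an illustrative name.

```shell
# Return 0 if the SDM folder already holds a sky model for the given field,
# following the sky/<field>/sky_model.csv layout of the SDM structure.
has_sky_model() {
    sdm_path=$1
    field=$2
    [ -f "$sdm_path/sky/$field/sky_model.csv" ]
}

# Usage:
#   has_sky_model path/to/sdm-folder target && echo "sky model present"
```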