# SKA SDP Batch E2E CLI

Once installed (or loaded into the environment using `spack` or `module load` commands), this package provides the `ska-sdp-batch-e2e-pipeline` CLI. Use the standard `--help` option to view more information on the subcommands and their parameters.

This page explains the two workflows that `ska-sdp-batch-e2e-pipeline` is designed to handle, one for each of the following user scenarios:

1. A (human) user running the end-to-end pipeline on an HPC cluster

2. An SDP processing script running the end-to-end pipeline on an HPC cluster

## For users running the end-to-end pipeline on HPC Cluster

For this scenario, use the `run` subcommand.

This command takes a custom YAML config as input, which contains all the information necessary to run the e2e pipeline, e.g. input visibilities (MSv2), sky models, and the configuration of each stage. You can also enable or disable individual stages.
The parameters of this YAML file are described [on this page](run_config). The command also takes `--sdm-path` as an argument, which should be the path where the SDM products will be written.

Example:
```bash
ska-sdp-batch-e2e-pipeline run \
--config path/to/config.yml \
--sdm-path path/to/sdm-folder
```

The `ska-sdp-batch-e2e-pipeline install-config` subcommand writes the default configuration to a YAML file in the current working directory. This file does not contain some required parameters (such as the input visibility path), so you are expected to fill in those values and pass the updated configuration to the `run` subcommand. An example configuration file (with all required values filled in) is available at [configs/run.yml](https://gitlab.com/ska-telescope/sdp/science-pipeline-workflows/ska-sdp-e2e-batch-continuum-imaging/-/blob/main/configs/run.yml).
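
The workflow described above (generate a default config, fill in the required values, then call `run`) could be wrapped in a small shell function. The sketch below is only an illustration: the function name `run_e2e` and its exit code are hypothetical, and it uses only the `run` subcommand with the `--config` and `--sdm-path` options shown on this page. It checks that the config file exists before handing over to the CLI.

```bash
# Hypothetical wrapper around the documented `run` subcommand.
# Usage: run_e2e path/to/config.yml path/to/sdm-folder
run_e2e() {
    local config="${1:-}"
    local sdm_path="${2:-}"

    # Fail early with a clear message if either argument is missing.
    if [ -z "$config" ] || [ -z "$sdm_path" ]; then
        echo "usage: run_e2e <config.yml> <sdm-path>" >&2
        return 2
    fi

    # The config must be an existing file, for example the one written by
    # `install-config` after the required values have been filled in.
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 2
    fi

    ska-sdp-batch-e2e-pipeline run \
        --config "$config" \
        --sdm-path "$sdm_path"
}
```

Failing before the CLI is invoked keeps a mistyped config path from costing a queued job on the cluster.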

The `run` subcommand assumes that the CLI executables of all the sub-pipelines are already available in the `PATH`.
To avoid managing the dependencies of multiple pipelines by hand, we recommend installing all of them with `spack` (refer to the [ska-sdp-spack repository](https://gitlab.com/ska-telescope/sdp/ska-sdp-spack)), or using the pre-installed `metamodules` on the SKA HPC cluster.
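
Because `run` relies on the sub-pipeline executables being on the `PATH`, a quick pre-flight check before submitting a job can catch a missing `module load` early. A minimal sketch, assuming you know the executable names for your installation (this page only names `ska-sdp-batch-e2e-pipeline`; the function name `missing_executables` is hypothetical):

```bash
# Hypothetical pre-flight check: print every given executable that is not
# found on PATH, one per line, and return non-zero if any are missing.
missing_executables() {
    local rc=0
    local exe
    for exe in "$@"; do
        # `command -v` succeeds only if the name resolves to something runnable.
        if ! command -v "$exe" >/dev/null 2>&1; then
            printf '%s\n' "$exe"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_executables ska-sdp-batch-e2e-pipeline` prints nothing and returns 0 when the CLI is available; add the sub-pipeline executable names from your own installation to the argument list.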

An example script which can be used to run this pipeline on AWS is available at [scripts/prod/run.sh](https://gitlab.com/ska-telescope/sdp/science-pipeline-workflows/ska-sdp-e2e-batch-continuum-imaging/-/blob/main/scripts/prod/run.sh).

## For SDP processing script running the end-to-end pipeline on HPC Cluster

For this scenario, processing scripts are expected to call the `run-from-sdp` subcommand. A regular user should (ideally) never use this subcommand, but it is available for testing, even on AWS.

The prerequisites of this workflow, and how it is executed via the `continuum-imaging` processing script, are described in [this Confluence page](https://confluence.skatelescope.org/display/SE/SDP+Aware+end-to-end+pipeline).

## SDM integration

In the `run` subcommand, the `initialise_sdm` stage copies sky model files into the SDM folder structure. Previously, each calibration stage (`instrumental_calibration` and `target_calibration`) carried its own `sky_model` path. Now, each pipeline picks up its sky models internally using the field ID.

Currently, the SDM structure looks like this:

```
ska-data-product.yaml        # Data product descriptor
execution-block.yaml         # Execution block information (including processing block)
sky/                         # Sky model(s)
  target/                    # Field name from execution block
    sky_model.csv            # CSV representation
    bright_sources.csv       # Bright sources
  calibrator/
    sky_model.csv            # CSV representation
    (bright_sources.csv)     # Bright sources
telmodel/                    # Cached telescope model static data
  ...
logs/
  01-bpp                     # Logs and QA files by respective pipeline
    ...
calibration/                 # Calibration solutions
  gains/                     # Purpose of the calibration
    [field-id]/              # If solved for a particular field
      ...
  pointing/
    bandpass/
      ...
```
Sub-pipelines either read from or write to the SDM. For example, the Instrumental calibration pipeline writes its gaintable to the path `bandpass/[field-id]/gaintable.h5parm`, and the Batch-Preprocess pipeline reads this gaintable internally. This means that the e2e pipeline no longer orchestrates any inputs between two pipelines apart from visibilities. To learn more about how the SDM works with an individual pipeline, please refer to that pipeline's documentation.

For the `run-from-sdp` workflow, the pipeline clones the SDM folder from upstream and passes it to the sub-pipelines. As upstream always provides sky models laid out according to the SDM structure, the `initialise_sdm` stage is not required and is skipped.
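
To check by hand whether an SDM folder already follows the layout above, you can look for the sky model file of a given field. The sketch below assumes the `sky/<field>/sky_model.csv` layout shown in the tree; the function name `has_sky_model` is hypothetical, and this is not how the pipeline itself decides to skip `initialise_sdm`.

```bash
# Hypothetical helper: succeed if <sdm-root>/sky/<field>/sky_model.csv exists.
# Usage: has_sky_model path/to/sdm-folder target
has_sky_model() {
    local sdm_root="${1:-}"
    local field="${2:-}"
    # Both arguments are required; the file test only runs when they are set.
    [ -n "$sdm_root" ] && [ -n "$field" ] && [ -f "$sdm_root/sky/$field/sky_model.csv" ]
}
```

For example, `has_sky_model path/to/sdm-folder target && echo "sky model present"` reports whether the `target` field already has a sky model in that folder.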