Quickstart

Follow the instructions below to process a CASA Measurement Set with the Batch Preprocessing Pipeline (BPP) on your local machine.

Prerequisites

  • A working installation of the SKA Batch Preprocessing Pipeline, set up by following Installation.

  • A CASA Measurement Set, preferably no larger than a few GB.

Note

LOFAR, MeerKAT and OSKAR-simulated Measurement Sets should work. Avoid VLA datasets: they are typically not regular enough to be compatible with the pipeline, as they may contain multiple spectral windows and observed fields.

Estimated time

10 minutes.

Steps

Follow these steps to run the Batch Preprocessing Pipeline.

1. Activate the environment

Activate the environment so that the pipeline commands are globally available:

cd <BPP_REPOSITORY>   # where you previously cloned the repository
source .venv/bin/activate

Verify that this is the case by running:

ska-sdp-batch-preprocess --help
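
If the command is not found, the environment is most likely not active. As a minimal sketch, the check below uses the standard `command -v` shell builtin to test whether the executable is on your PATH. The `have_cmd` helper is a hypothetical name introduced here for illustration; it is not part of the pipeline.

```shell
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd ska-sdp-batch-preprocess; then
    echo "pipeline command found at: $(command -v ska-sdp-batch-preprocess)"
else
    echo "pipeline command not found; is the virtual environment active?" >&2
fi
```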

2. Create a working directory structure for the pipeline run

This is where you will store the configuration file and the pipeline’s outputs.

cd <BASE_DIRECTORY>  # wherever you like
mkdir bpp_tutorial
cd bpp_tutorial

The pipeline also needs an empty directory to store its outputs. Create it now:

mkdir output
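
Since the pipeline expects this directory to be empty, it can be worth checking before each run, for example when reusing the tutorial directory after an earlier attempt. The sketch below is one way to do that in plain shell; `is_empty_dir` is a hypothetical helper defined here, not a pipeline command.

```shell
# Hypothetical helper (not part of the pipeline): succeeds only if the
# given directory exists and contains no entries, hidden files included.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

mkdir -p output
if is_empty_dir output; then
    echo "output/ is empty and ready to use"
else
    echo "output/ is not empty; clear it or choose another directory" >&2
fi
```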

3. Write a configuration file

The pre-processing steps to apply are defined via a YAML configuration file. Copy and paste the following into a file named config.yaml in the current directory. Here we keep it simple, just flagging a range of observing frequencies and averaging the data.

Note

Feel free to adjust the flagged frequency range to suit your particular dataset.

steps:
    # Flag the 150.00 – 155.42 MHz band
    - step: preflagger
      frequency_ranges_mhz:
        - {start: 150.00, stop: 155.42}
    # Average visibilities in time and frequency by integer factors
    - step: averager
      timestep: 4
      freqstep: 4
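
To get a feel for what the averaging factors do to the data volume, the arithmetic can be sketched as below. The input dimensions (1000 time steps, 64 channels) are hypothetical values chosen to divide evenly by the factors; substitute those of your own dataset. With timestep and freqstep both set to 4, the number of visibility samples drops by a factor of 16.

```shell
# Hypothetical input dimensions; replace with those of your own dataset.
n_times=1000      # number of integration time steps
n_channels=64     # number of frequency channels

# Averaging factors, as set in config.yaml
timestep=4
freqstep=4

# Integer division is exact here because both counts divide evenly by 4
out_times=$((n_times / timestep))
out_channels=$((n_channels / freqstep))
reduction=$((timestep * freqstep))

echo "time steps: $n_times -> $out_times"
echo "channels:   $n_channels -> $out_channels"
echo "visibility samples reduced by a factor of $reduction"
```

For dimensions that do not divide evenly, treat these numbers as approximate: how the remainder is handled depends on the averaging implementation.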

4. Run the pipeline

Execute the pipeline by providing the configuration file, the empty output directory and the input CASA Measurement Set:

ska-sdp-batch-preprocess run -c config.yaml -o output/ /path/to/my_dataset.ms

5. Check the pipeline ran successfully

Once the run completes, the output directory should contain the following:

$ ls output/

config.yaml
dask-report.html
my_dataset_flagging_report.png
my_dataset_flagging_report.zarr
my_dataset.ms
pipeline.log
task-list.json

That is:

  • The pre-processed visibilities as a CASA Measurement Set with the exact same name as the input

  • An RFI flagging report as an xarray Dataset object, with a .zarr extension

  • A summary plot of the RFI flagging report with a .png extension

  • A copy of the configuration file used for the run

  • Logs of the pipeline

  • Additional diagnostic outputs related to the execution engine Dask
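
The check above can also be scripted. The sketch below assumes the output file names follow the listing shown earlier, derived from the input Measurement Set name; `check_bpp_outputs` is a hypothetical helper written for this tutorial, not something the pipeline provides.

```shell
# Hypothetical helper (not part of the pipeline): report every expected
# output that is missing, and fail if there is at least one.
#   $1 = output directory
#   $2 = input Measurement Set name without the ".ms" extension
check_bpp_outputs() {
    dir=$1
    stem=$2
    missing=0
    for name in \
        config.yaml \
        dask-report.html \
        "${stem}_flagging_report.png" \
        "${stem}_flagging_report.zarr" \
        "${stem}.ms" \
        pipeline.log \
        task-list.json
    do
        # -e matches files and directories alike (a Measurement Set is a directory)
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the run in this tutorial, calling `check_bpp_outputs output my_dataset` should return successfully and print nothing if all the listed outputs are present.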

Next steps

To go further, you may want to: