.. _quickstart:

**********
Quickstart
**********

Follow the instructions below to process a CASA Measurement Set with the Batch Preprocessing Pipeline
(BPP) on your local machine.

Prerequisites
=============

- A working installation of the SKA Batch Preprocessing Pipeline, see :ref:`installation`.
- A CASA Measurement Set, preferably no larger than a few GB.

.. note::

    LOFAR, MeerKAT and OSKAR-simulated Measurement Sets should work. Avoid VLA datasets,
    as they are typically not regular enough to be compatible with the pipeline:
    they may contain multiple spectral windows and observed fields.

Estimated time
==============

10 minutes.

Steps
=====

Follow these steps to run the Batch Preprocessing Pipeline.

**1. Activate the environment**

    Activate the environment so that the pipeline commands are available in your shell:

    .. code-block:: bash

        cd <BPP_REPOSITORY>   # where you previously cloned the repository
        source .venv/bin/activate

    Verify that this is the case by running:

    .. code-block:: bash

        ska-sdp-batch-preprocess --help
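
    If the command is not found, the shell built-in ``command -v`` shows whether the
    executable is on your ``PATH``. The following is a generic shell sketch, not part of
    the pipeline; the helper name ``check_command`` is hypothetical.

    .. code-block:: bash

        # Print the resolved path of an executable, or a hint if it is missing
        check_command() {
            if command -v "$1" >/dev/null 2>&1; then
                echo "found: $(command -v "$1")"
            else
                echo "not found: $1"
                return 1
            fi
        }

        check_command ska-sdp-batch-preprocess || echo "activate the environment and retry"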

**2. Create a working directory structure for the pipeline run**

    This is where you will store the configuration file and the pipeline's outputs.

    .. code-block:: bash

        cd <BASE_DIRECTORY>  # wherever you like
        mkdir bpp_tutorial
        cd bpp_tutorial

    The pipeline also needs an empty directory to store its outputs. Create it now:

    .. code-block:: bash

        mkdir output

**3. Write a configuration file**

    The pre-processing steps to apply are defined in a YAML configuration file.
    Copy and paste the following into a file named ``config.yaml`` in the current directory.
    This example is deliberately simple: it flags a range of observing frequencies and
    then averages the data.

    .. note::

        Feel free to tweak the flagged frequency range to your particular dataset.

    .. code-block:: yaml

        steps:
            # Flag the 150.00 – 155.42 MHz band
            - step: preflagger
              frequency_ranges_mhz:
                - {start: 150.00, stop: 155.42}
            # Average visibilities in time and frequency by integer factors
            - step: averager
              timestep: 4
              freqstep: 4

**4. Run the pipeline**

    Execute the pipeline by providing the configuration file, the empty output directory and
    the input CASA Measurement Set:

    .. code-block:: bash

        ska-sdp-batch-preprocess run -c config.yaml -o output/ /path/to/my_dataset.ms
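
    Before launching, it can help to sanity-check the two paths. The sketch below relies
    only on what is stated above (the output directory must be empty) and on the fact that
    a CASA Measurement Set is a directory containing a main ``table.dat`` file. The helper
    name ``preflight`` is hypothetical and is not provided by the pipeline.

    .. code-block:: bash

        preflight() {
            ms="$1"; out="$2"
            # A Measurement Set is a directory holding a main table.dat file
            [ -f "$ms/table.dat" ] || { echo "not a Measurement Set: $ms"; return 1; }
            # The output directory must exist and be empty
            [ -d "$out" ] || { echo "missing output directory: $out"; return 1; }
            [ -z "$(ls -A "$out")" ] || { echo "output directory not empty: $out"; return 1; }
            echo "ok"
        }

        preflight /path/to/my_dataset.ms output/ || echo "fix the paths above before running"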


**5. Check the pipeline ran successfully**

    Once the run completes, the output directory should contain the following:

    .. code-block:: bash

        $ ls output/

        config.yaml
        dask-report.html
        my_dataset_flagging_report.png
        my_dataset_flagging_report.zarr
        my_dataset.ms
        pipeline.log
        task-list.json

    That is:

    - The pre-processed visibilities as a CASA Measurement Set with the exact same name as the input
    - An RFI flagging report as an ``xarray`` Dataset object, with a ``.zarr`` extension
    - A summary plot of the RFI flagging report with a ``.png`` extension
    - A copy of the configuration file used for the run
    - Logs of the pipeline
    - Additional diagnostic outputs related to the execution engine Dask
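
    The listing above can also be checked from the shell. This is a minimal sketch,
    assuming the input was named ``my_dataset.ms`` as in step 4 and that the file names
    match the listing exactly; the helper name ``check_outputs`` is hypothetical.

    .. code-block:: bash

        # Report every expected output that is absent from the given directory
        check_outputs() {
            dir="$1"; stem="$2"; missing=0
            for name in config.yaml dask-report.html pipeline.log task-list.json \
                        "${stem}.ms" "${stem}_flagging_report.png" \
                        "${stem}_flagging_report.zarr"; do
                if [ ! -e "$dir/$name" ]; then
                    echo "missing: $name"
                    missing=1
                fi
            done
            return "$missing"
        }

        check_outputs output my_dataset && echo "all expected outputs present"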

Next steps
==========

To go further, you may want to:

- Read the :ref:`configuration` and learn how to use more advanced steps.
- Learn about the internals of the code: start from :ref:`pipeline_intro`.
- Process larger datasets on the AWS DP cluster: see :ref:`aws`.