.. _quickstart:

**********
Quickstart
**********

Follow the instructions below to process a CASA Measurement Set with the
Batch Preprocessing Pipeline (BPP) on your local machine.


Prerequisites
=============

- The SKA Batch Preprocessing Pipeline, installed by following :ref:`installation`.
- A CASA Measurement Set, preferably no larger than a few GB.

.. note::

    LOFAR, MeerKAT and OSKAR-simulated Measurement Sets should work. Avoid VLA
    datasets: they are typically not regular enough to be compatible with the
    pipeline, because they may contain multiple spectral windows and observed fields.


Estimated time
==============

10 minutes.


Steps
=====

Follow these steps to run the Batch Preprocessing Pipeline.

**1. Activate the environment**

Activate the environment so that the pipeline commands are available in your shell:

.. code-block:: bash

    cd  # where you previously cloned the repository
    source .venv/bin/activate

Verify that this is the case by running:

.. code-block:: bash

    ska-sdp-batch-preprocess --help

**2. Create a working directory structure for the pipeline run**

This is where you will store the configuration file and the pipeline's outputs.

.. code-block:: bash

    cd  # wherever you like
    mkdir bpp_tutorial
    cd bpp_tutorial

The pipeline also needs an empty directory to store its outputs. Let's create it now:

.. code-block:: bash

    mkdir output

**3. Write a configuration file**

The pre-processing steps to apply are defined in a YAML configuration file.
Copy and paste the following into a file named ``config.yaml`` in the current directory.
Here we keep it simple: we flag a range of observing frequencies and then average the data.

.. note::

    Feel free to adjust the flagged frequency range to suit your particular dataset.

.. code-block:: yaml

    steps:
      # Flag the 150.00 – 155.42 MHz band
      - step: preflagger
        frequency_ranges_mhz:
          - {start: 150.00, stop: 155.42}

      # Average visibilities in time and frequency by integer factors
      - step: averager
        timestep: 4
        freqstep: 4

**4. Run the pipeline**

Execute the pipeline by providing the configuration file, the empty output directory
and the input CASA Measurement Set:

.. code-block:: bash

    ska-sdp-batch-preprocess run -c config.yaml -o output/ /path/to/my_dataset.ms

**5. Check the pipeline ran successfully**

Once the run completes, the output directory should contain the following:

.. code-block:: bash

    $ ls output/
    config.yaml
    dask-report.html
    my_dataset_flagging_report.png
    my_dataset_flagging_report.zarr
    my_dataset.ms
    pipeline.log
    task-list.json

That is:

- The pre-processed visibilities as a CASA Measurement Set with exactly the same name as the input
- An RFI flagging report as an ``xarray`` Dataset object, with a ``.zarr`` extension
- A summary plot of the RFI flagging report, with a ``.png`` extension
- A copy of the configuration file used for the run
- Logs of the pipeline
- Additional diagnostic outputs related to Dask, the execution engine


Next steps
==========

To go further, you may want to:

- Read the :ref:`configuration` and learn how to use more advanced steps.
- Learn about the internals of the code, starting from :ref:`pipeline_intro`.
- Process larger datasets on the AWS DP cluster; see :ref:`aws`.
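
As an optional follow-up to step 5, the manual ``ls`` check can be scripted.
The snippet below is only a sketch: ``check_bpp_outputs`` is a hypothetical
helper, not a command shipped with the pipeline, and it assumes the output
names follow the listing shown in step 5. It tests for existence only and does
not validate the contents of any file.

.. code-block:: bash

    # Hypothetical helper (not part of the pipeline): print every expected
    # output that is missing and return a non-zero status if any is absent.
    # Usage: check_bpp_outputs output my_dataset
    check_bpp_outputs() {
        local outdir="$1" name="$2" missing=0 entry
        for entry in \
            config.yaml \
            dask-report.html \
            pipeline.log \
            task-list.json \
            "${name}.ms" \
            "${name}_flagging_report.png" \
            "${name}_flagging_report.zarr"
        do
            # -e succeeds for regular files and for directories alike
            if [ ! -e "${outdir}/${entry}" ]; then
                echo "missing: ${outdir}/${entry}"
                missing=1
            fi
        done
        return "${missing}"
    }

The second argument is the input Measurement Set name without its ``.ms``
extension, since the outputs in step 5 are named after the input.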