Running on a Local Dask Cluster

The pipeline can use a Dask cluster with multiple workers to run operations in a distributed manner.

Once the pipeline and its dependencies are installed in your local Python environment, you can use the Dask CLI to run a local cluster.

Make sure that you run the scheduler and the workers in the same Python environment in which the pipeline is installed.

Start a Dask scheduler with:

dask scheduler

This starts the Dask scheduler on the default port, 8786.
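
Before starting workers or the pipeline, it can help to confirm that something is actually listening on the scheduler port. The following is a minimal sketch using only the Python standard library; the function name `scheduler_reachable` is hypothetical and not part of the pipeline or of Dask. It only checks that a TCP connection can be opened, not that the listener is a Dask scheduler.

```python
import socket


def scheduler_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' can be opened.

    Hypothetical helper: it does not speak the Dask protocol, it only
    verifies that the port accepts connections.
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: connection refused, timed out, or host not resolvable.
        # ValueError: the address had no valid numeric port.
        return False


if __name__ == "__main__":
    address = "localhost:8786"  # default scheduler port mentioned above
    print(f"{address} reachable: {scheduler_reachable(address)}")
```

If the check reports that the port is not reachable, the scheduler is probably not running yet, or it was started on a different port.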

In another terminal window, start the workers with:

dask worker localhost:8786 --nworkers <NUM_WORKERS>

If you want finer control over the workers, you can start separate workers in separate terminal windows.
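
As an illustration of per-worker control, the sketch below builds one `dask worker` command line per worker, each with its own name, thread count, and memory limit. The worker names and resource values are made-up examples, and the helper `worker_command` is hypothetical. The `--name`, `--nthreads`, and `--memory-limit` options are assumed to be available in your installed Dask version; check `dask worker --help` to confirm. The script only prints the commands, so each one can be pasted into its own terminal window.

```python
import shlex

SCHEDULER = "localhost:8786"  # scheduler address used in this section

# Hypothetical per-worker settings; adjust to your machine.
WORKERS = [
    {"name": "worker-a", "nthreads": 2, "memory_limit": "4GB"},
    {"name": "worker-b", "nthreads": 4, "memory_limit": "8GB"},
]


def worker_command(scheduler, name, nthreads, memory_limit):
    """Build the argument list for a single `dask worker` process."""
    return [
        "dask", "worker", scheduler,
        "--name", name,
        "--nthreads", str(nthreads),
        "--memory-limit", memory_limit,
    ]


for spec in WORKERS:
    # shlex.join quotes arguments where needed, so the printed line is shell-safe.
    print(shlex.join(worker_command(SCHEDULER, **spec)))
```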

Then run the pipeline, specifying the scheduler address (host and port):

ska-sdp-spectral-line-imaging run \
--input /path/to/processing_set.ps \
--config config.yml \
--dask-scheduler localhost:8786