Launching a Dask Cluster

The direction-dependent pipeline can leverage a dask cluster spanning multiple nodes to distribute the calibration stage. To use that feature, you will usually have to spin up the dask cluster before starting the pipeline.

Note

The pipeline relies on every worker defining a custom dask resource called subprocess_slots, set to 1. See below for an explanation.

Testing on a local machine

You may want to do development or testing work on your own desktop computer, in which case launching a dask cluster manually is the most straightforward approach.

In one terminal window, activate the same Python environment as the pipeline. Then launch:

dask scheduler

This starts the dask scheduler, which listens on port 8786 by default. Next, launch the workers: in another terminal window, activate the same Python environment as the pipeline, then launch:

dask worker localhost:8786 --resources subprocess_slots=1 --nworkers <NUM_WORKERS>

You may adjust other parameters as needed, such as the number of threads per worker; by default, the available CPU threads are split evenly across the workers.
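
If you want to confirm that the cluster is up and that the workers advertise the custom resource, you can connect a client from the same Python environment and inspect the scheduler's view of the cluster. This is an optional check, not part of the pipeline itself; it assumes the default scheduler address used above.

from dask.distributed import Client

# Connect to the scheduler started above (default address and port).
client = Client("localhost:8786")

# Each worker entry should advertise the custom resource,
# e.g. {'subprocess_slots': 1.0}.
for address, info in client.scheduler_info()["workers"].items():
    print(address, info.get("resources"))

client.close()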

On an HPC cluster

In this case, you should start the dask cluster within the batch script you intend to submit, before starting the pipeline. Please refer to the Running on SLURM page for further details.

Why the subprocess_slots resource?

The direction-dependent pipeline currently uses dask workers to run DP3 subprocesses on time chunks of the data. Because DP3 runs in a separate process from the dask worker that launches it, the worker cannot account for the CPU and RAM load it incurs. As a result, the dask scheduler believes all workers are always idle and immediately sends every queued task to them.

The solution we adopted is to tell the scheduler that each worker may only run a single subprocess at a time. In the pipeline code, DP3 tasks are tagged as occupying 1 unit of a custom subprocess_slots resource; with that constraint in place, the scheduler behaves as expected: all DP3 tasks can be submitted upfront, and the scheduler manages the rest.
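
As an illustration, here is roughly how a subprocess-bound task can be tied to the resource with dask's distributed client. The function, scheduler address, and file names are placeholders rather than the pipeline's actual code; the relevant part is the resources={"subprocess_slots": 1} annotation on each submitted task.

import subprocess
from dask.distributed import Client

def run_dp3(parset_path):
    # DP3 runs in a child process, so its CPU/RAM usage is invisible
    # to the worker's own load metrics.
    subprocess.run(["DP3", parset_path], check=True)
    return parset_path

client = Client("localhost:8786")

# Tag each task as consuming 1 unit of subprocess_slots: the scheduler will
# then place at most one DP3 run on a worker at any given time.
futures = [
    client.submit(run_dp3, parset, resources={"subprocess_slots": 1})
    for parset in ["chunk_000.parset", "chunk_001.parset"]
]
client.gather(futures)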

The only constraint is that the workers must be launched with --resources subprocess_slots=1.