ska_sdp_batchlet.utils.dask_cluster.slurm.worker module

class ska_sdp_batchlet.utils.dask_cluster.slurm.worker.SlurmWorkers(scheduler_address, n_workers, nthreads, memory_limit='auto', name=None, worker_class='distributed.Nanny', use_entry_node=True, **worker_kwargs)[source]

Bases: ProcessInterface

A class representing a group of dask workers ran on slurm. Each "SlurmWorkers" runs on a single node, in which it can start multiple individual dask worker processes.

Parameters:
  • scheduler_address (str) -- Address of dask scheduler

  • n_workers (int) -- Number of dask workers (processes) launched per slurm node

  • nthreads (int) -- Threads per dask worker

  • memory_limit (str) -- Memory limit per dask worker Must be a string, in a format expected by the dask worker cli. e.g. '10 GiB', '1 TB'

  • name (str, optional) -- Name of the dask worker group. This will act as prefix in case n_workers is greater than 1. This will also be the name of the slurm job step which starts the dask workers on a node.

  • worker_class (str, default = "distributed.Nanny") -- The python class to use to create the worker(s).

  • use_entry_node (bool, default = True) -- Whether to include the node running dask scheduler to also launch dask workers. By default, it is assumed that scheduler is running on the first node out of allocated slurm nodes. If use_entry_node is true, dask workers are also launched on the first node. If false, dask workers are not launched on the first node.

  • worker_kwargs (dict) -- Extra worker kwargs. Accepts same kwargs as the distributed.Nanny class.

async start()[source]

Launch slurm based dask workers

async close()[source]

Close the process

This will be called by the Cluster object when we scale down a node, but only after we ask the Scheduler to close the worker gracefully. This method should kill the process a bit more forcefully and does not need to worry about shutting down gracefully