Batchlet Managed Dask Clusters

Batchlet can manage Dask clusters, which lets it run any application that uses Dask for parallel computing.

Batchlet determines the type of dask cluster to spin up based on the execution environment. It uses the following logic:

  • If run inside an existing SLURM job allocation, Batchlet configures and deploys a dask cluster on SLURM using DaskSlurmCluster. This cluster uses the resources of the existing job.

  • If no SLURM environment is detected, Batchlet defaults to using a local dask cluster via DaskLocalCluster. This is ideal for development or testing purposes on a local machine.

The selection logic lets Batchlet run in both distributed and standalone setups, using whatever infrastructure is available. The entry point for this selection logic is DaskClusterFactory.
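The decision described above can be illustrated with a minimal sketch. It is not Batchlet's implementation: the function name select_cluster_kind is hypothetical, and the sketch assumes that a non-empty SLURM_JOB_ID environment variable is what marks an existing SLURM job allocation. The check that DaskClusterFactory actually performs may differ.

```python
import os
from typing import Mapping, Optional


def select_cluster_kind(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "slurm" inside a SLURM allocation, otherwise "local".

    Hypothetical helper for illustration only; it is not part of Batchlet.
    """
    env = os.environ if env is None else env
    # Assumption: SLURM sets SLURM_JOB_ID inside a job allocation, so a
    # non-empty value is treated as "running inside SLURM".
    if env.get("SLURM_JOB_ID"):
        return "slurm"  # would map to DaskSlurmCluster
    return "local"  # would map to DaskLocalCluster
```

Passing the environment as an argument keeps the sketch easy to try without touching the real process environment, for example select_cluster_kind({"SLURM_JOB_ID": "4242"}).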

Guides

Worker Configuration from Hardware Resources

Both DaskLocalCluster and DaskSlurmCluster use the same resource-resolution logic to determine worker configuration.

The following values are resolved from available node resources and optional user inputs:

  • workers_per_node

  • threads_per_worker

  • memory_per_worker (effective worker memory_limit)

At a high level, the common logic works as follows:

  • CPU count per node is resolved from SLURM_CPUS_ON_NODE when available; otherwise dask's local CPU count is used.

  • Memory per node is resolved from SLURM_MEM_PER_NODE or SLURM_MEM_PER_CPU when available; otherwise dask's system memory limit is used.

  • If only part of worker configuration is given, the remaining values are inferred (for example, if only workers are set, threads are derived).

  • memory_per_worker='auto' divides usable memory by the computed number of workers.
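The resolution steps above can be sketched as follows. This is an illustration under stated assumptions, not the code shared by DaskLocalCluster and DaskSlurmCluster: both function names are hypothetical, the SLURM memory variables are assumed to be plain integers in MiB, the local fallbacks are passed in by the caller instead of being read from dask, and the inference rule (threads = CPUs divided by workers, rounded down) is a guess at a reasonable default.

```python
from typing import Mapping, Optional, Tuple

MIB = 1024 * 1024


def resolve_node_resources(
    env: Mapping[str, str], local_cpus: int, local_memory_bytes: int
) -> Tuple[int, int]:
    """Return (cpus_per_node, memory_per_node_bytes).

    local_cpus and local_memory_bytes stand in for dask's local CPU count
    and system memory limit (hypothetical parameters).
    """
    cpus = int(env["SLURM_CPUS_ON_NODE"]) if "SLURM_CPUS_ON_NODE" in env else local_cpus
    if "SLURM_MEM_PER_NODE" in env:
        # Assumption: the value is an integer number of MiB.
        memory = int(env["SLURM_MEM_PER_NODE"]) * MIB
    elif "SLURM_MEM_PER_CPU" in env:
        memory = int(env["SLURM_MEM_PER_CPU"]) * cpus * MIB
    else:
        memory = local_memory_bytes
    return cpus, memory


def resolve_worker_config(
    cpus: int,
    memory_bytes: int,
    workers_per_node: Optional[int] = None,
    threads_per_worker: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Fill in missing values; memory is split evenly ('auto' behaviour)."""
    if workers_per_node is None and threads_per_worker is None:
        # Assumption: default to a single worker that uses every CPU.
        workers_per_node, threads_per_worker = 1, cpus
    elif threads_per_worker is None:
        threads_per_worker = max(1, cpus // workers_per_node)
    elif workers_per_node is None:
        workers_per_node = max(1, cpus // threads_per_worker)
    memory_per_worker = memory_bytes // workers_per_node
    return workers_per_node, threads_per_worker, memory_per_worker
```

With SLURM_CPUS_ON_NODE=16 and SLURM_MEM_PER_CPU=2048, the sketch resolves 16 CPUs and 32 GiB per node; asking for 4 workers then yields 4 threads and 8 GiB per worker.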

Shared environment configuration

Both clusters respect a few environment variables prefixed with BATCHLET_DASK_CLUSTER__ to tweak their behaviour without changing the application code.

One particularly useful option is the memory headroom fraction. On machines with large amounts of RAM it is easy for a Dask cluster to allocate all of the node memory to workers and leave nothing for the operating system or other non-dask processes. The headroom fraction allows you to reserve a fraction of each node's memory for such overhead; the remaining memory is the basis for the per-worker memory_limit calculation.

Environment variables for Dask Cluster configuration

Variable: BATCHLET_DASK_CLUSTER__MEMORY_HEADROOM_FRAC

Default: 0.0

Description: Fraction of each node's memory to leave unallocated. The cluster computes the memory available to workers as usable_memory_per_node * (1.0 - headroom_frac) and divides that amount among the workers. A value of 0 lets the dask cluster use all of usable_memory_per_node. Note that usable_memory_per_node can still be restricted through SLURM when the resources are allocated.
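As a worked example of the headroom formula, the sketch below reads the variable and computes a per-worker memory limit. The function name memory_limit_per_worker is hypothetical, and the range check on the fraction is an assumption made for the sketch; Batchlet may validate the value differently.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "BATCHLET_DASK_CLUSTER__MEMORY_HEADROOM_FRAC"


def memory_limit_per_worker(
    usable_memory_per_node: int,
    workers_per_node: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Apply the headroom fraction, then split the rest evenly (bytes)."""
    env = os.environ if env is None else env
    headroom_frac = float(env.get(ENV_VAR, "0.0"))  # documented default: 0.0
    # Assumption: values outside [0, 1) are rejected in this sketch.
    if not 0.0 <= headroom_frac < 1.0:
        raise ValueError(f"{ENV_VAR} must be in [0, 1), got {headroom_frac}")
    usable = usable_memory_per_node * (1.0 - headroom_frac)
    return int(usable // workers_per_node)
```

For a node with 64 GiB, 4 workers, and a headroom fraction of 0.25, 48 GiB remains for dask and each worker gets a memory_limit of 12 GiB. With the default of 0.0, each worker gets 16 GiB.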