Batchlet Managed Dask Clusters
Batchlet can manage dask clusters. This allows it to run any application that uses dask for parallel computing.
Batchlet determines the type of dask cluster to spin up based on the execution environment. It uses the following logic:
If run inside an existing SLURM job allocation, Batchlet configures and deploys a dask cluster on SLURM using DaskSlurmCluster. This cluster uses the resources of the existing job.
If no SLURM environment is detected, Batchlet defaults to using a local dask cluster via DaskLocalCluster. This is ideal for development or testing purposes on a local machine.
The selection logic lets Batchlet run in both distributed and standalone setups, using whatever infrastructure is available. The entry point for this selection logic is DaskClusterFactory.
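The selection rule above can be sketched as follows. This is only an illustration, not Batchlet's actual implementation: the function name select_cluster_kind is hypothetical, and checking SLURM_JOB_ID (a variable SLURM sets inside a job allocation) is an assumption about how the SLURM environment is detected.

```python
import os
from typing import Mapping, Optional


def select_cluster_kind(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "slurm" inside a SLURM job allocation, otherwise "local".

    Hypothetical helper: the detection via SLURM_JOB_ID is an assumption,
    not necessarily what DaskClusterFactory checks.
    """
    # Default to the real process environment; accept a mapping for testing.
    env = os.environ if env is None else env
    # A non-empty SLURM_JOB_ID is treated as "running inside an allocation".
    return "slurm" if env.get("SLURM_JOB_ID") else "local"
```

Under this sketch, "slurm" would correspond to DaskSlurmCluster and "local" to DaskLocalCluster. Passing the environment as a mapping keeps the decision easy to test without a real SLURM allocation.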
Guides
Worker Configuration from Hardware Resources
Both DaskLocalCluster and DaskSlurmCluster use the same resource-resolution logic to determine worker configuration.
The following values are resolved from available node resources and optional user inputs:
workers_per_node
threads_per_worker
memory_per_worker (effective worker memory_limit)
At a high level, the common logic works as follows:
CPU count per node is resolved from SLURM_CPUS_ON_NODE when available; otherwise dask's local CPU count is used.
Memory per node is resolved from SLURM_MEM_PER_NODE or SLURM_MEM_PER_CPU when available; otherwise dask's system memory limit is used.
If only part of the worker configuration is given, the remaining values are inferred (for example, if only workers are set, threads are derived).
memory_per_worker='auto' divides usable memory by the computed number of workers.
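The resolution steps above can be approximated with the sketch below. It is not Batchlet's code. The function names are hypothetical, the fallback values are passed in explicitly (standing in for dask's local CPU count and system memory limit), all node memory is treated as usable, and the choice of one worker using all CPUs when nothing is specified is an assumption. SLURM reports SLURM_MEM_PER_NODE and SLURM_MEM_PER_CPU in megabytes, interpreted here as MiB.

```python
from typing import Mapping, Optional, Tuple, Union

MIB = 1024 ** 2  # SLURM memory variables are given in megabytes


def resolve_node_resources(
    env: Mapping[str, str], fallback_cpus: int, fallback_memory_bytes: int
) -> Tuple[int, int]:
    """Return (cpus_per_node, memory_bytes_per_node). Hypothetical helper."""
    # Prefer the SLURM-provided CPU count, else the caller's fallback.
    if "SLURM_CPUS_ON_NODE" in env:
        cpus = int(env["SLURM_CPUS_ON_NODE"])
    else:
        cpus = fallback_cpus
    # Prefer per-node memory, then per-CPU memory, else the fallback.
    if "SLURM_MEM_PER_NODE" in env:
        memory = int(env["SLURM_MEM_PER_NODE"]) * MIB
    elif "SLURM_MEM_PER_CPU" in env:
        memory = int(env["SLURM_MEM_PER_CPU"]) * cpus * MIB
    else:
        memory = fallback_memory_bytes
    return cpus, memory


def resolve_worker_config(
    cpus: int,
    memory_bytes: int,
    workers_per_node: Optional[int] = None,
    threads_per_worker: Optional[int] = None,
    memory_per_worker: Union[int, str] = "auto",
) -> Tuple[int, int, int]:
    """Fill in missing worker settings from node resources (assumed rules)."""
    if workers_per_node is None and threads_per_worker is None:
        # Assumption: default to a single worker that uses every CPU.
        workers_per_node, threads_per_worker = 1, cpus
    elif threads_per_worker is None:
        # Only workers given: spread the CPUs across them.
        threads_per_worker = max(1, cpus // workers_per_node)
    elif workers_per_node is None:
        # Only threads given: fit as many workers as the CPUs allow.
        workers_per_node = max(1, cpus // threads_per_worker)
    if memory_per_worker == "auto":
        # 'auto' splits node memory evenly across the computed workers.
        memory_per_worker = memory_bytes // workers_per_node
    return workers_per_node, threads_per_worker, int(memory_per_worker)
```

For example, with SLURM_CPUS_ON_NODE=16 and SLURM_MEM_PER_CPU=2048, this sketch resolves 16 CPUs and 32 GiB per node. Requesting workers_per_node=4 then yields 4 threads and 8 GiB per worker.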