Batchlet Managed Dask Clusters
==============================

Batchlet can create and manage Dask clusters, so any application that uses Dask for parallel computing can run under Batchlet.

Batchlet chooses the type of Dask cluster to start based on the execution environment:

- If Batchlet runs inside an existing SLURM job allocation, it configures and deploys a Dask cluster on SLURM using :doc:`dask_slurm_cluster`. This cluster runs on the resources already allocated to the job.
- If no SLURM environment is detected, Batchlet falls back to a local Dask cluster via :doc:`local_cluster`. This is ideal for development or testing on a local machine.

This selection logic lets the same application run on a SLURM allocation or on a standalone machine, using whichever infrastructure is available. The entry point for the selection logic is :doc:`dask_cluster_factory`.

Guides
------

.. toctree::
   :maxdepth: 1

   dask_cluster_factory
   local_cluster
   dask_slurm_cluster

.. _dask-worker-configuration-from-resources:

Worker Configuration from Hardware Resources
--------------------------------------------

Both :doc:`local_cluster` and :doc:`dask_slurm_cluster` use the same resource-resolution logic to determine the worker configuration. The following values are resolved from the available node resources and optional user inputs:

* ``workers_per_node``
* ``threads_per_worker``
* ``memory_per_worker`` (the effective worker ``memory_limit``)

At a high level, the common logic works as follows:

* CPU count per node is resolved from ``SLURM_CPUS_ON_NODE`` when available; otherwise Dask's local CPU count is used.
* Memory per node is resolved from ``SLURM_MEM_PER_NODE`` or ``SLURM_MEM_PER_CPU`` when available; otherwise Dask's system memory limit is used.
* If only part of the worker configuration is given, the remaining values are inferred (for example, if only the number of workers is set, the threads per worker are derived).
* ``memory_per_worker='auto'`` divides the usable memory per node by the computed number of workers.

Shared environment configuration
--------------------------------

Both clusters respect a few environment variables prefixed with ``BATCHLET_DASK_CLUSTER__``, which adjust their behaviour without changes to the application code.

One particularly useful option is the *memory headroom fraction*. On machines with large amounts of RAM, a Dask cluster can easily allocate all of the node memory to workers and leave nothing for the operating system or other non-Dask processes. The headroom fraction reserves a fraction of each node's memory for such overhead; the remaining memory is the basis for the per-worker ``memory_limit`` calculation.

.. list-table:: Environment variables for Dask cluster configuration
   :header-rows: 1

   * - Variable
     - Default
     - Description
   * - ``BATCHLET_DASK_CLUSTER__MEMORY_HEADROOM_FRAC``
     - ``0.0``
     - Fraction of each node's memory to leave unallocated. The cluster computes usable memory as ``usable_memory_per_node * (1.0 - headroom_frac)`` and then divides that amount between the workers. Setting the value to ``0`` lets the Dask cluster use all of the usable memory. Note that ``usable_memory_per_node`` can still be limited by the memory requested from SLURM when the resources are allocated (for example with ``--mem`` or ``--mem-per-cpu``).
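
The resolution rules and the headroom calculation described in this section can be sketched in plain Python. This is a minimal, hypothetical illustration, not Batchlet's implementation or API: ``resolve_worker_config`` and ``WorkerConfig`` are made-up names, the rule used to infer a missing worker or thread count is an assumption, and the SLURM memory variables are assumed to be reported in megabytes. The Dask fallbacks are passed in as arguments to keep the sketch self-contained; in practice they could come from ``dask.system.CPU_COUNT`` and ``distributed.system.MEMORY_LIMIT``.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Mapping, Optional, Union

   HEADROOM_ENV = "BATCHLET_DASK_CLUSTER__MEMORY_HEADROOM_FRAC"
   MB = 1024**2


   @dataclass
   class WorkerConfig:  # hypothetical container, not a Batchlet class
       workers_per_node: int
       threads_per_worker: int
       memory_per_worker: int  # bytes, used as the Dask worker memory_limit


   def resolve_worker_config(
       env: Mapping[str, str],
       fallback_cpus: int,
       fallback_memory: int,
       workers_per_node: Optional[int] = None,
       threads_per_worker: Optional[int] = None,
       memory_per_worker: Union[int, str] = "auto",
   ) -> WorkerConfig:
       # CPUs per node: prefer SLURM's view of the allocation, else the local count.
       cpus = int(env.get("SLURM_CPUS_ON_NODE", fallback_cpus))

       # Memory per node (assumption: SLURM reports these values in megabytes).
       if "SLURM_MEM_PER_NODE" in env:
           node_memory = int(env["SLURM_MEM_PER_NODE"]) * MB
       elif "SLURM_MEM_PER_CPU" in env:
           node_memory = int(env["SLURM_MEM_PER_CPU"]) * MB * cpus
       else:
           node_memory = fallback_memory

       # Infer whichever of workers/threads was not given (assumed rule).
       if workers_per_node is None and threads_per_worker is None:
           workers_per_node, threads_per_worker = cpus, 1
       elif threads_per_worker is None:
           threads_per_worker = max(1, cpus // workers_per_node)
       elif workers_per_node is None:
           workers_per_node = max(1, cpus // threads_per_worker)

       # Hold back the headroom fraction before splitting memory between workers.
       headroom = float(env.get(HEADROOM_ENV, "0.0"))
       if not 0.0 <= headroom < 1.0:  # assumed valid range
           raise ValueError(f"{HEADROOM_ENV} must be in [0, 1), got {headroom}")
       usable = int(node_memory * (1.0 - headroom))

       if memory_per_worker == "auto":
           memory_per_worker = usable // workers_per_node

       return WorkerConfig(workers_per_node, threads_per_worker, int(memory_per_worker))


   # Example: a 16-CPU, 64000 MB SLURM node with 25% headroom and 4 workers.
   cfg = resolve_worker_config(
       {"SLURM_CPUS_ON_NODE": "16", "SLURM_MEM_PER_NODE": "64000", HEADROOM_ENV: "0.25"},
       fallback_cpus=4,
       fallback_memory=8 * 1024**3,
       workers_per_node=4,
   )
   print(cfg)
   # WorkerConfig(workers_per_node=4, threads_per_worker=4, memory_per_worker=12582912000)

In this example a quarter of the node memory is held back, the number of threads is derived from the CPU count because only the number of workers was given, and the remaining memory is split evenly across the four workers.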