ska_sdp_batchlet.utils.dask_cluster.resources module

ska_sdp_batchlet.utils.dask_cluster.resources.get_usable_cpus_per_node()[source]

Get number of slurm cpus available per node.

It is assumed that, in a slurm resource allocation, numbers of cpus available across each allocated node is identical.

Return type:

int

Returns:

Number of usable CPUs on current node.

ska_sdp_batchlet.utils.dask_cluster.resources.get_usable_memory_per_node()[source]

Get number of bytes of memory available to use per slurm node.

It is assumed that, in a slurm resource allocation, amount of memory to use across each allocated node is identical.

Return type:

int

Returns:

Amount of memory avaiable to use, in bytes.

ska_sdp_batchlet.utils.dask_cluster.resources.parse_dask_resource_spec(resources_per_worker)[source]

Parse worker resources specification in backward compatible way. This function also ensures that the values of the keys in the dictionary are always floats.

Parameters:

resources_per_worker (dict | str | None) --

If dictionary, its a mapping of resource names and their values. If string, it should be in either of the following formats:

  • "resource1=value1,resource2=value2"

  • "resource1=value1 resource2=value2"

If resources_per_worker is None, then we attempt to get distributed.worker.resources value from dask config, and proceed with the parsing as mentioned above.

In any cases, the dict values must be convertible to float.

Return type:

dict[str, float]

Returns:

A dictionary mapping resource names (str) to their values (float). Can be an empty dictionary.

ska_sdp_batchlet.utils.dask_cluster.resources.resolve_worker_configuration(workers_per_node=None, threads_per_worker=None, memory_per_worker='auto', custom_logger=None)[source]

Calculate worker configuration based on hardware constraints.

Determines the optimal number of workers per node, threads per worker, and memory limit per worker based on SLURM node resources and user-specified constraints. If user constraints conflict with available resources, this function logs warnings and applies reasonable defaults.

Parameters:
  • logger -- Logger instance for logging calculation steps and warnings.

  • workers_per_node (int | None, default: None) -- Number of workers per node. If None, will be calculated based on other parameters or system resources.

  • threads_per_worker (int | None, default: None) -- Number of threads per worker. If None, will be calculated based on other parameters or system resources.

  • memory_per_worker (str | int | None, default: 'auto') --

    Memory limit per worker. Can be:

    • None: no explicit limit (uses all available node memory)

    • "auto": automatically calculate based on number of workers

    • str: parseable memory string (e.g., "4GB", "512MB")

    • int: memory in bytes

Return type:

tuple[int, int, int]

Returns:

A tuple of (n_workers, n_threads, memory_limit) where:

  • n_workers (int): Number of workers per node

  • n_threads (int): Number of threads per worker

  • memory_limit (int): Memory limit per worker in bytes