Batchlet Monitoring Support

Resource Monitoring

Batchlet supports resource monitoring through integration with the ska-sdp-benchmark-monitor library. This lets users track resource usage, such as CPU and memory, during pipeline execution, which supports detailed performance analysis and optimization.

Currently, only ska-sdp-benchmark-monitor version 0.1.0 is supported. The documentation for ska-sdp-benchmark-monitor is available here.

Notes

  1. Batchlet does not install ska-sdp-benchmark-monitor as part of its Python package, because monitoring is optional. If required, you can install it in the same Python environment as batchlet using the following command:

    pip install --extra-index-url https://artefact.skao.int/repository/pypi-internal/simple ska-sdp-benchmark-monitor==0.1.0
    

    On SKA's HPC cluster, it is already available through Spack's environment modules, so you only need to load the module into your runtime environment.

  2. As with batchlet's Dask cluster management, if monitoring is enabled inside a non-Slurm allocation, it defaults to monitoring that single node only.

  3. Batchlet only supports monitoring using the “Pre-defined benchmarking levels” in ska-sdp-benchmark-monitor. If you want more control over the parameters, you can disable batchlet's monitoring support and use ska-sdp-benchmark-monitor independently of batchlet, as you would to monitor any other application.
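
Because the library is optional (note 1), it can be useful to check whether it is present in the current environment before enabling monitoring. The following is a minimal sketch using only the Python standard library. The function name `monitor_status` and its return values are illustrative and are not part of batchlet or ska-sdp-benchmark-monitor. The distribution name and version are taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Version stated in this document as the supported one.
SUPPORTED_VERSION = "0.1.0"


def monitor_status(dist_name: str = "ska-sdp-benchmark-monitor") -> str:
    """Return 'missing', 'supported', or 'unsupported' for a distribution.

    Hypothetical helper: it only inspects installed package metadata and
    does not import or start the monitor.
    """
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return "missing"
    return "supported" if installed == SUPPORTED_VERSION else "unsupported"


if __name__ == "__main__":
    print(monitor_status())
```

A result of "missing" suggests running the install command above, or loading the corresponding module on the HPC cluster.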

For information about the available configurations of the monitoring via batchlet, see Batchlet Configuration Details.

Log Monitoring

The batchlet log monitoring system wraps a log monitor around the batch pipeline that batchlet runs. The log monitor reads the logs generated by the batch pipeline, uses filter plugins to extract relevant information from them as events, and forwards those events to the attached consumer plugins. The consumer plugins then forward the events to the relevant end consumers of the batch pipeline events.
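
The flow described above (log lines, then filter plugins, then consumer plugins) can be illustrated with a small, self-contained sketch. This is not batchlet's plugin API: the `Event` class, the `stage_filter` function, the `monitor` loop, and the log line format are all hypothetical and exist only to show the shape of the pipeline.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Event:
    """Hypothetical event extracted from one log line."""
    name: str
    payload: dict


# A filter maps a log line to an Event, or None when the line is irrelevant.
Filter = Callable[[str], Optional[Event]]
# A consumer receives each event and forwards it to some end consumer.
Consumer = Callable[[Event], None]


def stage_filter(line: str) -> Optional[Event]:
    """Example filter: picks up lines of the assumed form 'stage=X status=Y'."""
    match = re.search(r"stage=(\w+) status=(\w+)", line)
    if match is None:
        return None
    return Event("stage_status", {"stage": match.group(1), "status": match.group(2)})


def monitor(lines: Iterable[str], filters: List[Filter], consumers: List[Consumer]) -> None:
    """Pass every line through every filter; send resulting events to all consumers."""
    for line in lines:
        for log_filter in filters:
            event = log_filter(line)
            if event is not None:
                for consumer in consumers:
                    consumer(event)


# Usage: collect events in a list instead of forwarding them anywhere.
received: List[Event] = []
monitor(
    [
        "INFO stage=calibrate status=started",
        "DEBUG nothing of interest here",
        "INFO stage=calibrate status=done",
    ],
    filters=[stage_filter],
    consumers=[received.append],
)
```

In this sketch the consumer simply appends to a list. A real consumer plugin would instead pass each event on to whatever system is interested in pipeline progress.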

For more information on Plugins, please visit Batchlet Plugins.