Batchlet
Batchlet is a tool built specifically to run Dask-based batch processing pipelines.
Batchlet wraps the pipeline process and provides the following abilities:
Managing the Dask cluster: The cluster is used by the pipeline to perform Dask-based computations. See the Batchlet Managed Dask Clusters section for more information.
Monitoring resources and logs: See Batchlet Monitoring Support.
Run batchlet --help to learn more about the CLI usage. Also see the Usage sections.
Usage
The batchlet run command accepts a JSON configuration with the following keys:
"command": The pipeline command to execute inside the batchlet context
"dask_params": Parameters to configure the Dask cluster
"monitor": Parameters to configure monitoring
The "dask_params" and "monitor" values are dictionaries with specific keys.
For the available Dask cluster configuration options, refer to Batchlet Configuration Details.
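The shape described above can be checked before a configuration is handed to batchlet. The following is a minimal sketch in Python, not part of batchlet itself: the function name check_config is hypothetical, and treating all three keys as required is an assumption, since the text only says they are accepted.

```python
import json

# Top-level keys named in the text above. Treating all of them as required
# is an assumption made for this sketch.
EXPECTED_KEYS = ("command", "dask_params", "monitor")


def check_config(text):
    """Return a list of problems found in a JSON configuration string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in config]
    # The text states that these two values are dictionaries.
    for key in ("dask_params", "monitor"):
        if key in config and not isinstance(config[key], dict):
            problems.append(f"{key} must be an object")
    return problems


print(check_config('{"command": [], "dask_params": {}, "monitor": {}}'))  # []
```

An empty list means the configuration has the expected top-level shape. The sketch does not check the keys inside "dask_params" or "monitor".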
Example configuration
{
  "command": [
    "command",
    "args"
  ],
  "dask_params": {
    "nodes": 1,
    "workers_per_node": 2,
    "threads_per_worker": 20,
    "memory_per_worker": "64G",
    "resources_per_worker": "process=1",
    "use_entry_node": true,
    "dask_cli_option": "--dask-scheduler",
    "dask_report_dir": "./dask-reports"
  },
  "generate_reports_on_failure": true,
  "monitor": {
    "resources": {
      "level": 0,
      "save_dir": "/path/to/monitor/output"
    },
    "logs": {
      "filter_plugins": [
        {
          "name": "SKASDPFilter",
          "kwargs": {
            "pipeline": "E2E"
          }
        }
      ],
      "consumer_plugins": [
        {
          "name": "CSVFile",
          "kwargs": {
            "file_path": "./events.csv"
          }
        },
        {
          "name": "SDPConfigurationDB",
          "kwargs": {
            "pb_id": "pb-e2e-20250716-00001",
            "kind": "data-product",
            "flow_names": ["mswriter"]
          }
        }
      ]
    }
  }
}
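The log plugins sit in the nested "monitor" → "logs" section of the example configuration, each entry carrying a "name" and a "kwargs" dictionary. As an illustration of that layout only, the sketch below reads a trimmed copy of the section and lists the plugin names per group. The helper plugin_names is hypothetical and not part of batchlet.

```python
import json

# Trimmed copy of the "monitor" section from the example configuration above.
CONFIG_TEXT = """
{
  "monitor": {
    "logs": {
      "filter_plugins": [
        {"name": "SKASDPFilter", "kwargs": {"pipeline": "E2E"}}
      ],
      "consumer_plugins": [
        {"name": "CSVFile", "kwargs": {"file_path": "./events.csv"}},
        {"name": "SDPConfigurationDB", "kwargs": {"kind": "data-product"}}
      ]
    }
  }
}
"""


def plugin_names(config):
    """Map each plugin group under monitor.logs to the plugin names it lists."""
    logs = config.get("monitor", {}).get("logs", {})
    return {
        group: [plugin["name"] for plugin in logs.get(group, [])]
        for group in ("filter_plugins", "consumer_plugins")
    }


names = plugin_names(json.loads(CONFIG_TEXT))
print(names)
# {'filter_plugins': ['SKASDPFilter'], 'consumer_plugins': ['CSVFile', 'SDPConfigurationDB']}
```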
The batchlet run command reads the JSON configuration either from:
standard input (stdin)
    cat <<'EOF' | batchlet run -
    {"command": [], "dask_params": {}, "monitor": {}}
    EOF
JSON file
    echo '{"command": [], "dask_params": {}, "monitor": {}}' > batchlet_config.json
    batchlet run batchlet_config.json
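When the configuration is built programmatically, the same two input modes can be driven from Python. This is a rough sketch under stated assumptions: the helpers write_config, run_args, and run_from_stdin are hypothetical names, and the only batchlet invocations used are the two forms shown above (batchlet run - and batchlet run with a file path).

```python
import json
import subprocess
from pathlib import Path


def write_config(config, path):
    """Serialise the configuration to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2))
    return path


def run_args(source="-"):
    """Build the argument list; "-" selects standard input, as shown above."""
    return ["batchlet", "run", str(source)]


def run_from_stdin(config):
    """Pipe the configuration to batchlet on stdin (needs batchlet on PATH)."""
    return subprocess.run(run_args("-"), input=json.dumps(config), text=True)


minimal = {"command": [], "dask_params": {}, "monitor": {}}
print(run_args())                        # ['batchlet', 'run', '-']
print(run_args("batchlet_config.json"))  # ['batchlet', 'run', 'batchlet_config.json']
```

Calling run_from_stdin(minimal) would start batchlet, so it is left out of the example. The returned CompletedProcess exposes the exit status as returncode.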