Batchlet

Batchlet is a tool built to run Dask-based batch processing pipelines.

Batchlet wraps the pipeline process and provides the following capabilities:

  1. Managing the Dask cluster: Batchlet manages the Dask cluster that the pipeline uses for its Dask-based computations. See the Batchlet Managed Dask Clusters section for more information.

  2. Monitoring resources and logs: See Batchlet Monitoring Support.

Run batchlet --help to learn more about the CLI usage. See also the Usage section.

Usage

The batchlet run command accepts a JSON configuration with the following keys:

  • "command": The pipeline command to execute inside the batchlet context

  • "dask_params": Parameters to configure the dask cluster

  • "monitor": Parameters to configure monitoring

The "dask_params" and "monitor" values are dictionaries with specific keys. For the available Dask cluster configuration options, refer to Batchlet Configuration Details.

Example configuration

{
  "command": [
    "command",
    "args"
  ],
  "dask_params": {
    "nodes": 1,
    "workers_per_node": 2,
    "threads_per_worker": 20,
    "memory_per_worker": "64G",
    "resources_per_worker": "process=1",
    "use_entry_node": true,
    "dask_cli_option": "--dask-scheduler",
    "dask_report_dir": "./dask-reports"
  },
  "generate_reports_on_failure": true,
  "monitor": {
    "resources": {
      "level": 0,
      "save_dir": "/path/to/monitor/output"
    },
    "logs": {
      "filter_plugins": [
        {
          "name": "SKASDPFilter",
          "kwargs": {
            "pipeline": "E2E"
          }
        }
      ],
      "consumer_plugins": [
        {
          "name": "CSVFile",
          "kwargs": {
            "file_path": "./events.csv"
          }
        },
        {
          "name": "SDPConfigurationDB",
          "kwargs": {
            "pb_id": "pb-e2e-20250716-00001",
            "kind": "data-product",
            "flow_names": ["mswriter"]
          }
        }
      ]
    }
  }
}
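A configuration like the one above can also be generated programmatically, which helps when values such as the worker count change between runs. The following is a minimal sketch and not part of batchlet: the helper names build_config and write_config are hypothetical, and the check covers only the three keys documented above, so batchlet's own validation may differ.

```python
import json
from pathlib import Path

# Top-level keys documented in the Usage section.
DOCUMENTED_KEYS = ("command", "dask_params", "monitor")


def build_config(command, dask_params=None, monitor=None):
    """Assemble a configuration dictionary (hypothetical helper, not a batchlet API)."""
    if not isinstance(command, list) or not all(isinstance(c, str) for c in command):
        raise TypeError("command must be a list of strings")
    return {
        "command": command,
        "dask_params": dict(dask_params or {}),
        "monitor": dict(monitor or {}),
    }


def write_config(config, path):
    """Write the configuration as JSON after checking the documented keys exist."""
    missing = [key for key in DOCUMENTED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    path = Path(path)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


# Values copied from the example configuration above.
config = build_config(
    ["command", "args"],
    dask_params={"nodes": 1, "workers_per_node": 2, "threads_per_worker": 20},
    monitor={"resources": {"level": 0, "save_dir": "/path/to/monitor/output"}},
)
```

Calling write_config(config, "batchlet_config.json") produces a file that can then be passed to batchlet run.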

The batchlet run command reads the JSON configuration from either:

  1. Standard input (stdin), by passing - in place of the file path

    cat <<'EOF' | batchlet run -
    {"command": [], "dask_params": {}, "monitor": {}}
    EOF
    
  2. A JSON file

    echo '{"command": [], "dask_params": {}, "monitor": {}}' > batchlet_config.json
    
    batchlet run batchlet_config.json
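When batchlet is launched from another Python program, the stdin form avoids writing a temporary file. The sketch below assumes only what the shell examples show, namely that batchlet run - reads the JSON configuration from standard input. The helper names are hypothetical, and nothing is assumed about batchlet's exit codes or output beyond passing them through.

```python
import json
import shutil
import subprocess


def stdin_invocation(config):
    """Return the argv and stdin payload for `batchlet run -` (hypothetical helper)."""
    return ["batchlet", "run", "-"], json.dumps(config)


def run_batchlet(config):
    """Run batchlet with the configuration piped to stdin and return its exit code."""
    if shutil.which("batchlet") is None:
        raise FileNotFoundError("batchlet executable not found on PATH")
    argv, payload = stdin_invocation(config)
    # check=False leaves the interpretation of the exit code to the caller.
    completed = subprocess.run(argv, input=payload, text=True, check=False)
    return completed.returncode
```

Keeping the argument construction in its own function makes it possible to inspect the command without batchlet being installed.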