Batchlet
========

``batchlet`` is a tool built specifically to run Dask-based batch processing
pipelines. It acts as a wrapper around the pipeline process and provides the
following abilities:

1. **Managing the Dask cluster**: The cluster is used by the pipeline to
   perform Dask-based computations. See the :doc:`dask_cluster/index` section
   for more information.

2. **Monitoring resources and logs**: See :doc:`monitoring/index`.

Run ``batchlet --help`` to learn more about the CLI usage. Also see the
`Usage <#usage>`__ section.

Usage
-----

The ``batchlet run`` command accepts a JSON configuration with the following
keys:

- ``"command"``: The pipeline command to execute inside the batchlet context
- ``"dask_params"``: Parameters to configure the Dask cluster
- ``"monitor"``: Parameters to configure monitoring

``"dask_params"`` and ``"monitor"`` are dictionaries with specific keys. For
information about the available configuration options of the Dask cluster,
refer to :doc:`batchlet_configuration`.

Example configuration
~~~~~~~~~~~~~~~~~~~~~

.. code:: json

   {
     "command": [
       "command",
       "args"
     ],
     "dask_params": {
       "nodes": 1,
       "workers_per_node": 2,
       "threads_per_worker": 20,
       "memory_per_worker": "64G",
       "resources_per_worker": "process=1",
       "use_entry_node": true,
       "dask_cli_option": "--dask-scheduler",
       "dask_report_dir": "./dask-reports"
     },
     "generate_reports_on_failure": true,
     "monitor": {
       "resources": {
         "level": 0,
         "save_dir": "/path/to/monitor/output"
       },
       "logs": {
         "filter_plugins": [
           {
             "name": "SKASDPFilter",
             "kwargs": {
               "pipeline": "E2E"
             }
           }
         ],
         "consumer_plugins": [
           {
             "name": "CSVFile",
             "kwargs": {
               "file_path": "./events.csv"
             }
           },
           {
             "name": "SDPConfigurationDB",
             "kwargs": {
               "pb_id": "pb-e2e-20250716-00001",
               "kind": "data-product",
               "flow_names": ["mswriter"]
             }
           }
         ]
       }
     }
   }

The ``batchlet run`` command reads the JSON configuration from either of the
following:

1. Standard input (``stdin``)

   .. code:: bash

      cat <<'EOF' | batchlet run -
      {"command": [], "dask_params": {}, "monitor": {}}
      EOF

2. A JSON file

   .. code:: bash

      echo '{"command": [], "dask_params": {}, "monitor": {}}' > batchlet_config.json
      batchlet run batchlet_config.json
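
When the configuration is generated by a script rather than written by hand, it can be assembled as a dictionary and serialized to JSON. The following is a minimal sketch, assuming Python is used for that step. The helper names ``build_config``, ``missing_keys`` and ``write_config`` are hypothetical and not part of batchlet, and whether batchlet itself requires all three keys is not stated here; the check only compares against the keys listed in the Usage section.

```python
import json
from pathlib import Path

# Top-level keys listed in the Usage section. Treating them as a checklist
# is an assumption of this sketch, not documented batchlet behaviour.
DOCUMENTED_KEYS = ("command", "dask_params", "monitor")


def build_config(command, dask_params=None, monitor=None):
    """Assemble a configuration dict containing the three documented keys."""
    return {
        "command": list(command),
        "dask_params": dict(dask_params or {}),
        "monitor": dict(monitor or {}),
    }


def missing_keys(config):
    """Return the documented top-level keys that are absent from config."""
    return [key for key in DOCUMENTED_KEYS if key not in config]


def write_config(config, path):
    """Serialize config to a JSON file and return the file path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    config = build_config(
        ["command", "args"],
        dask_params={"nodes": 1, "workers_per_node": 2},
        monitor={"resources": {"level": 0, "save_dir": "/path/to/monitor/output"}},
    )
    # Stop early if any documented key is absent.
    if missing_keys(config):
        raise SystemExit(f"missing keys: {missing_keys(config)}")
    print(write_config(config, "batchlet_config.json"))
```

The resulting file can then be passed to ``batchlet run batchlet_config.json``, or the output of ``json.dumps(config)`` can be piped to ``batchlet run -``, as described above.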