# Dask

Dask allows distributed computation in Python.
## Chart Details

This chart will deploy the following in Kubernetes:

- 1 x Dask scheduler, with port 8786 (scheduler) and port 8787 (Web UI) exposed on a `ClusterIP` (default)
- 2 x Dask workers that connect to the scheduler
- 1 x Jupyter lab notebook (optional, false by default), with port 8888 exposed on a `ClusterIP` (default)
- 1 x Pipeline Job (optional, false by default), which can run an SDP pipeline
Note: only version 0.2.0 of the chart contains the Jupyter lab notebook pod.

Note: only version 0.3.0 of the chart contains the Pipeline Job. The Pipeline Job is not part of the original chart; it was added to run SDP pipelines that can run with Dask.
Tip: see the Kubernetes Service Type docs for the differences between `ClusterIP`, `NodePort`, and `LoadBalancer`.
## Installing the Chart

First we need to add the helmdeployer-charts repo to our local helm configuration:

```bash
helm repo add ska-sdp-helm https://artefact.skao.int/repository/helm-internal
helm repo update
```

To install the dask chart with the release name `test`:

```bash
helm install test ska-sdp-helm/ska-sdp-helmdeploy-dask
```

Depending on how your cluster was set up, you may also need to specify a namespace with the following flag: `--namespace my-namespace`.
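Once the release is deployed, you can check that the scheduler, workers, and services listed above came up. A minimal check, assuming the release was installed into `my-namespace`:

```bash
# List the pods and services created by the release
kubectl get pods,services --namespace my-namespace

# Forward the Dask Web UI (port 8787, as noted above) to your machine;
# substitute the actual scheduler service name reported by the command above
kubectl port-forward service/<scheduler-service-name> 8787:8787 --namespace my-namespace
```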
## Default Configuration

The following tables list the configurable parameters of the Dask chart and their default values. Note that the container images are not provided by default; you have to specify them in a custom `values.yaml` file or as a command-line argument.
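For example, a minimal override file supplying the image could look like the sketch below. The top-level `image` key matches the one used in the pipeline example further down this page; check the chart's `values.yaml` for the exact layout:

```yaml
# custom_values.yaml: a minimal sketch supplying the container image,
# which this chart does not provide by default
image: artefact.skao.int/ska-sdp-spectral-line-imaging:0.3.0
```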
### Dask scheduler

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| | Dask scheduler name | |
| | Container image name | |
| | Container image tag | |
| | k8s deployment replicas | |
| | Tolerations | |
| | nodeSelector | |
| | Container affinity | |
### Dask webUI

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| | Dask webUI name | |
| | k8s service port | |
| | Enable ingress controller resource | false |
| | Ingress resource hostnames | dask-ui.example.com |
| | Ingress TLS configuration | false |
| | Ingress TLS secret name | |
| | Ingress annotations configuration | null |
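As an illustration, enabling the ingress for the Web UI might look like the sketch below. The `webUI.ingress.*` key layout is an assumption borrowed from the upstream dask helm chart; verify the exact names against this chart's `values.yaml` before using:

```yaml
# Hypothetical key layout (webUI.ingress.*), assumed from the upstream
# dask chart; check this chart's values.yaml before relying on it
webUI:
  ingress:
    enabled: true                  # default: false
    hostname: dask-ui.example.com  # default shown in the table above
    tls: true                      # default: false
    secretName: dask-ui-tls        # hypothetical secret name
```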
### Dask worker

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| | Dask worker name | |
| | Container image name | |
| | Container image tag | |
| | k8s hpa and deployment replicas | |
| | Container resources | |
| | Tolerations | |
| | nodeSelector | |
| | Container affinity | |
| | Worker port (defaults to random) | |
### Jupyter

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| | Jupyter name | |
| | Include optional Jupyter server | |
| | Container image name | |
| | Container image tag | |
| | k8s deployment replicas | |
| | k8s service port | |
| | Container resources | |
### Pipeline Job

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| | Run pipeline post deployment of cluster | |
| | Name of pipeline | |
| | Command to invoke | |
| | Arguments to the pipeline | |
| | Name of CLI argument used to pass the scheduler IP | |
## Running an SDP pipeline

An SDP pipeline can be run as a k8s Job in this helm chart. The Job is started after both the Dask scheduler and Dask workers are created.

To avoid pickling overheads, the functions, scheduler, worker, and job all use the same image. This is necessary in particular if there are any calls to non-Python libraries.
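To make the contract concrete, here is a minimal sketch of what such a pipeline entry point could look like. The CLI name and sub-command are hypothetical, but the `--dask-scheduler` option mirrors the one this chart injects (see the notes below), and `dask.distributed.Client` is the standard way to connect to a scheduler:

```python
# Minimal sketch of a pipeline entry point; names are hypothetical.
# The chart appends --dask-scheduler=<scheduler-service-ip> to the
# pipeline's args, so the CLI must accept that option.
import argparse

from dask.distributed import Client


def main() -> None:
    parser = argparse.ArgumentParser(description="Example SDP pipeline CLI")
    parser.add_argument("command", choices=["run"], help="Sub-command to execute")
    parser.add_argument("--input", required=True, help="Input data path, e.g. a MeasurementSet")
    parser.add_argument("--dask-scheduler", required=True, help="Scheduler address injected by the chart")
    args = parser.parse_args()

    # Connect to the scheduler deployed by this chart
    client = Client(args.dask_scheduler)

    # Submit work to the cluster; a trivial placeholder task
    result = client.submit(lambda path: f"processed {path}", args.input).result()
    print(result)

    client.close()


if __name__ == "__main__":
    main()
```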
Note: the IP of the Dask scheduler is added to the pipeline's `args` as `--dask-scheduler=<scheduler-service-ip>`; this can be overridden with the `pipeline.schedulerOptionName` key.

Note: the Job uses the volume information from the `worker.volume.name` and `worker.volume.path` keys. These are required for data access when running a pipeline.
It is mandatory to pass in the `image` name, `command`, `args`, and the folder to be mounted along with a PVC. This information would typically be populated upstream by the SDP system.

```yaml
--- # pipeline_values.yaml
image: artefact.skao.int/ska-sdp-spectral-line-imaging:0.3.0
worker:
  volume:
    name: pvc-mnt-data
    path: /mnt/data
pipeline:
  enabled: true
  name: my-sdp-pipeline
  command: my-sdp-pipeline-cli
  schedulerOptionName: --dask-scheduler
  args:
    - run
    - --input
    - /mnt/data/input.ms
```
Install the chart by running:

```bash
helm install -n <namespace> pipeline-test charts/dask --values pipeline_values.yaml
```
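Once installed, you can check that the Job has started and inspect its output. A minimal check, assuming the Job name matches the hypothetical `pipeline.name` from the values file above:

```bash
# Check that the pipeline job has been created and track its progress
kubectl get jobs -n <namespace>

# Stream the pipeline's logs; substitute the actual job name
kubectl logs -n <namespace> job/my-sdp-pipeline --follow
```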
## Custom Configuration

If you want to change the default parameters, you can do this in two ways.

### YAML Config Files

You can update the default parameters in `values.yaml` by creating your own custom YAML config file with the updated parameters, and specifying this file via the `-f` flag when installing your chart. For example:

```bash
helm install test ska-sdp-helm/ska-sdp-helmdeploy-dask -f values.yaml
```
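Such a file only needs to contain the keys you want to override. For instance, a sketch that supplies the container image and disables the optional Jupyter server (both keys appear elsewhere on this page):

```yaml
# Only overridden keys need to appear; everything else keeps its default
image: artefact.skao.int/ska-sdp-spectral-line-imaging:0.3.0
jupyter:
  enabled: false
```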
### Command-Line Arguments

If you want to change the parameters for a specific install without changing `values.yaml`, you can use the `--set key=value[,key=value]` flag when running `helm install`; it will override any default values. For example:

```bash
helm install test ska-sdp-helm/ska-sdp-helmdeploy-dask --set jupyter.enabled=false
```