Batch Imaging Workflow
The batch_imaging workflow is a proof of concept of integrating a scientific
workflow with the SDP prototype. It simulates visibilities and images them using
RASCIL, with Dask as the execution engine.
The workflow simulates SKA1-Low visibility data in a range of hour angles from -30 to 30 degrees and adds phase errors. The visibilities are then calibrated and imaged using the ICAL pipeline.
The workflow creates buffer reservations for storing the visibilities and images.
Parameters
The workflow parameters are:
n_workers: number of Dask workers to deploy
freq_min: minimum frequency (in hertz)
freq_max: maximum frequency (in hertz)
nfreqwin: number of frequency windows
ntimes: number of time samples
rmax: maximum distance of stations to include from array centre (in metres)
ra: right ascension of the phase centre (in degrees)
dec: declination of the phase centre (in degrees)
buffer_vis: name of the buffer reservation to store visibilities
buffer_img: name of the buffer reservation to store images
For example:
{
"n_workers": 4,
"freq_min": 0.9e8,
"freq_max": 1.1e8,
"nfreqwin": 8,
"ntimes": 5,
"rmax": 750.0,
"ra": 0.0,
"dec": -30.0,
"buffer_vis": "buff-pb-mvp01-20200523-00001-vis",
"buffer_img": "buff-pb-mvp01-20200523-00001-img"
}
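The parameter set above can also be assembled and sanity-checked in Python before it is submitted. The following is a minimal sketch: the helper name make_batch_imaging_params, the checks it performs, and the buff-<pb_id>-vis / buff-<pb_id>-img naming pattern (mirrored from the example) are illustrative assumptions, not part of the workflow or of the SDP libraries.

```python
import json


def make_batch_imaging_params(pb_id, n_workers=4, freq_min=0.9e8, freq_max=1.1e8,
                              nfreqwin=8, ntimes=5, rmax=750.0, ra=0.0, dec=-30.0):
    """Build a batch_imaging parameter dictionary (hypothetical helper)."""
    # Sanity checks chosen for illustration; the workflow itself may
    # validate its parameters differently, or not at all.
    if freq_min >= freq_max:
        raise ValueError("freq_min must be smaller than freq_max")
    if min(n_workers, nfreqwin, ntimes) < 1:
        raise ValueError("n_workers, nfreqwin and ntimes must be at least 1")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("dec must lie between -90 and 90 degrees")
    return {
        "n_workers": n_workers,
        "freq_min": freq_min,
        "freq_max": freq_max,
        "nfreqwin": nfreqwin,
        "ntimes": ntimes,
        "rmax": rmax,
        "ra": ra,
        "dec": dec,
        # Assumed naming pattern, copied from the example above.
        "buffer_vis": f"buff-{pb_id}-vis",
        "buffer_img": f"buff-{pb_id}-img",
    }


params = make_batch_imaging_params("pb-mvp01-20200523-00001")
print(json.dumps(params, indent=2))
```

Building the dictionary in one place keeps the two buffer names tied to the processing block ID, so they cannot drift apart when the ID changes.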
Running the workflow using iTango
If using Minikube, make sure to increase the memory size (minimum 16 GB):
minikube start --memory=16g
Once the SDP is running, start an iTango shell.
First, obtain a handle to a subarray device and turn it on:
d = DeviceProxy('mid_sdp/elt/subarray_1')
d.On()
If you are not sure what devices are available, list them with lsdev.
Create a configuration string for the scheduling block instance. It contains
one real-time processing block, which uses the test_realtime workflow as a
placeholder, and one batch processing block, which runs the batch_imaging
workflow with the example parameters from above:
config_sbi = '''
{
"id": "sbi-mvp01-20200523-00000",
"max_length": 21600.0,
"scan_types": [
{
"id": "science",
"channels": [
{"count": 8, "start": 0, "stride": 1, "freq_min": 0.9e8, "freq_max": 1.1e8, "link_map": [[0,0]]}
]
}
],
"processing_blocks": [
{
"id": "pb-mvp01-20200523-00000",
"workflow": {"type": "realtime", "id": "test_realtime", "version": "0.2.2"},
"parameters": {}
},
{
"id": "pb-mvp01-20200523-00001",
"workflow": {"type": "batch", "id": "batch_imaging", "version": "0.1.1"},
"parameters": {
"n_workers": 4,
"freq_min": 0.9e8,
"freq_max": 1.1e8,
"nfreqwin": 8,
"ntimes": 5,
"rmax": 750.0,
"ra": 0.0,
"dec": -30.0,
"buffer_vis": "buff-pb-mvp01-20200523-00001-vis",
"buffer_img": "buff-pb-mvp01-20200523-00001-img"
},
"dependencies": [
{"pb_id": "pb-mvp01-20200523-00000", "type": ["none"]}
]
}
]
}
'''
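When the JSON is written by hand, the channel specification and the workflow parameters can easily get out of step. As a minimal sketch, the same configuration can be generated from Python dictionaries with the standard json module. The variable names are illustrative, and the consistency check is an assumption about what you may want to verify, not something the SDP is known to enforce.

```python
import json

pb_realtime_id = "pb-mvp01-20200523-00000"
pb_batch_id = "pb-mvp01-20200523-00001"

# Workflow parameters, identical to the example values used on this page.
params = {
    "n_workers": 4, "freq_min": 0.9e8, "freq_max": 1.1e8, "nfreqwin": 8,
    "ntimes": 5, "rmax": 750.0, "ra": 0.0, "dec": -30.0,
    "buffer_vis": f"buff-{pb_batch_id}-vis",
    "buffer_img": f"buff-{pb_batch_id}-img",
}

sbi = {
    "id": "sbi-mvp01-20200523-00000",
    "max_length": 21600.0,
    "scan_types": [{
        "id": "science",
        "channels": [{
            "count": params["nfreqwin"], "start": 0, "stride": 1,
            "freq_min": params["freq_min"], "freq_max": params["freq_max"],
            "link_map": [[0, 0]],
        }],
    }],
    "processing_blocks": [
        {
            "id": pb_realtime_id,
            "workflow": {"type": "realtime", "id": "test_realtime", "version": "0.2.2"},
            "parameters": {},
        },
        {
            "id": pb_batch_id,
            "workflow": {"type": "batch", "id": "batch_imaging", "version": "0.1.1"},
            "parameters": params,
            "dependencies": [{"pb_id": pb_realtime_id, "type": ["none"]}],
        },
    ],
}

# Illustrative check (an assumption): the batch block depends on a block
# that is actually defined in this scheduling block instance.
defined_ids = {pb["id"] for pb in sbi["processing_blocks"]}
for pb in sbi["processing_blocks"]:
    for dep in pb.get("dependencies", []):
        assert dep["pb_id"] in defined_ids

config_sbi = json.dumps(sbi, indent=2)
```

The resulting config_sbi string can be used in place of the hand-written one.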
Note that a workflow may be available in several versions. Use the latest version unless you know that a specific version suits your needs. (The Changelog at the end of this page may help you decide.)
The scheduling block instance is created by the AssignResources command:
d.AssignResources(config_sbi)
For the batch processing to start, you need to end the real-time processing
with the ReleaseResources command:
d.ReleaseResources()
You can watch the pods and persistent volume claims (for the buffer reservations) being deployed with the following command or using k9s:
kubectl get pod,pvc -n sdp -w
At this stage you should see a pod called
proc-pb-mvp01-20200523-00001-workflow-...
with status Running. To see its logs, run:
kubectl logs <pod-name> -n sdp
The output should look like this:
INFO:batch_imaging:Claimed processing block pb-mvp01-20200523-00001
INFO:batch_imaging:Waiting for resources to be available
INFO:batch_imaging:Resources are available
INFO:batch_imaging:Creating buffer reservations
INFO:batch_imaging:Deploying Dask EE
INFO:batch_imaging:Running simulation pipeline
INFO:batch_imaging:Running ICAL pipeline
...
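If you save the log output to a file or capture it in a script, a few lines of Python are enough to report how far the workflow has progressed. This is only a sketch: the function name latest_stage is hypothetical, and the stage messages are copied from the example output above, so they may differ in other workflow versions.

```python
# Stage messages copied from the example log output above, in order.
STAGES = [
    "Claimed processing block",
    "Waiting for resources to be available",
    "Resources are available",
    "Creating buffer reservations",
    "Deploying Dask EE",
    "Running simulation pipeline",
    "Running ICAL pipeline",
]


def latest_stage(log_text):
    """Return the last known stage message found in the log, or None."""
    latest = None
    for line in log_text.splitlines():
        for stage in STAGES:
            if stage in line:
                latest = stage
    return latest


sample_log = """\
INFO:batch_imaging:Claimed processing block pb-mvp01-20200523-00001
INFO:batch_imaging:Waiting for resources to be available
INFO:batch_imaging:Resources are available
INFO:batch_imaging:Creating buffer reservations
INFO:batch_imaging:Deploying Dask EE
"""
print(latest_stage(sample_log))  # prints "Deploying Dask EE"
```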
Accessing the data
The buffer reservations are realised as Kubernetes persistent volume claims. Persistent volumes should be created automatically to satisfy them. The name of the persistent volume bound to each claim is shown in the output of:
kubectl get pvc -n sdp
The location of the persistent volume in the filesystem is shown in the output of:
kubectl describe pv <pv-name>
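The two lookups above can be scripted. The sketch below wraps kubectl with the standard subprocess module and uses its jsonpath output format to extract single fields. It rests on two assumptions that are not confirmed by this page: that the persistent volume claim carries the same name as the buffer reservation, and that the volume is of type hostPath (other volume types record their location in a different field). The function names are hypothetical.

```python
import subprocess


def pv_name_command(pvc_name, namespace="sdp"):
    """Build the kubectl command that prints the PV bound to a PVC."""
    return [
        "kubectl", "get", "pvc", pvc_name,
        "-n", namespace,
        "-o", "jsonpath={.spec.volumeName}",
    ]


def pv_path_command(pv_name):
    """Build the kubectl command that prints the directory of a hostPath PV.

    Assumption: the volume is of type hostPath.
    """
    return ["kubectl", "get", "pv", pv_name,
            "-o", "jsonpath={.spec.hostPath.path}"]


def run(cmd):
    """Run a command and return its standard output with whitespace trimmed."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def locate_buffer(pvc_name, namespace="sdp"):
    """Return (pv_name, path) for a buffer reservation; needs cluster access."""
    pv_name = run(pv_name_command(pvc_name, namespace))
    return pv_name, run(pv_path_command(pv_name))
```

On a machine with access to the cluster, locate_buffer("buff-pb-mvp01-20200523-00001-vis") would return the volume name and its directory, provided the assumptions above hold.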
If you are running Kubernetes with Minikube in a VM, you need to log in to it first to gain access to the files:
minikube ssh
Running the workflow using the SDP CLI
Deploy SDP and start the console as described at Running SDP stand-alone.
You can then run this workflow directly from the console using the ska-sdp CLI (https://developer.skao.int/projects/ska-sdp-config/en/latest/cli.html).
Run the workflow:
ska-sdp create pb batch:batch_imaging:0.1.1
If you want to change the default parameters, run the following instead (update the JSON string as needed):
ska-sdp create pb batch:batch_imaging:0.1.1 '{"n_workers": 4, "freq_min": 0.9e8, "freq_max": 1.1e8}'
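Quoting a JSON string by hand on the command line is error-prone. As a small sketch, the command can be assembled in Python with json.dumps and shlex.join (Python 3.8 or later), which take care of the quoting. The helper name create_pb_command is hypothetical; the command layout is the one shown above.

```python
import json
import shlex


def create_pb_command(workflow, params=None):
    """Return a shell-safe 'ska-sdp create pb' command line."""
    cmd = ["ska-sdp", "create", "pb", workflow]
    if params:
        # Serialise the parameter overrides as a single JSON argument.
        cmd.append(json.dumps(params))
    return shlex.join(cmd)


print(create_pb_command("batch:batch_imaging:0.1.1", {"n_workers": 4, "ntimes": 5}))
# prints: ska-sdp create pb batch:batch_imaging:0.1.1 '{"n_workers": 4, "ntimes": 5}'
```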
You can watch the pod being created as before either using
kubectl get pods -n sdp -w
or k9s. To access the data created by the workflow, follow the steps under “Accessing the data” in the “Running the workflow using iTango” section above.
Changelog
0.1.2
use latest SDP configuration library