Workflow Development

The steps to develop and test an SDP workflow are as follows:

1. Create the workflow

  • Clone the ska-sdp-science-pipelines repository from GitLab and create a new branch for your work.

  • Create a directory for your workflow in src:

    $ mkdir src/<my_workflow>
    $ cd src/<my_workflow>
    
  • Write the workflow script (<my_workflow>.py). The existing workflows show how to do this; the Real-time workflow (Test Real-Time Workflow) and Batch workflow (Test Batch Workflow) examples are the best place to start. They give a general idea of the structure that batch and real-time workflows should have, and can serve as a starting point for your own.

    A list of the available Helm charts that can be used in workflows, together with their documentation, can be found at: TBC

  • Create a requirements.txt file with your workflow’s Python requirements, e.g.

    --index-url https://artefact.skao.int/repository/pypi-all/simple
    ska-ser-logging
    ska-sdp-workflow
    
  • Create a Dockerfile for building the workflow image, e.g.

    FROM python:3.9-slim
    
    COPY requirements.txt ./
    RUN pip install -r requirements.txt
    
    WORKDIR /app
    COPY <my_workflow>.py ./
    ENTRYPOINT ["python", "<my_workflow>.py"]
    

    Use the base image of your choice, preferably the latest numbered slim version of it, e.g. python:3.9-slim.

  • Create a file called version.txt containing the semantic version number of the workflow.

  • Create a Makefile containing

    NAME := ska-sdp-wflow-<my-workflow>
    VERSION := $(shell cat version.txt)
    
    include ../../make/Makefile
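
Before moving on to testing, it can help to confirm that the workflow directory contains every file listed in step 1. The following is a minimal sketch and not part of the repository: the helper name check_workflow_dir is hypothetical, and the simple MAJOR.MINOR.PATCH pattern is an assumption made for illustration (full semantic versioning also allows pre-release and build suffixes).

```python
import re
from pathlib import Path

# Files that step 1 asks for, in addition to the workflow script itself.
REQUIRED_FILES = ["requirements.txt", "Dockerfile", "version.txt", "Makefile"]

# Simplified MAJOR.MINOR.PATCH check (assumption: no pre-release suffixes).
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_workflow_dir(path):
    """Return a list of problems found in a workflow directory (hypothetical helper)."""
    path = Path(path).resolve()
    problems = []
    # The script is expected to be named after its directory: <my_workflow>.py
    for name in [f"{path.name}.py"] + REQUIRED_FILES:
        if not (path / name).is_file():
            problems.append(f"missing file: {name}")
    version_file = path / "version.txt"
    if version_file.is_file():
        version = version_file.read_text().strip()
        if not SEMVER.match(version):
            problems.append(f"version.txt is not MAJOR.MINOR.PATCH: {version!r}")
    return problems
```

Calling check_workflow_dir("src/my_workflow") returns an empty list when all files are present and the version is well-formed.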
    

2. Test the workflow locally

  • Build the workflow image. If you are using minikube to deploy the SDP, run:

    $ eval $(minikube -p minikube docker-env)
    $ make build
    

    Otherwise, just run make build on its own. Either way, the image is added to your minikube or local Docker daemon, where it can be used for testing with a local deployment of the SDP.

  • Deploy SDP locally and start a shell in the console pod.

  • Add the new workflow to the configuration DB. This will tell the SDP where to find the Docker image to run the workflow:

    ska-sdp create workflow <kind>:<name>:<version> '{"image": "<docker-image:version>"}'
    

    where the values are:

    • <kind>: batch or realtime, depending on the kind of workflow

    • <name>: name of your workflow

    • <version>: version of your workflow

    • <docker-image:version>: Docker image you just built from your workflow, including its version tag.

    If you have multiple workflows to add, you can import the definitions with:

    ska-sdp import some-workflows.json
    

    An example JSON file for importing workflows can be found at: Example Workflow JSON

  • To run the workflow, create a processing block, either via the Tango interface, or by creating it directly in the configuration DB with ska-sdp create pb.
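
The ska-sdp create workflow command from step 2 is easy to get wrong because the JSON value has to be quoted for the shell. The following is a minimal sketch of assembling it programmatically; the helper name create_workflow_command and the example workflow and image names are hypothetical, and only the command form shown above is assumed.

```python
import json
import shlex

# The two kinds of workflow named in the documentation.
VALID_KINDS = ("batch", "realtime")


def create_workflow_command(kind, name, version, image):
    """Build the `ska-sdp create workflow` command line as a string (hypothetical helper)."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    key = f"{kind}:{name}:{version}"
    value = json.dumps({"image": image})
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.
    return f"ska-sdp create workflow {shlex.quote(key)} {shlex.quote(value)}"


# Example values only; substitute your own workflow name, version and image tag.
print(create_workflow_command("batch", "my_workflow", "0.1.0",
                              "ska-sdp-wflow-my-workflow:0.1.0"))
```

The printed line can be pasted into the shell in the console pod.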

3. Finish development and deploy the workflow

  • Once you are happy with the workflow, add it to the GitLab CI file (.gitlab-ci.yml) in the root of the repository. You need to add a build and publish job for it:

    build-<my_workflow>:
      extends: .docker_build_workflow
      before_script:
        - cd src/<my_workflow>
      only:
        changes:
          - src/<my_workflow>/*
    
    publish-<my_workflow>:
      extends: .publish
      before_script:
        - cd src/<my_workflow>
      only:
        refs:
          - master
        changes:
          - src/<my_workflow>/*
    

    This builds the Docker image whenever files in the workflow directory change, and pushes it to the SKA artefact repository once the change is merged into the master branch.

  • Add the workflow to the workflow definition file workflows.json in the root of the repository. By default the SDP uses this file to populate the workflow definitions in the configuration DB when it starts up.

  • Create a README.md containing a description of your workflow and instructions for running it. Include it in the documentation:

    • create a new file in docs/src/<my_workflow>.rst

    • add the following to it:

    .. mdinclude:: ../../src/<my_workflow>/README.md
    
    • update docs/src/index.rst so that it references the new file

  • Commit the changes to your branch and push to GitLab.
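
The build and publish jobs in step 3 differ only in the workflow name, so they can be rendered from a template rather than copied by hand. The following is a minimal sketch using only the Python standard library; the helper name render_ci_jobs is hypothetical, and the job definitions are copied from the example above.

```python
from string import Template

# Template mirroring the build and publish jobs shown in step 3.
CI_JOBS = Template("""\
build-$name:
  extends: .docker_build_workflow
  before_script:
    - cd src/$name
  only:
    changes:
      - src/$name/*

publish-$name:
  extends: .publish
  before_script:
    - cd src/$name
  only:
    refs:
      - master
    changes:
      - src/$name/*
""")


def render_ci_jobs(name):
    """Return the two CI job definitions for one workflow (hypothetical helper)."""
    return CI_JOBS.substitute(name=name)


print(render_ci_jobs("my_workflow"))
```

The output is intended to be appended to .gitlab-ci.yml and reviewed before committing.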

Workflow Script Generator

To speed up development, a Python script is provided that automatically generates the source files for a workflow.

The script generates all the required source files (Python script, requirements, version number, Dockerfile, Makefile, README) for both real-time and batch workflows.

To run the script:

cd workflow_template
python create_workflow.py <kind> <name>

For example:

python create_workflow.py realtime test_realtime_workflow

Usage

Create generic source files for batch/real-time workflows.

Usage:
    create_workflow.py <kind> <name>
    create_workflow.py (-h|--help)

Arguments:
    <kind>    Type of workflow (realtime or batch)
    <name>    Name of the workflow to be created
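
The documented arguments can be mirrored with argparse, for example in a wrapper that validates input before calling the generator. This sketch is an illustration only and is not the implementation of create_workflow.py; the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Parser accepting the same positional arguments as the usage text (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="create_workflow.py",
        description="Create generic source files for batch/real-time workflows.",
    )
    # Restricting choices rejects anything other than the two documented kinds.
    parser.add_argument("kind", choices=["realtime", "batch"], help="Type of workflow")
    parser.add_argument("name", help="Name of the workflow to be created")
    return parser


# Same arguments as the example invocation in this section.
args = build_parser().parse_args(["realtime", "test_realtime_workflow"])
print(args.kind, args.name)
```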