SDP Processing Controller

The processing controller (PC) is the SDP service responsible for controlling the execution of processing blocks (PBs).

Each execution block (EB) that SDP is configured to execute contains a number of PBs, either real-time or batch. The real-time PBs run simultaneously for the duration of the EB and batch PBs run after the EB is finished. Processing blocks may have dependencies on flows, which represent resources such as data products, storage or processing nodes that must be available before the PB can execute.

The SDP architecture requires the PC to use a model of the available resources to determine if a PB can be executed. Resource allocation includes dependency checking: batch processing blocks only receive resource allocations when their dependencies are satisfied, and are only executed when their dependencies are finished. Real-time processing blocks bypass dependency checking for both resource allocation and execution.

Processing block and its state

A PB and its state are located at the following paths in the configuration database:

/pb/[pb_id]
/pb/[pb_id]/state

The PB is created by the subarray Tango device when starting an EB. Once it is created it does not change. The state is created by the PC when deploying the processing script, and it is subsequently updated by the PC and the script.

The entries in the PB state relevant to the PC are status and resources_available, for example:

{
    "status": "WAITING",
    "resources_available": false
}

status is a string indicating the status of the script. Possible values are:

STARTING: set by the PC when it deploys the script; thereafter the script is responsible for setting status

WAITING: script has started, but is waiting for resources to be available to execute its processing

RUNNING: script is executing its processing

FINISHED: script has finished its processing

CANCELLING: script has been cancelled and its deployments are being removed

CANCELLED: script cancellation has finished

FAILED: set by the PC if it fails to deploy the script, or by the script in the case of a non-recoverable error

More information on processing block state can be found on Confluence: https://confluence.skatelescope.org/display/SE/State+Diagram+for+Processing+Blocks+and+Data+flows

resources_available is a boolean set by the PC to inform the script whether it has the resources available to start its processing. This is used to control when PBs with dependencies start and ensures that resources are only allocated when dependencies are satisfied.
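
As an illustration of how a script might interpret these two entries, here is a minimal Python sketch. The PBStatus enum and the may_start_processing helper are hypothetical names introduced for this example and are not part of the SDP code base; the sketch assumes the state has already been read from the configuration database into a plain dictionary.

```python
from enum import Enum


class PBStatus(str, Enum):
    """Status values listed above for a processing block state."""

    STARTING = "STARTING"
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    CANCELLING = "CANCELLING"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


def may_start_processing(state: dict) -> bool:
    """Hypothetical script-side check.

    Returns True once the script is WAITING and the PC has set
    resources_available to true in the PB state.
    """
    return (
        state.get("status") == PBStatus.WAITING.value
        and state.get("resources_available") is True
    )
```

With the example state shown earlier (status WAITING, resources_available false) the helper returns False, so the script would keep waiting.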

Dependencies

The PC supports two types of dependencies:

Flow dependencies (recommended): Dependencies on flows that need to be in COMPLETED or INCOMPLETE state before the PB can execute.
Processing block dependencies (deprecated): Dependencies on the completion of processing blocks themselves.

Dependencies are defined in the processing block configuration and are checked before:

Setting resources_available to true
Creating resource allocations (when resource management is enabled)

This ensures that processing blocks only consume resources when the required flows are available, leading to more efficient resource utilization across the system.
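
The dependency rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, its parameters and the idea of passing states as plain strings are assumptions made for the example, not the PC's actual interface.

```python
from typing import Iterable

# Flow states that count as "satisfied" for a flow dependency.
FLOW_READY_STATES = frozenset({"COMPLETED", "INCOMPLETE"})


def dependencies_satisfied(
    flow_states: Iterable[str],
    pb_statuses: Iterable[str] = (),
    realtime: bool = False,
) -> bool:
    """Hypothetical check of a PB's dependencies.

    flow_states: states of the flows the PB depends on.
    pb_statuses: statuses of any (deprecated) PB dependencies.
    realtime:    real-time PBs bypass dependency checking entirely.
    """
    if realtime:
        return True
    flows_ready = all(state in FLOW_READY_STATES for state in flow_states)
    pbs_finished = all(status == "FINISHED" for status in pb_statuses)
    return flows_ready and pbs_finished
```

A batch PB with no dependencies at all is treated as satisfied here, because both checks pass trivially on empty input.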

Behaviour

The behaviour of the PC is summarised as follows:

If a PB is new, the PC will create the processing script deployment for it. A PB is deemed to be new if the PB state does not exist. The PC reads the script definition from the configuration DB to discover which OCI container image to deploy. It creates the state and sets status to STARTING and resources_available to false. If the script definition is not found in the configuration DB, the PC still creates the state, but sets status to FAILED.
The PC continues to monitor the PB deployment and, if it fails, updates the PB state with the error message and sets its status to FAILED.
If a PB’s flow dependencies are all COMPLETED or INCOMPLETE, and any deprecated PB dependencies are FINISHED, the PC sets resources_available to true to allow it to start executing. Additionally, when resource management is enabled, the PC will only create resource allocations for processing blocks whose dependencies have been satisfied.
The PC updates the quantity for capacity-buffer-storage-type allocations based on the “final_data_size” entry in completed data-product flow states.
Cleaning up the Configuration Database:
- The PC deletes PBs that are FINISHED and do not have an associated execution block. If a PB references an EB, the PC checks whether that EB still exists and, if it does not, deletes the PB.
- The PC deletes EBs without any associated PBs that are not still assigned to a subarray. If there are associated PBs, it checks if they exist, and if they don’t, it deletes the EB.
- The PC iterates through EBs with PBs. If the EB status is FINISHED and all of its associated PBs are FINISHED, the PC deletes them together, provided that the following applies to every associated PB:
  - If the PB has flows, all of them are in the DELETED state.
  - The PB has been in the database in a finished state for a given time (by default one hour).
  If these conditions apply, the PC will delete all of the PBs and the EB. If there is at least one PB that cannot be deleted, the PC will not delete any of the other PBs or the EB, and will wait until all can be deleted together.
- The PC deletes processing deployments (scripts and execution engines) not associated with any existing PB.
- The PC deletes data flow entries not associated with any existing PB.
- The PC deletes dependency entries not associated with any existing data flows.
- The PC deletes requests and non-capacity-buffer-storage-type allocations for FINISHED or FAILED PBs, and removes the request-link from capacity-buffer-storage-type allocations.
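
The all-or-nothing rule for deleting a finished EB together with its PBs can be illustrated with a short Python sketch. The PBInfo class, its fields and the can_delete_eb_with_pbs function are hypothetical names invented for this example; the one-hour default mirrors the value stated above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PBInfo:
    """Hypothetical summary of a PB as seen by the clean-up step."""

    status: str
    finished_for_seconds: float
    flow_states: List[str] = field(default_factory=list)


def can_delete_eb_with_pbs(
    eb_status: str, pbs: List[PBInfo], min_age_seconds: float = 3600.0
) -> bool:
    """Return True only if the EB and every one of its PBs can go together."""
    if eb_status != "FINISHED" or not pbs:
        # EBs without PBs are covered by a separate clean-up rule.
        return False
    return all(
        pb.status == "FINISHED"
        and all(state == "DELETED" for state in pb.flow_states)
        and pb.finished_for_seconds >= min_age_seconds
        for pb in pbs
    )
```

A single PB that is too recent, or that still has a flow outside the DELETED state, is enough to keep the whole group in the database.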

Resource Management

When RESOURCE_MANAGEMENT_TOGGLE is set to True, the processing controller will handle processing block resource requests.

To enable the feature flag, set global.sdp.feature_flags.resource_manager=true in the SDP chart.

The flow is as follows:

The processing script creates a /request entry in the configuration database stating how much space it needs.
The Buffer Manager updates the /resource entries in the configuration database with the amount of free, usable space. This quantity of usable space is updated every 60 seconds and passed on to the processing controller.
The Processing Controller runs a loop which:
- Reads requests per processing block.
- Checks if the processing block’s dependencies are satisfied before allocation.
- For processing blocks with satisfied dependencies, checks that each request can be satisfied by the available resources.
- Writes an allocation for that request only when dependencies are met and resources are available.
- Tracks the allocated amount locally so that resources are not allocated twice.

This dependency-aware allocation prevents processing blocks from “hogging” resources before they can actually execute, ensuring more efficient resource utilization.
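
One pass of such a loop might look like the following Python sketch. It is a simplified model under stated assumptions: requests are represented as (pb_id, request_id, amount) tuples, there is a single pool of space, and the allocate_requests function and its arguments are hypothetical rather than the PC's real implementation.

```python
from typing import Dict, List, Tuple

Request = Tuple[str, str, float]  # (pb_id, request_id, amount of space)


def allocate_requests(
    requests: List[Request],
    available: float,
    deps_satisfied: Dict[str, bool],
) -> Tuple[Dict[str, float], float]:
    """Hypothetical single pass of a dependency-aware allocation loop.

    Returns the allocations written in this pass and the space left over.
    """
    allocations: Dict[str, float] = {}
    remaining = available  # tracked locally so space is not allocated twice
    for pb_id, request_id, amount in requests:
        if not deps_satisfied.get(pb_id, False):
            continue  # dependencies not met: no allocation yet
        if amount <= remaining:
            allocations[request_id] = amount
            remaining -= amount
    return allocations, remaining
```

Requests that are skipped, either because dependencies are not met or because too little space remains, are simply left for a later pass.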

Implementation

The behaviour of the PC described above is implemented using the Configuration Library’s Config().watcher() method. For more information on watchers, see the Watchers section of the Configuration Library documentation.