Functionality
The supported functionality of the scripting library is as follows.
Starting, monitoring and ending a script
At the start
Claim the processing block.
Get the parameters defined in the processing block. They should be checked against the parameter schema defined for the script.
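As an illustration, the start of a script typically looks along these lines. This is a minimal sketch: ProcessingBlock comes from the library, while the get_parameters call is an assumption here and should be checked against the ska_sdp_scripting API reference.

import logging

from ska_sdp_scripting.processing_block import ProcessingBlock

LOG = logging.getLogger("my_script")

# Claim the processing block assigned to this script.
pb = ProcessingBlock()

# Get the parameters defined in the processing block; they should conform to
# the parameter schema defined for the script.
parameters = pb.get_parameters()
LOG.info("Processing block parameters: %s", parameters)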
Resource requests
Make requests for input and output buffer space. The script will calculate the resources it needs based on the parameters, then request them from the processing controller. This is currently a placeholder.
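Continuing the sketch above, buffer requests might look as follows. Since the resource request mechanism is a placeholder, the request_buffer call, the sizes and the tags shown here are illustrative assumptions only.

# Request input and output buffer space. In a real script the sizes would be
# calculated from the processing block parameters; the values and tags here
# are made up.
in_buffer_res = pb.request_buffer(100e6, tags=["sdm"])
out_buffer_res = pb.request_buffer(500e6, tags=["visibilities"])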
Declare script phases
Scripts will be divided into phases such as preparation, processing, and clean-up. In the current implementation, only one phase can be declared, which we refer to as the ‘work’ phase.
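Continuing the sketch, declaring the single work phase might look like this (create_phase and the "Work" label are assumptions based on the library's conventions):

# Declare the 'work' phase, attaching the buffer requests so that the phase
# can wait for those resources before execution starts.
work_phase = pb.create_phase("Work", [in_buffer_res, out_buffer_res])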
Execute the work phase
On entry to the work phase, the script waits until the requested resources are available. Meanwhile it monitors the processing block to see if it has been cancelled. For real-time scripts, it also checks whether the execution block has been cancelled.
It then deploys execution engines to execute a script or function.
It monitors the execution engines' deployment states and the processing block state, waiting until the execution has finished or failed, or the processing block has been cancelled.
It continuously updates the processing block state with the status of the execution engine deployments, providing aggregate information about these statuses so that other components know whether the deployments are ready. (A sketch of these steps, together with the clean-up, follows the 'At the end' items below.)
At the end
Remove the execution engines to release the resources.
Update processing block state with information about the success or failure of the script.
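Putting the work phase and the clean-up together, the body of a batch script might look roughly like this. This is a minimal sketch only: it assumes the phase is used as a context manager, and that ee_deploy_helm, is_deploy_finished and wait_loop are the relevant helpers; check the ska_sdp_scripting API reference for the actual names and signatures.

with work_phase:
    # Entering the phase waits for the requested resources and monitors the
    # processing block (and, for real-time scripts, the execution block) for
    # cancellation.

    # Deploy an execution engine; the chart name and values are illustrative.
    deploy = work_phase.ee_deploy_helm(
        "my-execution-engine", values={"image": "my-processing-image"}
    )

    # Wait until the execution has finished or failed, or the processing
    # block has been cancelled. Deployment statuses are written back to the
    # processing block state while waiting.
    work_phase.wait_loop(work_phase.is_deploy_finished(deploy))

# Leaving the 'with' block removes the execution engine deployments, releasing
# their resources, and records the script's success or failure in the
# processing block state.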
Receive scripts
Get IP and MAC addresses for the receive processes.
Monitor receive processes. If any get restarted, then the addresses may need to be updated.
Write the addresses in the appropriate format into the processing block state.
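For illustration, the receive addresses written into the processing block state take roughly the following shape. The values, and the presence of a 'mac' entry, are assumptions for this sketch; the exact schema is defined by the telescope model.

# Illustrative structure only: keyed by scan type, then by channel map entry,
# with per-channel host, port and MAC address lists.
receive_addresses = {
    "science": {
        "vis0": {
            "host": [[0, "10.0.0.1"]],
            "port": [[0, 9000, 1]],
            "mac": [[0, "de:ad:be:ef:00:01"]],
        }
    }
}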
Compatibility with the telescope model library
We keep the scripting library compatible with the latest version of the telescope model library. If you use a configuration string that is based on an older version of the telescope model, you may experience errors or unexpected behaviour.
Data flow entries
The ProcessingBlock object provides methods to create specific data flow entries in the Configuration Database. These entries configure how data are generated and consumed by different processes.
Currently, the following types can be created with the scripting library (a usage sketch follows the list):
data_product: data products, e.g. Measurement Sets
data_queue: Kafka queue configuration
qa_display: QA Display configuration
tango_attribute: configuration with a tango attribute as a sink
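As a rough illustration of how such entries might be created from a script, see the sketch below. The create_data_flow method name and its arguments are hypothetical placeholders; consult the ska_sdp_scripting API reference for the actual calls and signatures.

from ska_sdp_scripting.processing_block import ProcessingBlock

pb = ProcessingBlock()

# Hypothetical calls for illustration only.
ms_flow = pb.create_data_flow("data_product", name="vis-receive.ms")
queue_flow = pb.create_data_flow("data_queue", name="signal-display-metrics")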
Processing Block parameter validation
The ProcessingBlock class provides a validate_parameters
method. This method retrieves the processing block parameters from the Configuration DB
(which were saved as part of the request set up by the Processing Controller and derived
from the AssignResources configuration string) and validates that information
against a user-defined Pydantic model. Usage:
from ska_sdp_scripting.processing_block import ProcessingBlock, ParameterBaseModel

# example
class MyPydanticModel(ParameterBaseModel):
    args: list
    kwargs: dict

pb = ProcessingBlock()

# returns the Pydantic class with loaded parameters
parameters = pb.validate_parameters(model=MyPydanticModel)
It raises a Pydantic ValidationError if validation fails; otherwise it logs the success and returns the populated model instance.
Monitoring
There are several monitoring mechanisms in place that should help to pinpoint any errors, deployment failures, or bugs you may come across.
In any of the following scenarios, the ‘error_messages’ key in the relevant processing block’s state will be updated with information to help you debug:
any of the pods associated with the processing scripts and their deployments fail or fail to start.
an error occurs in a processing script.
an error occurs in an execution engine.
In every case except an error at the application level, the processing block state’s status will be set to FAILED. This, along with the presence of an error_messages value, will be picked up by the subarray and reported appropriately there.
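As an illustration, a failed processing block state might contain entries along these lines (the message text and the exact structure of the error entries are assumptions for this sketch):

pb_state = {
    "status": "FAILED",
    "error_messages": [
        # One entry per error reported by a failed pod, a processing script
        # or an execution engine; the text here is made up.
        "Pod proc-pb-example-vis-receive-0 failed to start: ImagePullBackOff",
    ],
}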
Monitoring capability has also been added for Slurm deployments:
Slurm deployments are only relevant for batch processing.
Monitoring for Slurm deployments currently covers errors only.
The state for Slurm deployments contains the following keys:
- num_job (equivalent to num_pod in Kubernetes)
- jobs (equivalent to pods in Kubernetes)
More details:
- Currently, Slurm deployments do not contain the error_state key.
- The monitoring is implemented in a way that allows easy expansion when error_state is eventually added.
- Default handling ensures that a missing error_state does not break existing functionality.
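An illustrative Slurm deployment state, mirroring the keys described above (the job IDs and statuses are made up, and the exact value types may differ):

slurm_deploy_state = {
    "num_job": 2,        # equivalent to num_pod in Kubernetes
    "jobs": {            # equivalent to pods in Kubernetes
        "1001": "RUNNING",
        "1002": "COMPLETED",
    },
    # There is no error_state key yet; consumers should treat it as optional,
    # e.g. slurm_deploy_state.get("error_state", {})
}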