Test Mock Data Script
The test-mock-data
script is designed to provide SDP data products
in the form of data files on storage or data on Tango attributes of the
QueueConnector device, without the need to execute vis-receive or any of
the pipelines generating the data. The detailed design of the script can
be found on
this Confluence page.
It executes the different scenarios by deploying various execution engines that carry out the required processing.
It implements the following scenarios (list to grow as development progresses):
"pointing"
: Write pointing-offset results to HDF and/or the relevant Queue Connector tango attributes"measurement-set"
: Copy user-provided MS file(s) to output directory
A full description of the processing block parameters of this script can be found in the Processing block parameters section.
Pointing offset
There are three available options that can be chosen for this scenario:
"write-hdf"
: HDF files that follow a pointing data template are written to disk at the standard output directory. Corresponding metadata files are also added."send-to-kafka"
: Pointing offset results are sent to Kafka and the QueueConnector device is configured to read these data and display them on its dish-specific pointing attributes."both"
: Runs bothwrite-hdf
andsend-to-kafka
options.
The output pointing HDF files follow the data structure of the data product of the pointing offset calibration pipeline.
An internal CSV file and HDF template are used to obtain the pointing offsets. The CSV file allows for a set of fifty different sources. The pointing engine listens for any finished scans, and takes pointing offsets from the CSV file every time a completed 5-point observation has been observed. If more observations are completed than contained in the CSV file, it will loop back to start at the beginning again.
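As a rough illustration of that behaviour (not the script's actual implementation; the file name is hypothetical), the offsets could be drawn cyclically from the CSV like this:

import csv
import itertools

def offset_source(csv_path):
    # Read the pointing-offset rows once, then cycle through them indefinitely,
    # so the sequence restarts when more observations complete than there are rows.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return itertools.cycle(rows)

offsets = offset_source("pointing_offsets.csv")  # hypothetical file name
# next(offsets) would be taken each time a completed 5-point observation is seen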
Note that the configuration information for each dish in the HDF output is identical. This is because the template HDF file contains only one dish, and the coordinates etc. are copied for all other dishes - i.e. all dishes are placed at the same physical coordinates.
Based on the selected option, the script also configures data flow objects to be created in the SDP configuration database: a DataProduct flow is used for HDF files and a DataQueue flow is used to send the data via Kafka. The Queue Connector is also configured using a TangoAttributeMap flow object so that it picks up the offsets that were sent to Kafka.
Write Measurement Sets data
Measurement Set (MS) files for each scan are written to disk at the standard output directory as defined by ADR-55. Input MS file paths must be provided on the shared PVC for each scan and these are copied to the output. A corresponding metadata file is also generated.
The measurement set engine listens for any new scan, and when the scan starts, copies the next MS to the standard output directory. The output MS is renamed to match the observed scan id.
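A minimal sketch of that copy-and-rename step (illustrative only; the output naming pattern shown is an assumption):

import shutil
from pathlib import Path

def copy_ms_for_scan(input_ms, output_dir, scan_id):
    # Copy the input Measurement Set (a directory) into the output
    # directory and rename it after the observed scan ID.
    destination = Path(output_dir) / f"output.scan-{scan_id}.ms"  # assumed naming pattern
    shutil.copytree(input_ms, destination)
    return destination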
Testing
The script uses the following environment variables.
Name | Description | Default
---|---|---
 | Host address of the Configuration DB |
 | Port of the Configuration DB |
 | Kafka server (host) |
 | PVC name |
 | K8s namespace used for data product flow |
 | Timeout used when waiting for scans | 60 s
Deploy SDP and make sure the iTango console pod is also running.
After entering the iTango pod, obtain a handle to a subarray device and turn it on:
d = DeviceProxy('test-sdp/subarray/01')
d.On()
If you are not sure what devices are available, list them with lsdev.
The customised processing block parameters that can be included in the configuration string can be found in the Processing block parameters section and summarised in the following table.
Name | Description | Default
---|---|---
scenario | Name of scenario to run | None
input_data | List containing path(s) of MS to copy | None
kafka_topic | Kafka topic name | pointing_offset
pointing_option | Flag to choose optional behaviour for the pointing scenario | both
scenario needs to be set to pointing or measurement-set for the different scenarios to be run. If no scenario is specified, the script sets up the EB and PB correctly but does not deploy any execution engine.

input_data is a list where each element is the path to the Measurement Set to be written for the measurement-set scenario. These paths must be located on a PVC accessible to the test-mock-data engine. Note that the path should start at the root of the storage - i.e. it should not include the mount point. The input data can contain any number of Measurement Sets.

pointing_option sets the options for the pointing scenario. It must be set to write-hdf, send-to-kafka or both to determine the operations performed by the scenario.
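For example, a pointing scenario that only writes HDF files could be requested with the following parameters (illustrative values):

"parameters": {
    "scenario": "pointing",
    "pointing_option": "write-hdf"
}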
An example for the measurement set scenario with 5 scans is:
"parameters": {
"scenario": "measurement-set",
"input_data": ["product/eb-orcatest-20240814-94773/ska-sdp/pb-orcatestvr-20240814-94773/output.scan-1.ms",
"product/eb-orcatest-20240814-94773/ska-sdp/pb-orcatestvr-20240814-94773/output.scan-2.ms",
"product/eb-orcatest-20240814-94773/ska-sdp/pb-orcatestvr-20240814-94773/output.scan-3.ms",
"product/eb-orcatest-20240814-94773/ska-sdp/pb-orcatestvr-20240814-94773/output.scan-4.ms",
"product/eb-orcatest-20240814-94773/ska-sdp/pb-orcatestvr-20240814-94773/output.scan-5.ms"]
}
Start the execution block with the AssignResources
command:
d.AssignResources(config)
config is a full AssignResources configuration string for SDP; see the telescope model example.
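As a minimal sketch (assuming the configuration string has been saved to a local JSON file with a hypothetical name), it can be loaded in the iTango session and passed to the command:

import json

# Round-trip through json to catch syntax errors before sending the string
with open("assign_resources.json") as f:
    config = json.dumps(json.load(f))

d.AssignResources(config)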
The dishes that will appear in the output data for the pointing scenario are set from the receptors config parameter in the execution_block.resources section of the configuration string.
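For illustration, the relevant fragment of the configuration string might look like this (the dish IDs are example values):

"execution_block": {
    "resources": {
        "receptors": ["SKA001", "SKA002", "SKA003"]
    }
}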
The script will request the start of the engine pod called mock-data. This pod watches for the scans that are commanded on the subarray. The measurement-set scenario writes the relevant measurement set files when each scan is started, and the pointing scenario waits for the end of the 5th scan in the pointing observation before writing files.
If the offsets were sent to the Queue Connector, you can then access the data in itango3 by running the following code (replace the dish ID as required):
q = DeviceProxy("test-sdp/queueconnector/01")
q.pointing_offset_SKA001
To remove the deployment, you need to release all the resources and return the subarray to an EMPTY state (in itango):
d.End()
d.ReleaseAllResources()
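To confirm that the subarray has returned to EMPTY (assuming the standard obsState attribute is available on the subarray device), you can read:

d.obsState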
Processing block parameters
- pydantic settings generate_mock_data.test_mock_data_params.TestMockDataParams
test-mock-data script parameters
JSON schema:
{ "title": "test-mock-data", "description": "test-mock-data script parameters", "type": "object", "properties": { "kafka_topic": { "default": "pointing_offset", "description": "Kafka topic name. If not supplied, the topic will be set to 'pointing_offset'.", "title": "Kafka topic", "type": "string" }, "input_data": { "anyOf": [ { "items": {}, "type": "array" }, { "type": "null" } ], "default": null, "description": "List of Measurement Sets to use in the measurement-set scenario - e.g. ['path/scan-1.ms', 'path/scan-2.ms'].", "title": "List of Measurement Sets" }, "scenario": { "default": null, "description": "The name of the mock data scenario. Allowed: 'pointing' for sending pointing data to Kafka and/or writing pointing HDF files, 'measurement-set' for copying MS to the output directory.", "enum": [ "pointing", "measurement-set", null ], "title": "Scenario name" }, "pointing_option": { "default": "both", "description": "Specifies the output of the 'pointing' scenario. Allowed: 'send-to-kafka' for sending pointing data to kafka, 'write-hdf' for writing pointing HDF files, 'both' for sending pointing data to kafka and writing pointing HDF files.", "enum": [ "write-hdf", "send-to-kafka", "both" ], "title": "Flag to specify options for pointing scenario", "type": "string" } }, "additionalProperties": false }
- Config:
strict: bool = True
extra: str = forbid
arbitrary_types_allowed: bool = False
validate_assignment: bool = True
title: str = test-mock-data
- Fields:
- field input_data: list | None = None
List of Measurement Sets to use in the measurement-set scenario - e.g. [‘path/scan-1.ms’, ‘path/scan-2.ms’].
- field kafka_topic: str = 'pointing_offset'
Kafka topic name. If not supplied, the topic will be set to ‘pointing_offset’.
- field pointing_option: Literal['write-hdf', 'send-to-kafka', 'both'] = 'both'
Specifies the output of the ‘pointing’ scenario. Allowed: ‘send-to-kafka’ for sending pointing data to kafka, ‘write-hdf’ for writing pointing HDF files, ‘both’ for sending pointing data to kafka and writing pointing HDF files.
- field scenario: Literal['pointing', 'measurement-set', None] = None
The name of the mock data scenario. Allowed: ‘pointing’ for sending pointing data to Kafka and/or writing pointing HDF files, ‘measurement-set’ for copying MS to the output directory.
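As a minimal sketch (assuming the script's Python package is installed in your environment), the parameters can be validated against this model before submitting them:

from generate_mock_data.test_mock_data_params import TestMockDataParams

# Invalid values (e.g. an unknown pointing_option) raise a pydantic ValidationError
params = TestMockDataParams(scenario="pointing", pointing_option="write-hdf")
print(params.model_dump())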
Changelog
1.0.0
Update dependencies, documentation and bug fixes (MR246)
Remove code to configure queue connector before v5 (MR240)
Add function to create data flows in pointing scenario (MR238)
Combine “kafka” and “hdf” scenarios into a single “pointing” scenario (MR235)
Remove using specific scan IDs (MR232)
Remove option that allows users to provide input data for the pointing-offset-hdf and pointing-offset-queue-connector scenarios (MR232)
Update Dockerfile to use SKA Python base image (MR211)
Update pointing-offset-queue-connector to wait until scans are run/completed (MR206)
Update behaviour when no input_data is specified to use internal data (MR206)
0.1.0
[!WARNING] This version only works with SDP 0.24.0, and only if the QueueConnector device version is set to 4.1.0
Add the pointing data CSV file to the docker image (MR202)
Update ska-sdp-scripting to 0.12.0 (MR202)
Add functionality to check which antennas to simulate and update all relevant parameters in the template HDF5 to match number of antennas (MR195)
Update file writing to occur when scans are run/completed (MR196)
Add functionality to the mock-test-data processing script to generate data files (MR187)
Processing script reports internal errors in pb state (MR185)
Pydantic model included in documentation (MR189)
JSON parameter schema added to tmdata (MR186)
Validate processing block parameters using scripting library 0.10.0 (MR180)
Added processing block parameter JSON schema and Pydantic model (MR180)
Update script to write basic metadata yaml file (MR181)
Update script to execute required functions to produce data in an execution engine (MR179)
Initial version of the script, which sends mock pointing offset data to Kafka and configures the QueueConnector to display this data in tango attributes. (MR175)