Actions, Assertions and Synchronisation
-------------------------------------------

Conceptual definition
^^^^^^^^^^^^^^^^^^^^^^^^^^

The first logical building block to consider is the **Action**. An Action
is a structured representation of an interaction with the SUT. Concretely,
it can be any operation you may wish to perform on the SUT, such as
sending a command, setting an attribute, or orchestrating something more
complex.

We assume that most interactions with the SUT, whether simple or complex,
can be represented as a sequence of the following three steps, preceded
where needed by a setup phase (described below):

1. The **verification of certain pre-conditions**, which must be satisfied
   before the action can be executed (e.g., ensuring the SUT is in a
   specific known state).
2. The **execution of the action procedure** itself (e.g., sending a
   command, setting an attribute, etc.).
3. The **verification of certain post-conditions**, which are expected to
   be met following a successful action execution (e.g., confirming the
   SUT has reached a given target state).

As we are working with distributed systems where interactions are highly
event-based, post-conditions will typically need to be **verified within
a timeout**. This is because the SUT may take time to reach the expected
state, and waiting indefinitely is not practical. In this sense,
post-condition verification is a form of **synchronisation**.

Additionally, since we are dealing with an event-based system, an action
may require a **setup** phase to prepare for execution and condition
verification. This setup phase may involve subscribing to certain events
or clearing existing events to prevent false positives or negatives in
verifications, thereby ensuring that the action can be executed multiple
times.

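
The sequence described above (setup, pre-conditions, procedure,
post-conditions within a timeout) can be sketched in plain Python. This is
only an illustration of the concept, not the harness implementation:
``FakeSUT`` and ``SketchAction`` are hypothetical names, and polling is
used purely to keep the sketch dependency-free, whereas the harness relies
on Tango events.

.. code-block:: python

    import threading
    import time


    class FakeSUT:
        """Hypothetical stand-in for a SUT that changes state asynchronously."""

        def __init__(self):
            self.state = "EMPTY"

        def assign_resources(self):
            # The state changes slightly later, as in an event-based system.
            threading.Timer(0.05, self._set_idle).start()

        def _set_idle(self):
            self.state = "IDLE"


    class SketchAction:
        """Illustrative action: setup, pre-conditions, procedure, post-conditions."""

        def __init__(self, sut):
            self.sut = sut
            self.log = []

        def setup(self):
            # Reset anything left over from a previous run.
            self.log.clear()

        def verify_preconditions(self):
            assert self.sut.state == "EMPTY", "SUT must start in EMPTY"

        def execute_procedure(self):
            self.sut.assign_resources()

        def verify_postconditions(self, timeout):
            # Synchronisation: wait for the target state, but not forever.
            deadline = time.monotonic() + timeout
            while time.monotonic() < deadline:
                if self.sut.state == "IDLE":
                    return
                time.sleep(0.01)
            raise AssertionError(f"SUT did not reach IDLE within {timeout}s")

        def execute(self, postconditions_timeout=1.0):
            for step in (
                self.setup,
                self.verify_preconditions,
                self.execute_procedure,
            ):
                step()
                self.log.append(step.__name__)
            self.verify_postconditions(postconditions_timeout)
            self.log.append("verify_postconditions")


    sut = FakeSUT()
    action = SketchAction(sut)
    action.execute(postconditions_timeout=1.0)

In this sketch, running the action a second time raises an
``AssertionError`` at the pre-condition step, because the fake SUT is no
longer in its initial state and the procedure is therefore never called.
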
Design and implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^

In ITH as a Platform, we provide a base class for actions
(:py:class:`~ska_integration_test_harness.core.actions.SUTAction`)
that implements the fundamental structure of an action, including setup,
pre-conditions, post-conditions, and timeout handling. This base class is
designed to be extended by custom actions, allowing users to implement
specific interaction logic for their SUT.

Since some requirements may be common across teams, we also provide
ready-to-use actions in both the
:py:mod:`~ska_integration_test_harness.core` layer and the
:py:mod:`~ska_integration_test_harness.extensions` layer. For example, the
:py:class:`~ska_integration_test_harness.extensions.lrc.TangoLRCAction`
is a pre-built action that sends a Tango Long Running Command to a device,
waits for its completion, and synchronises with certain device state
changes.

From your test code (whether in your customisation of the Test Harness or
within your tests), you can utilise these ready-to-use actions, or you can
create your own custom actions by extending the base class and
implementing the necessary extension points.

|ith-platform-actions|

Here follow a few notes on the key components of the action mechanism.

- :py:class:`~ska_integration_test_harness.core.actions.SUTAction`
  is a base class for executing operations on a SUT, supporting
  preconditions, postconditions (e.g., synchronisation), setup, and a
  defined procedure.

  - ``execute()`` orchestrates the execution of specific methods. It also
    handles logging, timeouts, and other utilities.
  - Subclasses must implement ``execute_procedure()``; other methods are
    optional and by default do nothing.

  You can create custom actions by extending
  :py:class:`~ska_integration_test_harness.core.actions.SUTAction`
  or any of its subclasses.
- :py:class:`~ska_integration_test_harness.extensions.lrc.TangoLRCAction`
  is a ready-to-use action for sending Tango Long Running Commands (LRCs)
  to devices, handling synchronisation, and monitoring device state
  changes.

  - It resides in the Common Extensions package, as it incorporates
    SKA-specific knowledge of how Long Running Commands emit events.
  - :py:class:`~ska_integration_test_harness.extensions.lrc.TangoLRCAction`
    sends a Tango command to a device, waits for LRC completion, and
    monitors device state changes and LRC errors. It achieves this by
    extending certain
    :py:class:`~ska_integration_test_harness.core.actions.SUTAction`
    descendants and using structured representations of expected
    preconditions and postconditions, which are then synchronised with
    the tracer. More details on this mechanism can be found in the
    following example and in the API documentation.

The core logic of actions is implemented in the following modules:

- :py:mod:`ska_integration_test_harness.core.actions`
- :py:mod:`ska_integration_test_harness.core.assertions`

Usage Example 1 (simple): Command + LRC & State Synchronisation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this first simple example, we demonstrate how to use the action
mechanism with the provided building blocks to perform a basic **Tango
command call**, the subsequent **LRC completion check**, and the **state
synchronisation**.

Assume we have a Tango device and want to send it a command. Also, assume
that the command 1) is a Long Running Command (LRC) and 2) will cause the
state of some other devices to change to a particular state. Suppose we
want to ensure that the command executes correctly (without errors) and
that the desired states are reached.

To achieve this, we proceed as follows:

1. Define the command as an instance of
   :py:class:`ska_integration_test_harness.extensions.lrc.TangoLRCAction`.
2. Define a pre-condition using an instance of
   :py:class:`ska_integration_test_harness.core.assertions.AssertDevicesAreInState`
   to verify the initial state of the devices and ensure the action
   executes from a valid initial state.
3. Define the expected state transitions as post-conditions using
   instances of
   :py:class:`ska_integration_test_harness.core.assertions.AssertDevicesStateChanges`.
4. Add directives to synchronise LRC completion and fail early if an LRC
   error is detected.
5. Finally, execute the enriched action object, with all the directives
   applied, within a timeout.

.. code-block:: python

    from pathlib import Path

    import tango
    from ska_integration_test_harness.extensions.lrc import TangoLRCAction
    from ska_integration_test_harness.core.assertions import (
        AssertDevicesAreInState,
        AssertDevicesStateChanges,
    )
    from <...> import ObsState

    # The device where the command will be sent
    target_device = tango.DeviceProxy("tmc-low/centralnode/0")

    # The devices expected to change state as a result of the command
    subarray_devices = [
        tango.DeviceProxy("tmc-low/subarray/01"),
        tango.DeviceProxy("csp-low/subarray/01"),
        tango.DeviceProxy("sdp-low/subarray/01"),
        tango.DeviceProxy("mccs/subarray/01"),
    ]

    # 1. Create an instance of an action that sends a command to a device
    action = TangoLRCAction(
        target_device=target_device,
        command_name="AssignResources",
        # The JSON input is read from file and passed as a string
        # (assuming the command expects a JSON string argument).
        command_param=Path("low/input/assign_resources.json").read_text(),
    )

    # 2. Use pre-conditions to specify the expected initial state
    #    for the action to execute successfully. This is optional and
    #    often unnecessary, except for ensuring "stronger" tests.
    action.add_preconditions(
        # Expect the devices to be in the EMPTY state
        AssertDevicesAreInState(
            devices=subarray_devices,
            attribute_name="obsState",
            attribute_value=ObsState.EMPTY,
        ),
    )

    # 3. Use post-conditions to specify the expected state changes
    #    after the action executes.
    action.add_postconditions(
        # Expect a state change in the devices to the RESOURCING state
        AssertDevicesStateChanges(
            devices=subarray_devices,
            attribute_name="obsState",
            attribute_value=ObsState.RESOURCING,
        ),
        # Expect a state change in the devices to the IDLE state
        AssertDevicesStateChanges(
            devices=subarray_devices,
            attribute_name="obsState",
            attribute_value=ObsState.IDLE,
            previous_value=ObsState.RESOURCING,
        ),
    )

    # 4. Add directives to synchronise LRC completion and fail early
    #    on LRC errors.
    action.add_lrc_completion_to_postconditions()
    action.add_lrc_errors_to_early_stop()

    # 5. Execute the action (within a timeout)
    action.execute(postconditions_timeout=30)

More generally, you can write a utility function like the following one to
build actions with the desired pre-conditions, post-conditions, and
directives for any subarray command:

.. code-block:: python

    from typing import Any

    from ska_tango_testing.integration.assertions import ChainedAssertionsTimeout
    # (other imports, as in the previous example)

    subarray_devices = [
        tango.DeviceProxy("tmc-low/subarray/01"),
        tango.DeviceProxy("csp-low/subarray/01"),
        tango.DeviceProxy("sdp-low/subarray/01"),
        tango.DeviceProxy("mccs/subarray/01"),
    ]

    # NOTE: this is a simplification for the sake of the example.
    commands_target = subarray_devices[0]

    def build_subarray_command_action(
        command_name: str,
        command_param: Any,
        verify_initial_state: ObsState | None = None,
        wait_states: list[ObsState] | None = None,
        wait_lrc_completion: bool = True,
        fail_on_lrc_errors: bool = True,
    ) -> TangoLRCAction:
        """Build a TangoLRCAction for a subarray command.

        :param command_name: The name of the command to send.
        :param command_param: The parameter to send with the command.
        :param verify_initial_state: If specified, a pre-condition will be
            added to verify that the subarray devices are in this state
            before executing the command.
        :param wait_states: If specified, post-conditions will be added to
            verify that the subarray devices change to these states after
            executing the command. The order of states in the list will be
            used to verify the sequence of state changes.
        :param wait_lrc_completion: If True, a post-condition will be added
            to wait for the LRC completion after executing the command.
        :param fail_on_lrc_errors: If True, an early stop condition will be
            added to fail the action immediately if an LRC error is
            detected.
        :return: A TangoLRCAction instance with the specified command,
            pre-conditions, post-conditions, and directives.
        """
        action = TangoLRCAction(
            target_device=commands_target,
            command_name=command_name,
            command_param=command_param,
        )

        if verify_initial_state is not None:
            action.add_preconditions(
                AssertDevicesAreInState(
                    devices=subarray_devices,
                    attribute_name="obsState",
                    attribute_value=verify_initial_state,
                ),
            )

        # One post-condition per expected state, in the given order
        for state in wait_states or []:
            action.add_postconditions(
                AssertDevicesStateChanges(
                    devices=subarray_devices,
                    attribute_name="obsState",
                    attribute_value=state,
                ),
            )

        if wait_lrc_completion:
            action.add_lrc_completion_to_postconditions()

        if fail_on_lrc_errors:
            action.add_lrc_errors_to_early_stop()

        return action

    # Example usage
    assign_resources_action = build_subarray_command_action(
        command_name="AssignResources",
        command_param=Path("low/input/assign_resources.json").read_text(),
        verify_initial_state=ObsState.EMPTY,
        wait_states=[ObsState.RESOURCING, ObsState.IDLE],
        wait_lrc_completion=True,
        fail_on_lrc_errors=True,
    )
    configure_action = build_subarray_command_action(
        command_name="Configure",
        command_param=Path("low/input/configure.json").read_text(),
        verify_initial_state=ObsState.IDLE,
        wait_states=[ObsState.CONFIGURING, ObsState.READY],
        wait_lrc_completion=True,
        fail_on_lrc_errors=True,
    )
    scan_action = build_subarray_command_action(
        command_name="Scan",
        command_param=Path("low/input/scan.json").read_text(),
        verify_initial_state=ObsState.READY,
        wait_states=[ObsState.SCANNING],
        # Assume we don't want to wait for LRC completion here
        wait_lrc_completion=False,
        fail_on_lrc_errors=True,
    )
    abort_action = build_subarray_command_action(
        command_name="Abort",
        command_param=Path("low/input/abort.json").read_text(),
        # Assume we don't care about the initial state for this one
        wait_states=[ObsState.ABORTING, ObsState.ABORTED],
        wait_lrc_completion=True,
        fail_on_lrc_errors=True,
    )

    # Run the actions, within the same timeout
    shared_timeout = ChainedAssertionsTimeout(100)
    assign_resources_action.execute(postconditions_timeout=shared_timeout)
    configure_action.execute(postconditions_timeout=shared_timeout)
    scan_action.execute(postconditions_timeout=shared_timeout)
    abort_action.execute(postconditions_timeout=shared_timeout)

Some further comments on this code:

- The pre-conditions are verified before the command is called. If they
  fail, an ``AssertionError`` is raised, and the command will not be
  called.
- The post-conditions are verified after the command is called. They are
  verified in the order they are added, and if one fails, subsequent ones
  are not checked. Verification is performed using a
  :py:class:`~ska_tango_testing.integration.TangoEventTracer`
  to subscribe to events and check state changes through assertions.
- The timeout specifies the maximum wait time for post-conditions to be
  verified. It does not affect pre-conditions or the command call.
- The LRC completion check is a post-condition. It is verified after the
  command is called and after other post-conditions are checked, all
  within the same timeout. You can specify which result codes count as
  successful completions. Verification subscribes to the
  ``longRunningCommandResult`` change event and checks the result code
  for the stored LRC ID.
- The LRC error check acts as a "sentinel", monitoring events and halting
  post-condition verification early if an error is detected. You can
  specify which result codes are treated as errors. If an error is
  detected, an ``AssertionError`` is raised, stopping verification before
  the timeout.
- Synchronisation is managed internally by the
  :py:class:`ska_tango_testing.integration.TangoEventTracer`.
  All subscriptions and event resets are handled automatically, including
  storing the LRC ID.
- Provided the pre-conditions are satisfied, an action can be executed
  multiple times. Post-condition tracking and timeouts are reset with
  each execution.

In summary, the possible outcomes of an action execution are as follows:

1. Pre-conditions and post-conditions (including LRC completion) are
   satisfied: The action is successful.
2. A pre-condition fails: The action procedure (e.g., the command call)
   is not executed, and an ``AssertionError`` is raised.
3. Pre-conditions are satisfied, but some post-condition (including LRC
   completion) fails: The timeout expires, and an ``AssertionError`` is
   raised.
4. Pre-conditions are satisfied, but an LRC error is detected: An
   ``AssertionError`` is raised before the timeout expires, without
   waiting for the remaining post-conditions to be verified.
5. Pre-conditions are satisfied, but the action procedure encounters an
   error (e.g., a command call error): The error is not caught by the
   action and propagates like any other Python exception.

**Would you like to try this approach?** Here are some suggestions for
further reading:

- :py:class:`~ska_integration_test_harness.extensions.lrc.TangoLRCAction`
  for details on the action API
- :py:mod:`~ska_integration_test_harness.core.assertions`
  for information on defining pre- and post-conditions, including how to
  create new ones

Usage Example 2 (intermediate): Custom action
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not all actions are simple command calls, and not all action
synchronisation logic is standard. In this second example, we demonstrate
how to create a custom action that operates on a Tango device attribute
to configure a set of devices to be reachable (and waits for them to
become so).

Let us assume we have a controller device that needs to be activated to
make itself and other devices reachable.
The controller device has an attribute ``adminMode`` that can be set to
``ONLINE`` to activate the devices. Assume that to detect the reachability
of these devices, we can subscribe to the ``telescopeState`` event and
consider the devices reachable when they are in any of the following
states: ``ON``, ``OFF``, or ``STANDBY``. However, the subscription must
occur **after** the controller device is activated (otherwise it will not
work). Finally, let us say this is a setup procedure that is prone to
failure, so we want to retry it up to 3 times with exponentially
increasing timeouts.

To achieve this, we proceed as follows:

1. Define a custom action by subclassing the base class
   :py:class:`ska_integration_test_harness.core.actions.SUTAction`,
   which is essentially an empty shell.
2. Override the ``execute_procedure`` method to implement the custom
   activation logic (in this case, setting the ``adminMode`` attribute).
3. Override the ``verify_postconditions`` method to implement the custom
   synchronisation logic (in this case, subscribing to the event and
   waiting for the devices to be reachable). Also, override the ``setup``
   method to clean up the event tracer and allow multiple runs.
4. Provide a semantic description of the action (used in failure
   messages).
5. Create an action instance and run it within a retry loop.

.. code-block:: python

    import logging

    import tango
    from assertpy import assert_that
    from ska_integration_test_harness.core.actions import SUTAction
    from ska_tango_testing.integration import TangoEventTracer
    from <...> import AdminMode

    logger = logging.getLogger(__name__)

    # Step 1: Subclass the base class SUTAction to create a custom action
    # from scratch.
    class ActivateSubsystem(SUTAction):
        """Activate a subsystem and ensure it is reachable."""

        def __init__(
            self,
            controller_device: tango.DeviceProxy,
            other_devices: list[tango.DeviceProxy],
            **kwargs
        ):
            """Initialise the action.

            :param controller_device: The device that must be activated.
            :param other_devices: The devices that must be reachable.
            :param kwargs: Additional parameters. See the base class
                :py:class:`ska_integration_test_harness.core.actions.SUTAction`
                for further details.
            """
            # Always call the super method and pass kwargs. This ensures
            # compatibility with the base class and its required parameters.
            super().__init__(**kwargs)

            self.controller_device = controller_device
            self.other_devices = other_devices
            self.tracer = TangoEventTracer()

        # (Pre-conditions are unnecessary here and can be skipped.)

        # -----------------------------------------------------------------
        # Step 2: Implement the custom activation logic

        def execute_procedure(self):
            self.controller_device.adminMode = AdminMode.ONLINE

        # -----------------------------------------------------------------
        # Step 3: Implement the custom synchronisation logic (and clean up)

        def verify_postconditions(self, timeout=0):
            # (Always good practice to call the super method)
            super().verify_postconditions(timeout=timeout)

            # Subscribe to the telescopeState event (deferred to this
            # point; subscriptions are usually done in the setup method)
            self.tracer.subscribe_event(self.controller_device, "telescopeState")
            for device in self.other_devices:
                self.tracer.subscribe_event(device, "telescopeState")

            # Wait for the devices to be reachable
            assertpy_context = (
                assert_that(self.tracer)
                .described_as(
                    self.description()
                    + " The controller device must be reachable."
                )
                .within_timeout(timeout)
                .has_change_event_occurred(
                    self.controller_device,
                    "telescopeState",
                    # Define reachability based on these states
                    custom_matcher=lambda event: event.attribute_value
                    in [
                        tango.DevState.ON,
                        tango.DevState.OFF,
                        tango.DevState.STANDBY,
                    ],
                )
            )

            for device in self.other_devices:
                assertpy_context.described_as(
                    self.description()
                    + f" Device {device.dev_name()} must be reachable."
                ).has_change_event_occurred(
                    device, "telescopeState", tango.DevState.ON
                )

            # Verify all devices are now in the ONLINE admin mode
            for device in self.other_devices + [self.controller_device]:
                assert_that(device.adminMode).described_as(
                    self.description()
                    + f" {device.dev_name()}.adminMode must be ONLINE."
                ).is_equal_to(AdminMode.ONLINE)

        def setup(self):
            # (Always good practice to call the super method)
            super().setup()

            # Clean up the tracer
            self.tracer.unsubscribe_all()
            self.tracer.clear_events()

        # -----------------------------------------------------------------
        # Step 4: Provide a semantic description of the action

        def description(self):
            other_names = ", ".join(d.dev_name() for d in self.other_devices)
            return (
                f"Activate the subsystem {self.controller_device.dev_name()} "
                f"and ensure the devices {other_names} are reachable."
            )

    # ---------------------------------------------------------------------
    # Step 5: Create an action instance and retry it up to 3 times
    # with exponentially increasing timeouts

    action = ActivateSubsystem(
        controller_device=tango.DeviceProxy("csp-low/centralnode/01"),
        other_devices=[
            tango.DeviceProxy("csp-low/subarray/01"),
            tango.DeviceProxy("csp-low/subarray/02"),
        ],
    )

    errors = []
    timeout = 10
    for i in range(3):
        try:
            action.execute(postconditions_timeout=timeout)
            break
        except AssertionError as e:
            logger.warning(f"Attempt {i+1} failed: {e}")
            errors.append(e)
            timeout *= 2  # Double the timeout for the next attempt
    else:
        # Reached only if no attempt succeeded (the loop never hit "break")
        raise AssertionError(
            "The action failed after 3 attempts. Errors:\n"
            + "\n".join(str(error) for error in errors)
        ) from errors[-1]

Some further comments on this code:

- The base class for actions is an empty shell, but it provides the
  fundamental structure for action execution, which follows this sequence
  when the ``execute`` method is called:

  1. The action is set up (via the ``setup`` method).
  2. Pre-conditions are verified (via the ``verify_preconditions``
     method).
  3. The custom procedure is executed (via the ``execute_procedure``
     method).
  4. Post-conditions are verified (via the ``verify_postconditions``
     method) within the specified timeout.

- The ``setup`` method is always the first step in action execution,
  making it an excellent place to clean up resources and enable multiple
  runs.
- The ``execute_procedure`` method is mandatory and serves as the location
  for implementing the custom logic of the action.
- The ``verify_preconditions`` and ``verify_postconditions`` methods are
  optional but are useful for ensuring that the action starts from a valid
  state and achieves the expected results.
- The ``description`` method provides a semantic description of the action
  and is used to generate meaningful error messages when the action fails.
- The retry loop is a simple way to retry the action up to three times,
  doubling the post-conditions timeout after each failed attempt.

**Would you like to try this approach?** Here are some suggestions for
further reading:

- :py:mod:`~ska_integration_test_harness.core.actions`
  to learn more about the concept of actions.
- :py:class:`~ska_integration_test_harness.core.actions.SUTAction`
  to learn more about the base class for creating custom actions.
- `TangoEventTracer Getting Started Guide `_
  to learn more about the event tracer, subscription mechanisms, and event
  assertion mechanisms.

.. |ith-platform-actions| image:: ../uml-docs/ith-platform-actions.png

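
If several setup procedures need the same retry policy, the retry loop
from Usage Example 2 could be factored into a small helper. The following
is a minimal sketch under stated assumptions: ``execute_with_retries`` and
``FlakyAction`` are hypothetical names that are not part of the harness,
and the helper only assumes that the action exposes
``execute(postconditions_timeout=...)`` and signals failure with an
``AssertionError``.

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)


    def execute_with_retries(action, attempts=3, initial_timeout=10.0, factor=2.0):
        """Run ``action.execute`` until it succeeds or attempts run out.

        Hypothetical helper: returns the number of attempts that were used.
        """
        if attempts < 1:
            raise ValueError("attempts must be at least 1")

        errors = []
        timeout = initial_timeout
        for attempt in range(1, attempts + 1):
            try:
                action.execute(postconditions_timeout=timeout)
                return attempt
            except AssertionError as error:
                logger.warning("Attempt %d failed: %s", attempt, error)
                errors.append(error)
                timeout *= factor  # the next attempt gets a longer timeout

        raise AssertionError(
            f"The action failed after {attempts} attempts. Errors:\n"
            + "\n".join(str(error) for error in errors)
        ) from errors[-1]


    class FlakyAction:
        """Stand-in action that fails twice and then succeeds."""

        def __init__(self):
            self.timeouts = []

        def execute(self, postconditions_timeout=0):
            self.timeouts.append(postconditions_timeout)
            if len(self.timeouts) < 3:
                raise AssertionError("devices not reachable yet")


    flaky = FlakyAction()
    attempts_used = execute_with_retries(flaky, attempts=3, initial_timeout=10.0)

Only ``AssertionError`` is retried in this sketch, so an error raised by
the action procedure itself (for example a failed command call) still
propagates immediately instead of being masked by the retries.
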