CSP.LMC scan consistency policy
================================

Purpose
-------

The CSP.LMC scan consistency policy defines the CSP-specific rules used to
validate subsystem behaviour while a subarray is scanning. Its purpose is to
detect subsystem states that are unsafe, unexpected, or inconsistent with the
current observation, and to determine whether scanning can continue, should
continue in a degraded condition, or must transition to ``FAULT``.

The policy can be configured to:

- determine whether non-recoverable inconsistencies should trigger a hard fault;
- specify which subsystems are required for a scan to be considered valid.

Behaviour
---------

The policy is evaluated while the aggregated subarray ``ObsState`` is
``SCANNING``. It is also evaluated when the aggregated state unexpectedly
collapses from ``SCANNING`` to ``EMPTY`` or ``IDLE``. This allows the policy to
classify restart-like conditions even when the aggregation has already dropped
out of ``SCANNING``.

During evaluation, the policy:

- refines the set of required subsystems according to the active observing modes;
- filters out subsystem components that are not relevant to the current
  subarray scan;
- classifies subsystems whose ``ObsState`` is not ``SCANNING``;
- produces a final decision containing:

  - the final subarray ``ObsState`` to apply,
  - whether the inconsistency is a hard fault,
  - a diagnostic message,
  - a severity level.

Required subsystems
-------------------

The default required subsystem set is:

- ``cbf``
- ``pss``
- ``pst``

This set is refined dynamically from the active observing modes:

- if ``PULSAR_TIMING`` is not active, ``pst`` is not required;
- if neither ``PULSAR_SEARCH`` nor ``TRANSIENT_SEARCH`` is active, ``pss`` is
  not required.

This ensures that only subsystems relevant to the current observation are
validated by the scan consistency policy.
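
The refinement rules above can be illustrated with a minimal sketch. The
``ObsMode`` enum, its ``IMAGING`` member, and the function name
``refine_required_subsystems`` are hypothetical names chosen for this example;
they are not taken from the CSP.LMC code base.

```python
from enum import Enum, auto


class ObsMode(Enum):
    """Hypothetical observing-mode enum used only for this illustration."""

    IMAGING = auto()  # assumed non-pulsar mode, not named in this document
    PULSAR_TIMING = auto()
    PULSAR_SEARCH = auto()
    TRANSIENT_SEARCH = auto()


# Default required set, as listed in this section.
DEFAULT_REQUIRED = frozenset({"cbf", "pss", "pst"})


def refine_required_subsystems(active_modes, default=DEFAULT_REQUIRED):
    """Return the subsystems that must be validated for the active modes."""
    modes = set(active_modes)
    required = set(default)
    # pst is required only when PULSAR_TIMING is active.
    if ObsMode.PULSAR_TIMING not in modes:
        required.discard("pst")
    # pss is required only when PULSAR_SEARCH or TRANSIENT_SEARCH is active.
    if not modes & {ObsMode.PULSAR_SEARCH, ObsMode.TRANSIENT_SEARCH}:
        required.discard("pss")
    return required
```

Under these assumptions, a scan with only the hypothetical ``IMAGING`` mode
active would validate ``cbf`` alone, while a ``PULSAR_TIMING``-only scan would
validate ``cbf`` and ``pst``.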

Subsystem filtering
-------------------

Before validation, the policy filters out subsystem components that are not
assigned to the current subarray scan.

In particular, PST beams with ``subarray_id == 0`` are ignored, because they
are not part of the active scan and must not influence the subarray
consistency decision.

Classification of inconsistencies
---------------------------------

Each required subsystem whose ``ObsState`` differs from ``SCANNING`` is
classified into a structured inconsistency containing:

- the subsystem FQDN,
- the observed ``ObsState``,
- a diagnostic code,
- a human-readable description,
- a severity.

Examples of classified conditions include:

- subsystem timing mismatches;
- unexpected restarts;
- subsystem faults;
- generic state mismatches.

Severity levels
---------------

The policy classifies inconsistencies using three severity levels:

- ``LOW``: the scan can continue and the inconsistency is considered
  recoverable or transient;
- ``MEDIUM``: the scan can continue, but in degraded conditions;
- ``HIGH``: the inconsistency is considered non-recoverable and forces a
  transition to ``FAULT``.

The final decision is derived from the highest severity found across all
invalid subsystems.

PST handling
------------

PST inconsistencies are handled separately from other subsystems because
their impact depends on the active observing modes.

PULSAR_TIMING-only scans
~~~~~~~~~~~~~~~~~~~~~~~~

When the active observing mode set is exactly ``{PULSAR_TIMING}``, PST beams
are treated as fault-critical for the scan.

The policy computes a PST failure threshold from the number of PST beams
participating in the scan:

- if only one PST beam is present, the threshold is ``0`` and failure of that
  beam is sufficient to trigger ``FAULT``;
- otherwise, the threshold is ``pst_count // 2``.

A ``FAULT`` decision is generated only when the number of failing PST beams
is strictly greater than the threshold.
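
The threshold rule for ``PULSAR_TIMING``-only scans can be written as a short
sketch. The function names below are hypothetical and serve only to make the
arithmetic concrete; only ``pst_count // 2`` and the strict comparison come
from the rule as stated.

```python
def pst_failure_threshold(pst_count: int) -> int:
    """Number of failing PST beams tolerated before FAULT is raised."""
    if pst_count <= 1:
        # A single beam: any failure is enough to trigger FAULT.
        return 0
    return pst_count // 2


def pst_failures_force_fault(pst_count: int, failing_count: int) -> bool:
    """True only when failures are strictly greater than the threshold."""
    return failing_count > pst_failure_threshold(pst_count)
```

Read this way, a scan with four PST beams tolerates two failing beams and
faults on the third, and a scan with a single beam faults as soon as that beam
fails.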

If the threshold is not exceeded, the scan can continue and the individual
PST beam inconsistencies are reported with their own severity.

Commensal observing modes
~~~~~~~~~~~~~~~~~~~~~~~~~

When ``PULSAR_TIMING`` is active together with one or more additional
observing modes, PST inconsistencies do not force the subarray ``ObsState``
to ``FAULT`` on their own.

In this case, PST failures are treated as degraded commensal scan conditions:

- PST beam inconsistencies are downgraded to ``MEDIUM`` severity;
- a warning is logged explaining that scanning can continue because the
  observation is not ``PULSAR_TIMING``-only;
- the scan continues for the non-PST observing modes that remain active.

Diagnostic reporting
--------------------

When inconsistencies are detected, the policy builds a human-readable
diagnostic message containing:

- the active observing modes;
- the list of inconsistent subsystems;
- the description of each inconsistency;
- the associated severity.

This message is used both for logging and for publication by the observation
supervisor through the subarray diagnostic attributes.

Interaction with the observation supervisor
-------------------------------------------

The ``CspObservationSupervisor`` is responsible for applying the policy
decision to the subarray observation model and for reporting the result of
each evaluation cycle back to the generic supervision loop.

For each evaluation cycle, the supervisor:

- takes an atomic snapshot of subsystem ``ObsState`` values from the state
  store;
- computes the aggregated candidate ``ObsState`` through the subarray
  observation model;
- evaluates the scan consistency policy using:

  - the candidate aggregated state,
  - the active observing modes,
  - the subsystem snapshot,
  - the previously published state,
  - the active command context selected by the observation supervisor;

- updates diagnostic attributes related to scan consistency;
- interprets the returned ``Decision.action``;
- executes the corresponding domain-specific behaviour;
- returns an ``EvaluationOutcome`` to the generic supervisor.

Action handling
~~~~~~~~~~~~~~~

The supervisor interprets policy actions as follows:

``APPLY``
    The final observation state is applied to the observation model.
    Diagnostic attributes are updated accordingly, and the evaluation cycle
    completes successfully.

``WAIT``
    No final state is applied yet. The supervisor keeps the evaluation cycle
    pending so that the generic supervision loop can trigger a new evaluation
    later.

``FAULT``
    The supervisor transitions the subarray into ``FAULT`` and reports the
    corresponding diagnostic information. The evaluation cycle then completes
    in fault.

``REFRESH_AND_REEVALUATE``
    No final state is applied yet. The supervisor requests a refresh of the
    subsystem information and then performs a new policy evaluation using the
    refreshed snapshot.

    This action is used when the current state is still not conclusive after
    the reconciliation timeout, but an additional refresh may provide enough
    information to reach a final decision.

This action-based handling makes the policy intent explicit and avoids
inferring control flow indirectly from severity or fault flags alone.

The command context supplied to the policy is the supervisor's current
active context.
Internally, the supervisor may retain more than one ``CommandContext`` so
that a delayed outcome can still be correlated to an older command. However,
policy evaluation continues to operate on the single active context that best
represents the current supervision focus.

Hard consistency faults
~~~~~~~~~~~~~~~~~~~~~~~

If the policy returns an action of ``FAULT``, the supervisor latches the
subarray into ``FAULT`` with cause ``CONSISTENCY``.

This condition is reported through:

- ``scanConsistencyErrorFlag = True``
- ``scanConsistencyErrorMsg`` containing the policy diagnostic message

The consistency fault is automatically cleared only when the inconsistency is
no longer present and the current fault cause is ``CONSISTENCY``.

Medium-severity inconsistencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the policy returns ``MEDIUM`` severity, the operational behaviour depends
on the associated ``PolicyAction``.

A medium-severity condition may either:

- be applied immediately as a degraded but valid state (for example,
  ``SCANNING`` with degraded subsystem participation); or
- remain pending if the policy determines that the evaluation is not yet
  conclusive.

In both cases, the diagnostic attributes report the degraded condition.

Low-severity inconsistencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the policy returns only ``LOW`` severity and the selected action is
``APPLY``, the final ``ObsState`` remains ``SCANNING`` and the scan continues
normally.

As with medium severity, low severity is diagnostic in nature and should not
be interpreted on its own as a control-flow instruction.
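
The following sketch illustrates how a supervisor might branch on the policy
action alone, leaving severity as diagnostic information. ``PolicyAction``,
``Decision.action``, and the two ``scanConsistency*`` attribute names are taken
from this document. The ``Severity`` enum, the remaining ``Decision`` fields,
the ``attrs`` dictionary with its ``obsState`` key, and the returned outcome
labels are assumptions made for the example and do not describe the actual
``CspObservationSupervisor`` or ``EvaluationOutcome`` API.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class PolicyAction(Enum):
    APPLY = "APPLY"
    WAIT = "WAIT"
    FAULT = "FAULT"
    REFRESH_AND_REEVALUATE = "REFRESH_AND_REEVALUATE"


@dataclass(frozen=True)
class Decision:
    """Hypothetical decision shape; only ``action`` is named in the text."""

    action: PolicyAction
    final_obs_state: str
    severity: Severity
    message: str = ""


def handle_decision(decision: Decision, attrs: dict) -> str:
    """Dispatch on the action only; severity never selects the branch."""
    if decision.action is PolicyAction.FAULT:
        # Hard consistency fault: go to FAULT and publish the diagnostics.
        attrs["obsState"] = "FAULT"
        attrs["scanConsistencyErrorFlag"] = True
        attrs["scanConsistencyErrorMsg"] = decision.message
        return "COMPLETED_IN_FAULT"  # hypothetical outcome label
    if decision.action is PolicyAction.APPLY:
        # Apply the final state, which may be a degraded but valid SCANNING.
        attrs["obsState"] = decision.final_obs_state
        return "COMPLETED"
    if decision.action is PolicyAction.WAIT:
        # Nothing applied; the generic loop re-evaluates later.
        return "PENDING"
    # REFRESH_AND_REEVALUATE: the caller refreshes the snapshot and retries.
    return "REFRESH_REQUESTED"
```

In this sketch a ``MEDIUM`` decision with action ``APPLY`` and one with action
``WAIT`` take different branches, which mirrors the point that severity is not
a control-flow instruction on its own.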

Summary
-------

The CSP.LMC scan consistency policy actively validates subsystem behaviour
during scanning, adapts the set of required subsystems to the active
observing modes, handles PST beams with mode-specific logic, and cooperates
with the observation supervisor to:

- continue scanning when inconsistencies are tolerable,
- mark degraded scans through diagnostic information,
- force a transition to ``FAULT`` when inconsistencies are non-recoverable.