Overview

This section describes the observation supervision mechanism used in CSP.LMC to maintain a consistent observing state across subsystems in a distributed and asynchronous environment.

In CSP.LMC, subsystem observing state updates are produced independently and may arrive at different times, with different frequencies, and in different orders. During normal operation, temporary divergence between subsystem states is expected, especially while a transition is still in progress.

Directly reacting to each individual subsystem update would expose the system to transient state oscillations, premature fault classification, and publication of intermediate states that do not represent a stable system-wide condition.

To address this, CSP.LMC introduces an observation supervisor as an intermediate coordination layer between raw subsystem events and the observing state exposed by the CSP device. The supervisor controls when state evaluations are performed, so that decisions are taken only once the aggregated subsystem view has stabilized or a bounded waiting time has elapsed.

The supervision mechanism combines:

debounced evaluation, to avoid reacting to short-lived transitions;
a maximum latency bound, to guarantee progress even under frequent updates;
a bounded reconciliation phase, to allow late subsystem updates to be incorporated after an inconclusive evaluation.

This section introduces the role of the supervisor, the main actors in the supervision flow, and the rationale for a debounced and policy-driven evaluation model.

Main actors

The supervision architecture is based on the following roles.

Subsystems: Individual CSP subsystems, such as CBF, PSS, and PST, report their observing states independently according to their own internal lifecycle and timing.
Event aggregation: Subsystem updates are received and normalized by the event management layer, which acts as the entry point for observing-state-related changes. This layer is responsible for event ingestion and forwarding only; it does not perform consistency evaluation (see event manager).
State storage: A centralized, thread-safe store maintains the latest known observing state for each subsystem. This store provides the coherent input snapshot used by the supervisor during evaluation.
Observation supervisor: The supervisor coordinates evaluation timing and supervision flow. It buffers incoming updates, applies debounce and latency constraints, and triggers evaluations when the timing conditions for a meaningful assessment are met. When an evaluation is inconclusive, it may keep the supervision cycle open for a bounded reconciliation interval.
Consistency policy: The policy defines the domain-specific rules used to assess whether the current combination of subsystem states is valid, inconsistent, faulty, or still incomplete for the current operational context.
Observing state model: The observing state model is responsible for applying the final observing state, enforcing precedence and de-duplication rules, and publishing the resulting state externally.
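As an illustration of the state storage role, the following is a minimal sketch of a thread-safe store that returns a copied snapshot for evaluation. The class name StateStore is taken from the diagram in this section, but the method names and the use of plain strings for observing states are assumptions made for the example, not the actual CSP.LMC API.

```python
import threading
from typing import Dict


class StateStore:
    """Hypothetical thread-safe store of the latest observing state per subsystem."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: Dict[str, str] = {}

    def update(self, subsystem: str, obs_state: str) -> None:
        # Keep only the most recent state reported by each subsystem.
        with self._lock:
            self._states[subsystem] = obs_state

    def snapshot(self) -> Dict[str, str]:
        # Return a copy, so later updates cannot change the view
        # that an in-progress evaluation is working on.
        with self._lock:
            return dict(self._states)
```

Returning a copy rather than the internal dictionary is what makes the snapshot coherent: the policy sees all subsystem states as they were at a single instant.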

High-level behaviour

At a high level, the supervision flow is as follows:

Subsystems emit observing state updates asynchronously.
The latest subsystem states are collected and stored.
The supervisor delays evaluation until either:
- the updates remain stable for the debounce interval; or
- the maximum latency threshold is reached.
The consistency policy evaluates the current subsystem snapshot.
If the evaluation is conclusive, the resulting observing state is applied or a fault condition is raised.
If the evaluation is inconclusive, the supervisor may enter a bounded reconciliation phase and wait for further subsystem updates before re-evaluating.

Critical conditions, such as FAULT or ABORTED, may bypass the normal debounce delay in order to preserve timely reaction.
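The timing rule described above (debounce, maximum latency, and bypass for critical states) can be sketched as follows. This is an illustrative model only: the class EvaluationTimer, its field names, and the default interval values are assumptions, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional

# States that skip the debounce delay, as named in the text above.
CRITICAL_STATES = frozenset({"FAULT", "ABORTED"})


@dataclass
class EvaluationTimer:
    """Hypothetical helper deciding when a pending evaluation should run."""

    debounce_s: float = 0.2      # assumed value, not a CSP.LMC default
    max_latency_s: float = 1.0   # assumed value, not a CSP.LMC default
    first_update_at: Optional[float] = None
    last_update_at: Optional[float] = None

    def record_update(self, now: float) -> None:
        # The first update of a cycle starts the latency clock;
        # every update restarts the debounce clock.
        if self.first_update_at is None:
            self.first_update_at = now
        self.last_update_at = now

    def should_evaluate(self, now: float, latest_state: Optional[str] = None) -> bool:
        if self.first_update_at is None or self.last_update_at is None:
            return False  # nothing pending
        if latest_state in CRITICAL_STATES:
            return True   # critical states bypass the debounce delay
        quiet = now - self.last_update_at >= self.debounce_s
        overdue = now - self.first_update_at >= self.max_latency_s
        return quiet or overdue

    def reset(self) -> None:
        # Called after an evaluation to start a new cycle.
        self.first_update_at = None
        self.last_update_at = None
```

The maximum latency check is measured from the first update of the cycle, so a subsystem that keeps emitting updates faster than the debounce interval cannot postpone evaluation indefinitely.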

The reconciliation phase is intended to distinguish between a genuine system inconsistency and a transient condition in which one or more subsystem updates are still pending.
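A bounded reconciliation phase can be sketched as a loop that re-evaluates after each late update until either a conclusive result is obtained or the window expires. The function reconcile and its parameters are hypothetical and only illustrate the idea; the real supervisor may structure this differently.

```python
import time
from typing import Callable, Optional


def reconcile(
    evaluate: Callable[[], Optional[str]],
    wait_for_update: Callable[[float], bool],
    window_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Hypothetical bounded reconciliation loop.

    evaluate() returns a conclusive result, or None if still inconclusive.
    wait_for_update(timeout) blocks for at most `timeout` seconds and
    returns True if a subsystem update arrived in that time.
    """
    deadline = clock() + window_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return None  # window expired: still inconclusive
        if not wait_for_update(remaining):
            return None  # no late update arrived within the window
        result = evaluate()
        if result is not None:
            return result  # late updates made the snapshot conclusive
```

With a threading.Event signalling new updates, wait_for_update could be built on event.wait(timeout) followed by event.clear(). A None return tells the caller that the subsystems did not converge within the window, so the condition should no longer be treated as transient.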

Rationale

In CSP.LMC, no single subsystem has full knowledge of the global observing state. Each subsystem reports only its own local state and timing. The supervision layer is therefore needed to correlate these independent reports and derive a coherent system-level interpretation.

Without supervision, evaluating observing state directly on individual events would lead to:

publication of transient intermediate states;
decisions based on incomplete or partially converged information;
poor discrimination between real faults and short-lived anomalies;
unstable and noisy observing state propagation.

The supervision layer addresses these issues by providing:

a single coordination point for subsystem state correlation;
temporal decoupling between event arrival and state evaluation;
evaluation based on coherent snapshots rather than individual events;
explicit handling of consistency, waiting, and fault semantics;
bounded waiting for late subsystem convergence when required.

This design provides a robust basis for observing state supervision in the presence of asynchronous and distributed subsystem behaviour.

Illustrative flow

The following diagram summarizes the interaction between subsystem updates, supervision logic, and observing state publication.

Subsystems (CBF / PST / PSS)
        |
        v
  EventManager
        |
        v
    StateStore  ---- snapshot ----+
                                  |
                                  v
                        ObservationSupervisor
                                  |
                          Consistency Policy
                                  |
                          ObsStateModel
                                  |
                        Published ObsState

Evolution of the supervision model

The supervision model is evolving from a purely event-driven observing state coordinator towards a mechanism that can also interpret subsystem state convergence in the context of command execution.

In particular, the supervisor is being extended so that, while a command is in progress or has recently completed, the aggregated observing state can be interpreted together with command-tracking information. This makes it possible to distinguish between:

a valid state consistent with the expected command outcome;
a genuine FAULT condition;
an incomplete state caused by delayed subsystem convergence.

This distinction is important in a distributed asynchronous system, where command completion and subsystem event convergence are not guaranteed to occur at the same time.

A command may be reported as complete while relevant subsystem updates are still propagating. Evaluating the aggregated state without this context could therefore result in premature fault classification or publication of a transient state.

To support this evolution, the supervision design is being refined to separate:

policy interpretation of the current subsystem snapshot;
the action implied by that interpretation;
the outcome of the supervisor evaluation cycle.

This separation is reflected by the introduction of PolicyAction in the policy decision model and EvaluationOutcome in the supervisor contract.

These refinements allow the supervisor to decide, in a command-aware way, whether to apply a final state, raise a fault, or keep the evaluation cycle temporarily open while waiting for additional subsystem information within a bounded reconciliation interval.
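The separation between policy interpretation, implied action, and cycle outcome might look like the sketch below. Only the type names PolicyAction and EvaluationOutcome come from this section; their members, the PolicyDecision container, and the run_cycle function are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class PolicyAction(Enum):
    # Hypothetical members: what the policy asks the supervisor to do.
    APPLY_STATE = auto()
    RAISE_FAULT = auto()
    WAIT = auto()


class EvaluationOutcome(Enum):
    # Hypothetical members: how the supervisor's evaluation cycle ended.
    APPLIED = auto()
    FAULTED = auto()
    PENDING = auto()


@dataclass(frozen=True)
class PolicyDecision:
    """Hypothetical result of interpreting one subsystem snapshot."""
    action: PolicyAction
    obs_state: Optional[str] = None
    reason: str = ""


def run_cycle(
    decision: PolicyDecision,
    apply_state: Callable[[Optional[str]], None],
    raise_fault: Callable[[str], None],
) -> EvaluationOutcome:
    # The policy only interprets; the supervisor performs the action
    # and reports how the cycle ended.
    if decision.action is PolicyAction.APPLY_STATE:
        apply_state(decision.obs_state)
        return EvaluationOutcome.APPLIED
    if decision.action is PolicyAction.RAISE_FAULT:
        raise_fault(decision.reason)
        return EvaluationOutcome.FAULTED
    return EvaluationOutcome.PENDING  # keep the cycle open for reconciliation
```

Keeping the two enumerations distinct means a policy can be tested purely on the decisions it returns, without involving the supervisor's timing or side effects.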

Command context lifecycle

The CspObservationSupervisor keeps command-aware runtime metadata in a registry of CommandContext objects keyed by command_id, and also tracks a single active command context.

In the common case, when a new command starts, the previously active context is discarded and replaced by the new one. This keeps the supervision model simple and focused on the command that is currently driving observing-state convergence.

Abort is handled as a special case. When an Abort command starts while another command context is still active, the previous context is preserved and marked as overridden by the Abort command. This allows the supervisor to still process a delayed terminal outcome for the interrupted command, while treating the Abort context as the active one for policy evaluation.

When the tracker later reports a terminal CommandOutcome, the supervisor updates the corresponding CommandContext by command_id rather than assuming a single global context slot.
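The lifecycle rules above (replace on a new command, preserve and mark on Abort, update by command_id) can be sketched as follows. CommandContext and command_id are named in this section; the other field names, the CommandContextRegistry class, and its methods are hypothetical, and cleanup of overridden contexts is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CommandContext:
    command_id: str
    command_name: str
    outcome: Optional[str] = None        # terminal outcome, once reported
    overridden_by: Optional[str] = None  # command_id of an overriding Abort


class CommandContextRegistry:
    """Hypothetical registry of command contexts with one active entry."""

    def __init__(self) -> None:
        self._contexts: Dict[str, CommandContext] = {}
        self._active_id: Optional[str] = None

    @property
    def active(self) -> Optional[CommandContext]:
        if self._active_id is None:
            return None
        return self._contexts.get(self._active_id)

    def start(self, command_id: str, command_name: str) -> CommandContext:
        previous = self.active
        if previous is not None:
            if command_name == "Abort":
                # Preserve the interrupted context so that a delayed
                # terminal outcome can still be matched to it.
                previous.overridden_by = command_id
            else:
                # Common case: the new command replaces the old context.
                del self._contexts[previous.command_id]
        ctx = CommandContext(command_id, command_name)
        self._contexts[command_id] = ctx
        self._active_id = command_id
        return ctx

    def record_outcome(self, command_id: str, outcome: str) -> Optional[CommandContext]:
        # Look up by command_id instead of assuming the active slot.
        ctx = self._contexts.get(command_id)
        if ctx is not None:
            ctx.outcome = outcome
        return ctx
```

In this sketch, an outcome reported for a discarded context is simply ignored (record_outcome returns None), while an outcome for a command interrupted by Abort is still recorded on its preserved context.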

For the complete command context lifecycle and scan-specific semantics, see Command context.