Component Snapshots in Health Supervision
Snapshot-based evaluation
Health evaluation is performed on a coherent view of the system rather than on individual asynchronous events.
As subsystem updates (health, operational state, admin mode, observation state) may arrive at different times, evaluating directly on event streams would lead to non-deterministic and potentially misleading intermediate outcomes.
For this reason, the health supervision pipeline evaluates snapshots: stable, point-in-time views of the latest known values for all relevant components.
This approach follows the principles described in "StateStore and snapshot-based evaluation", namely point-in-time evaluation and thread-safe storage, while extending the stored payload to support health-specific needs.
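The difference between evaluating on a snapshot and evaluating on live state can be illustrated with a minimal sketch. The class name LatestValueStore and its methods are hypothetical and are not taken from the actual StateStore API; the sketch only assumes that updates and reads are serialized by a lock and that a snapshot is a read-only copy.

```python
import threading
from types import MappingProxyType


class LatestValueStore:
    """Hypothetical store keeping the latest known value per component."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def update(self, component, value):
        # Writers only ever touch the internal dictionary under the lock.
        with self._lock:
            self._values[component] = value

    def snapshot(self):
        # Copy under the lock, then expose the copy as a read-only mapping,
        # so later updates cannot alter a view that is being evaluated.
        with self._lock:
            return MappingProxyType(dict(self._values))


store = LatestValueStore()
store.update("subsystem_a", "OK")
view = store.snapshot()
store.update("subsystem_a", "DEGRADED")  # arrives after the snapshot was taken
```

Here view still reports the value that was current when the snapshot was taken, while a fresh call to snapshot() reflects the later update.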
Why component snapshots
Health evaluation requires more context than a single scalar state.
In particular, the diagnostic information reported through HealthInfo may depend on:
component health state,
component operational state,
administrative mode,
component role/weight in aggregation logic,
optional observing state context.
To support this, the health supervision pipeline stores component snapshots that combine all relevant last-known values into a single, immutable record per component.
This ensures that each evaluation cycle operates on a consistent set of inputs for every component.
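One way to model such a record is a frozen dataclass. The sketch below is illustrative only: the class name ComponentSnapshot, the field names, the string values, and the default weight are assumptions chosen to mirror the inputs listed above, not the actual data model.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class ComponentSnapshot:
    """Hypothetical immutable record of last-known values for one component."""

    name: str
    health_state: Optional[str] = None
    operational_state: Optional[str] = None
    admin_mode: Optional[str] = None
    weight: float = 1.0              # assumed role/weight used in aggregation
    obs_state: Optional[str] = None  # optional observing state context


snap = ComponentSnapshot(
    name="subsystem_a",
    health_state="OK",
    operational_state="ON",
    admin_mode="ONLINE",
)

# A frozen dataclass rejects attribute assignment, so a snapshot handed to
# the evaluator cannot be modified while an evaluation cycle is running.
try:
    snap.health_state = "FAILED"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Immutability means a new value always produces a new record, which keeps every evaluation cycle tied to exactly one set of inputs.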
Partial updates and stability
Subsystem events often update only a subset of the tracked fields (e.g., operational state changes without a health state change).
The snapshot store supports partial updates by merging new information into the previously stored snapshot, preserving existing values when a field is not updated.
This avoids accidental loss of context and prevents evaluations based on incomplete records.
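The merge behaviour can be sketched as follows. The ComponentSnapshot class and the merge helper are hypothetical, and the sketch assumes that None means "field not reported by this event"; a store that must also be able to clear a field would need a distinct sentinel instead.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ComponentSnapshot:
    """Hypothetical immutable record (reduced to three tracked fields)."""

    name: str
    health_state: Optional[str] = None
    operational_state: Optional[str] = None
    admin_mode: Optional[str] = None


def merge(previous, **updates):
    """Return a new snapshot; fields omitted or passed as None keep their old value."""
    changed = {key: value for key, value in updates.items() if value is not None}
    return replace(previous, **changed)


before = ComponentSnapshot(
    "subsystem_a",
    health_state="OK",
    operational_state="ON",
    admin_mode="ONLINE",
)

# An event that carries only an operational state change.
after = merge(before, operational_state="FAULT", health_state=None)
```

The previous snapshot is left untouched, and the new one carries the updated operational state together with the preserved health state and admin mode.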
Revision tracking
To reduce unnecessary evaluations, the store maintains a monotonically increasing revision counter.
The revision is incremented only when the effective snapshot content changes. This allows the supervision layer to skip evaluations when incoming events do not materially affect the stored view of the system.
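A minimal sketch of revision tracking is shown below. RevisionedStore and its update method are hypothetical names, snapshots are represented as plain dictionaries for brevity, and None is again assumed to mean "field not reported".

```python
import threading


class RevisionedStore:
    """Hypothetical snapshot store with a monotonically increasing revision."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshots = {}
        self._revision = 0

    def update(self, component, **fields):
        """Merge reported fields and return the revision after the update."""
        with self._lock:
            current = self._snapshots.get(component, {})
            reported = {k: v for k, v in fields.items() if v is not None}
            merged = {**current, **reported}
            # Bump the revision only when the effective content changed.
            if merged != current:
                self._snapshots[component] = merged
                self._revision += 1
            return self._revision


store = RevisionedStore()
events = [
    {"health_state": "OK"},
    {"health_state": "OK"},                             # duplicate event
    {"operational_state": "ON"},
    {"health_state": "OK", "operational_state": "ON"},  # nothing new
    {"health_state": "DEGRADED"},
]

revisions = []
evaluations = 0
last_evaluated = 0
for event in events:
    revision = store.update("subsystem_a", **event)
    revisions.append(revision)
    # The supervision layer evaluates only when the revision has moved.
    if revision != last_evaluated:
        evaluations += 1
        last_evaluated = revision
```

In this run five events are ingested but only three change the stored content, so only three evaluations are triggered.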
Summary
Component snapshots provide the data foundation for stable and deterministic health evaluation.
They enable:
consistent aggregation across asynchronous subsystem updates,
reliable generation of HealthInfo explanations,
reduced redundant evaluations through revision tracking,
a clear separation between event ingestion and evaluation logic.