Health Diagnostics

Overview

The HealthDiagnostics component generates the HealthInfo messages that explain the aggregated HealthState.

It does not participate in the computation of the health condition. Instead, it interprets an already evaluated HealthState and produces operator-facing diagnostic information describing the current system condition.

This design enforces a clear separation between:

  • Health classification (performed by the evaluation model)

  • Health explanation (performed by the diagnostics component)

HealthDiagnostics produces only the local diagnostics payload for the target CSP.LMC device entry. Forwarded subsystem HealthInfo payloads are merged separately by the supervision/publication layer.

Execution Flow

HealthDiagnostics is executed after the Health Evaluation Model computes the aggregated HealthState.

Inputs:

  • A stable snapshot of subsystem HealthSample objects

  • The final aggregated HealthState

  • Optional contextual information provided by the model

Output:

  • A mapping device_fqdn -> List[str] containing the diagnostic messages to be published as HealthInfo.

If the aggregated health state is OK, the diagnostics payload for the target device is empty ({device_fqdn: []}).
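The inputs and output described above can be sketched as a single function. This is a minimal illustration, not the component's actual interface: the HealthState enum below is a local stand-in, and HealthSample's fields, the function name generate_health_info, and the message wording are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Sequence


class HealthState(Enum):
    # Local stand-in for the real HealthState enumeration.
    OK = 0
    DEGRADED = 1
    FAILED = 2
    UNKNOWN = 3


@dataclass(frozen=True)
class HealthSample:
    # Hypothetical fields; the real HealthSample may carry more data.
    fqdn: str
    health: HealthState


def generate_health_info(
    device_fqdn: str,
    snapshot: Sequence[HealthSample],
    aggregated: HealthState,
    context: Optional[str] = None,  # optional model context, unused in this sketch
) -> Dict[str, List[str]]:
    """Return the local diagnostics payload for device_fqdn."""
    if aggregated is HealthState.OK:
        # An OK aggregate yields an empty payload for the target device.
        return {device_fqdn: []}
    messages = [
        f"{sample.fqdn}: health is {sample.health.name}"
        for sample in snapshot
        if sample.health is not HealthState.OK
    ]
    return {device_fqdn: messages}
```

The payload always contains the target device key, so a consumer can tell "no diagnostics" apart from "device not evaluated".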

Design Principles

Deterministic Behaviour

Diagnostics are derived exclusively from the provided snapshot and the computed health state. The component does not manage timing, retries, or transitional states.

Given the same inputs, it produces the same output.

Forced Conditions

The Health Evaluation Model may signal forced operational conditions, such as fault or disabled states.

When such context is provided:

  • The explicit reason takes precedence.

  • Snapshot-based diagnostics are not evaluated.

  • The forced reason becomes the published HealthInfo message.
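The precedence rule for forced conditions could look like the following sketch. The function name and the idea of passing the snapshot-based path as a callable are illustrative assumptions, chosen to make it visible that the snapshot path is not evaluated when a forced reason is present.

```python
from typing import Callable, List, Optional


def diagnostics_messages(
    forced_reason: Optional[str],
    from_snapshot: Callable[[], List[str]],
) -> List[str]:
    """Select between a forced reason and snapshot-based diagnostics."""
    if forced_reason is not None:
        # The explicit reason takes precedence; from_snapshot is never called.
        return [forced_reason]
    return from_snapshot()
```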

Critical Infrastructure (CBF)

CBF devices represent critical infrastructure within CSP operation.

When a CBF device affects the aggregated health state, its condition is explicitly reflected in the diagnostic output.

If the aggregated health is FAILED and no CBF device is present in the snapshot, a specific diagnostic message is generated to indicate the absence of required critical infrastructure.

This ensures that failures or absence of CBF components are always visible to operators.
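A hedged sketch of the CBF rules follows. How CBF devices are recognised is not stated here, so the substring check on the FQDN is purely an assumption, as are the function name, the use of plain strings for health values, and the message texts. Reporting every non-OK CBF device is used as a simple approximation of "affects the aggregated health state".

```python
from typing import List, Sequence, Tuple


def cbf_messages(
    snapshot: Sequence[Tuple[str, str]],  # (fqdn, health name) pairs
    aggregated: str,
) -> List[str]:
    """Return diagnostics specific to CBF devices."""
    # Assumption: a CBF device is recognised by "cbf" in its FQDN.
    cbf = [(fqdn, health) for fqdn, health in snapshot if "cbf" in fqdn.lower()]
    if aggregated == "FAILED" and not cbf:
        # Required critical infrastructure is absent from the snapshot.
        return ["No CBF device present in the snapshot"]
    return [f"CBF {fqdn}: health is {health}" for fqdn, health in cbf if health != "OK"]
```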

Non-Critical Components

Non-critical components contribute to diagnostics according to their operational status.

When administratively disabled, they do not influence the aggregated HealthState and do not generate diagnostic messages.

When administratively online, their state and health condition may contribute to both aggregation and diagnostics.

Their impact on the aggregated health is typically limited. Issues in non-critical components generally result in a DEGRADED condition rather than a FAILED state.

Diagnostic messages are generated only when their condition meaningfully affects the aggregated result, ensuring visibility of operational degradation without unnecessary escalation.
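The handling of non-critical components might be sketched as below. ComponentSample and its fields are hypothetical, and "meaningfully affects the aggregated result" is approximated here by two simple conditions: the aggregate is not OK, and the component is administratively online with a non-OK health.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ComponentSample:
    # Hypothetical shape of a non-critical component sample.
    fqdn: str
    admin_online: bool
    health: str


def non_critical_messages(
    samples: Sequence[ComponentSample],
    aggregated: str,
) -> List[str]:
    """Return diagnostics for non-critical components."""
    if aggregated == "OK":
        return []
    return [
        f"{s.fqdn}: health is {s.health}"
        for s in samples
        # Administratively disabled components never generate messages.
        if s.admin_online and s.health != "OK"
    ]
```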

Diagnostic Generation

Diagnostic messages are derived from the subsystem snapshot in a manner consistent with the aggregation semantics.

The component:

  • Identifies components in a problematic state.

  • Identifies components reporting a non-OK HealthState.

  • Applies rules related to critical infrastructure.

  • Ensures messages are unique and consistently ordered.

Messages are plain strings intended for direct publication in the HealthInfo attribute.
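Uniqueness and consistent ordering can be obtained in several ways. The sketch below uses deduplication followed by a lexicographic sort, which is only one possible choice; the actual ordering rule is not specified here, and the function name is hypothetical.

```python
from typing import Iterable, List


def normalise_messages(messages: Iterable[str]) -> List[str]:
    """Remove duplicate messages and return them in a stable order."""
    # Sorting makes the result independent of the order in which
    # subsystems appear in the snapshot.
    return sorted(set(messages))
```

With this approach, two snapshots that list the same subsystems in a different order produce the same HealthInfo content.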

Interaction with the Health Evaluation Model

The interaction between the two components follows a strict sequence:

  1. The Health Evaluation Model computes the aggregated state.

  2. The model optionally provides contextual information.

  3. HealthDiagnostics generates explanatory messages.

  4. The supervision layer publishes both attributes, HealthState and HealthInfo.

The diagnostics component never re-evaluates or overrides the aggregated HealthState.
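The sequence above can be sketched as one supervision cycle. All names here (run_cycle, the callable parameters, the type aliases) are illustrative assumptions; the real supervision layer may be structured quite differently.

```python
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = Dict[str, str]               # fqdn -> health name (illustrative)
Evaluation = Tuple[str, Optional[str]]  # (aggregated state, optional context)


def run_cycle(
    device_fqdn: str,
    snapshot: Snapshot,
    evaluate: Callable[[Snapshot], Evaluation],
    diagnose: Callable[[Snapshot, str, Optional[str]], List[str]],
    publish: Callable[[str, str, List[str]], None],
) -> Tuple[str, List[str]]:
    """Run one evaluation, diagnostics, and publication cycle."""
    state, context = evaluate(snapshot)            # steps 1 and 2
    messages = diagnose(snapshot, state, context)  # step 3
    publish(device_fqdn, state, messages)          # step 4
    return state, messages
```

Because diagnose returns only messages, it has no channel through which to change the aggregated state computed in step 1.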

Operational Characteristics

  • Stateless: no internal persistence across evaluations.

  • Deterministic: identical inputs produce identical output.

  • Snapshot-based: operates only on stable supervision data.

  • Independent: unaware of supervision timing or transitions.

Summary

HealthDiagnostics translates the aggregated HealthState into clear, operator-facing diagnostic information.

While the Health Evaluation Model determines what the system health is, the diagnostics component explains why it is in that condition, without influencing the aggregation logic.

See Also