Health Diagnostics
==================

Overview
--------

The ``HealthDiagnostics`` component is responsible for generating
``HealthInfo`` messages that explain the aggregated ``HealthState``.

It does not participate in the computation of the health condition. Instead,
it interprets an already evaluated ``HealthState`` and produces
operator-facing diagnostic information describing the current system
condition.

This design enforces a clear separation between:

- Health classification (performed by the evaluation model)
- Health explanation (performed by the diagnostics component)

``HealthDiagnostics`` produces only the local diagnostics payload for the
target CSP.LMC device entry. Forwarded subsystem ``HealthInfo`` payloads are
merged separately by the supervision/publication layer.

Execution Flow
--------------

``HealthDiagnostics`` is executed after the Health Evaluation Model computes
the aggregated ``HealthState``.

Inputs:

- A stable snapshot of subsystem ``HealthSample`` objects
- The final aggregated ``HealthState``
- Optional contextual information provided by the model

Output:

- A mapping ``device_fqdn -> List[str]`` containing the diagnostic messages
  to be published as ``HealthInfo``.

If the aggregated health state is ``OK``, the diagnostics payload for the
target device is empty (``{device_fqdn: []}``).

Design Principles
-----------------

Deterministic Behaviour
^^^^^^^^^^^^^^^^^^^^^^^

Diagnostics are derived exclusively from the provided snapshot and the
computed health state. The component does not manage timing, retries, or
transitional states. Given the same inputs, it produces the same output.

Forced Conditions
^^^^^^^^^^^^^^^^^

The Health Evaluation Model may signal forced operational conditions, such as
fault or disabled states.

When such context is provided:

- The explicit reason takes precedence.
- Snapshot-based diagnostics are not evaluated.
- The forced reason becomes the published ``HealthInfo`` message.

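
The precedence described so far can be illustrated with a minimal sketch. It
is not the actual implementation: the ``HealthState`` and ``HealthSample``
definitions, the ``generate_health_info`` function, its ``forced_reason``
parameter, the message wording, and the device names are all hypothetical
stand-ins chosen for illustration. Checking ``OK`` before the forced reason
is likewise an assumption about how the two rules combine.

.. code-block:: python

    from dataclasses import dataclass
    from enum import Enum
    from typing import Dict, List, Optional, Sequence


    class HealthState(Enum):
        """Hypothetical stand-in for the real HealthState enumeration."""
        OK = 0
        DEGRADED = 1
        FAILED = 2
        UNKNOWN = 3


    @dataclass(frozen=True)
    class HealthSample:
        """Hypothetical shape; the real HealthSample fields may differ."""
        fqdn: str
        health: HealthState


    def generate_health_info(
        device_fqdn: str,
        snapshot: Sequence[HealthSample],
        aggregated: HealthState,
        forced_reason: Optional[str] = None,
    ) -> Dict[str, List[str]]:
        """Return the local HealthInfo payload for one device (sketch)."""
        # An OK aggregated state yields an empty payload.
        if aggregated is HealthState.OK:
            return {device_fqdn: []}
        # A forced reason takes precedence: the snapshot is not inspected.
        if forced_reason is not None:
            return {device_fqdn: [forced_reason]}
        # Otherwise derive messages from the snapshot only. Using a set and
        # sorting keeps the result unique and independent of input order.
        messages = {
            f"{sample.fqdn} reports HealthState {sample.health.name}"
            for sample in snapshot
            if sample.health is not HealthState.OK
        }
        return {device_fqdn: sorted(messages)}


    # Example usage with made-up device names.
    snapshot = [
        HealthSample("subsystem/b/01", HealthState.DEGRADED),
        HealthSample("subsystem/a/01", HealthState.OK),
    ]
    info = generate_health_info("controller/0", snapshot, HealthState.DEGRADED)
    forced = generate_health_info(
        "controller/0", snapshot, HealthState.FAILED, forced_reason="Device in FAULT"
    )

Because the function keeps no state and reads only its arguments, calling it
twice with the same snapshot and ``HealthState`` returns the same mapping,
which is the deterministic behaviour described above.
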

Critical Infrastructure (CBF)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CBF devices represent critical infrastructure within CSP operation.

When a CBF device affects the aggregated health state, its condition is
explicitly reflected in the diagnostic output.

If the aggregated health is ``FAILED`` and no CBF device is present in the
snapshot, a specific diagnostic message is generated to indicate the absence
of required critical infrastructure.

This ensures that failures or absence of CBF components are always visible to
operators.

Non-Critical Components
^^^^^^^^^^^^^^^^^^^^^^^

Non-critical components contribute to diagnostics according to their
operational status.

When administratively disabled, they do not influence the aggregated
``HealthState`` and do not generate diagnostic messages.

When administratively online, their state and health condition may contribute
to both aggregation and diagnostics.

Their impact on the aggregated health is typically limited. Issues in
non-critical components generally result in a ``DEGRADED`` condition rather
than a ``FAILED`` state.

Diagnostic messages are generated only when their condition meaningfully
affects the aggregated result, ensuring visibility of operational degradation
without unnecessary escalation.

Diagnostic Generation
---------------------

Diagnostic messages are derived from the subsystem snapshot in a manner
consistent with the aggregation semantics.

The component:

- Identifies components in a problematic state.
- Identifies components reporting a non-OK ``HealthState``.
- Applies rules related to critical infrastructure.
- Ensures messages are unique and consistently ordered.

Messages are plain strings intended for direct publication in the
``HealthInfo`` attribute.

Interaction with the Health Evaluation Model
--------------------------------------------

The interaction between the two components follows a strict sequence:

1. The Health Evaluation Model computes the aggregated state.
2. The model optionally provides contextual information.
3. ``HealthDiagnostics`` generates explanatory messages.
4. The supervision layer publishes both attributes.

The diagnostics component never re-evaluates or overrides the aggregated
``HealthState``.

Operational Characteristics
---------------------------

- Stateless: no internal persistence across evaluations.
- Deterministic: identical inputs produce identical output.
- Snapshot-based: operates only on stable supervision data.
- Independent: unaware of supervision timing or transitions.

Summary
-------

``HealthDiagnostics`` translates the aggregated ``HealthState`` into clear,
operator-facing diagnostic information.

While the Health Evaluation Model determines *what* the system health is, the
diagnostics component explains *why* it is in that condition, without
influencing the aggregation logic.

See Also
--------

- :doc:`health_state_model`
- :doc:`health_info`
- :doc:`health_architecture_contract`