Low PSI Hardware Audit

The Low PSI network topology — FPGA serials, PCIe BDFs, MAC/SPEAD IDs, P4 switch ports, trunk ports, timing offsets — lives in a single source-of-truth YAML file:

src/ska_low_cbf_integration/data/psi-net.yaml

It is consumed at runtime by ska_low_cbf_integration.low_psi_net (and re-exported by ska_low_cbf_integration.low_psi for backwards compatibility) and by four stand-alone tooling scripts:

  • scripts/psi_net_check_diagram.py — cross-checks the YAML against the low-psi-data-links.drawio.xml diagram. Reports any device or cable in one that isn’t in the other. Runs in CI as part of the lint checks.

  • scripts/psi_net_check_helm.py — cross-checks the YAML against the Helm chart at charts/psi-low.values.yaml. Ensures every FPGA’s serial / p4_port pair in the YAML matches the alveo= / port= entries in hardware_connections, and vice versa. Runs in CI as part of the lint checks.

  • scripts/psi_net_check_lsalveo.py — cross-checks the YAML against the hand-maintained bdf_to_sn_port dict in scripts/lsalveo. Parses the dict via ast (no execution — avoids lsalveo’s kubernetes runtime dependency) and verifies that every (host, BDF, serial, port) quadruple matches the YAML, and that every YAML FPGA on a host that lsalveo tracks is also present.

  • scripts/psi_fpga_audit.py — cross-checks the YAML against live FPGA hardware over SSH. Manual; documented below.
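The "parse via ast, no execution" approach used for lsalveo can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name extract_literal, the nesting of the dict, and the port value 1 are assumptions made for the example, since the real layout of bdf_to_sn_port is not shown here.

```python
import ast

# Hypothetical stand-in for the text of scripts/lsalveo. The dict layout
# and the port number are placeholders, not the real contents.
SOURCE = '''
import kubernetes  # never imported: the source is parsed, not executed

bdf_to_sn_port = {
    "psi-perentie1": {"0000:4f:00.1": ("XFL1E35JVJTQ", 1)},
}
'''


def extract_literal(source: str, name: str):
    """Return the literal assigned to `name` at module level, without running the code."""
    tree = ast.parse(source)
    for node in tree.body:
        # Only plain assignments (name = {...}) are handled in this sketch.
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # literal_eval accepts an AST node and only evaluates
                    # literals (dicts, tuples, strings, numbers, ...).
                    return ast.literal_eval(node.value)
    raise KeyError(name)


table = extract_literal(SOURCE, "bdf_to_sn_port")
```

Because the module is never imported, its top-level import of kubernetes is never resolved, which is why a check like this can run on a machine without that package installed.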

psi_fpga_audit.py

For a given host, the audit:

  1. SSHes into the host.

  2. Enumerates the FPGAs:

    • V80 cards via ami_tool overview / ami_tool mfg_info -d <bdf>

    • U55C cards via xbutil examine (for BDF + MAC) plus the xmc sysfs node /sys/bus/pci/devices/0000:XX:00.1/xmc*/serial_num (for the serial, which xbutil’s user-PF does not expose).

  3. Cross-references each card against the YAML and checks that:

    • PCIe BDF reported by the tool exists in the YAML for this host (and every YAML entry on this host was reported by the tool).

    • Serial Number matches the YAML serial field.

    • MAC Address 1’s lower four bytes match the YAML spead_hwid field (the SPEAD hardware ID emitted on the wire is the bottom four bytes of the card’s MAC).

Exits non-zero if any check fails.
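The MAC check above reduces to a small piece of arithmetic. A minimal sketch, assuming the MAC is given as six colon-separated hex octets and that spead_hwid is compared as an integer (the helper name mac_lower4 is hypothetical, and how the YAML stores the value is an assumption):

```python
def mac_lower4(mac: str) -> int:
    """Return the bottom four bytes of a MAC address as an integer."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"expected a 6-byte MAC, got {mac!r}")
    # Drop the two most significant bytes and keep the remaining four.
    return int.from_bytes(raw[2:], "big")


# MAC taken from the example output for psi-perentie1/u55c-10.
hwid = mac_lower4("00:0a:35:0b:1a:08")
print(hex(hwid))  # 0x350b1a08
```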

Running it

The script is a self-contained file with a single dependency (pyyaml). From the repo root:

poetry run python scripts/psi_fpga_audit.py psi-perentie1   # 10x U55C
poetry run python scripts/psi_fpga_audit.py psi-perentie2   # 6x V80
poetry run python scripts/psi_fpga_audit.py seren-08        # 2x V80

You need SSH access to the host (key-based auth — the script uses BatchMode=yes and will not prompt for a password). The remote host needs ami_tool (V80) or xbutil (U55C) installed.
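To illustrate what BatchMode=yes implies, the sketch below builds a non-interactive ssh invocation with subprocess. Only the BatchMode=yes option comes from the description above; the function names, the timeout, and the rest of the argument list are assumptions and may differ from what the script actually does.

```python
import subprocess


def ssh_argv(host: str, remote_cmd: str) -> list[str]:
    """Build an ssh command line that never prompts for a password."""
    # With BatchMode=yes, ssh fails instead of asking for a password or
    # passphrase, so key-based auth must already be set up.
    return ["ssh", "-o", "BatchMode=yes", host, remote_cmd]


def run_remote(host: str, remote_cmd: str, timeout: float = 30.0) -> str:
    """Run a command on the host and return its stdout (raises on failure)."""
    result = subprocess.run(
        ssh_argv(host, remote_cmd),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # non-zero exit (e.g. auth failure) raises CalledProcessError
    )
    return result.stdout


argv = ssh_argv("psi-perentie1", "xbutil examine")
```

A quick way to confirm access before running the audit is to run the equivalent command by hand, for example ssh -o BatchMode=yes psi-perentie1 true, and check that it returns without prompting.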

Example output

Host: psi-perentie1
  YAML expects 10 FPGA(s): Alveo U55C

  PASS  0000:4f:00.1  XFL1E35JVJTQ  00:0a:35:0b:1a:08  (psi-perentie1/u55c-10)
  PASS  0000:52:00.1  XFL1XCRTUC22  00:0a:35:0b:19:10  (psi-perentie1/u55c-9)
  PASS  0000:53:00.1  XFL1VCYSXCL0  00:0a:35:0b:18:e0  (psi-perentie1/u55c-6)
  PASS  0000:56:00.1  XFL1ZIN0F4RO  00:0a:35:0b:19:b8  (psi-perentie1/u55c-7)
  ...

────────────────────────────────────────────────────────────
  0 failure(s)

When to run it

This is a manual audit, not part of CI. Run it:

  • After any physical card swap, to confirm the YAML reflects what is now installed.

  • When tests fail in ways that suggest the YAML may be stale (wrong serial on a port, unexpected SPEAD hwid in a capture, etc.).

Interpreting failures

  • BDF not in YAML for host X — a card is physically present that YAML doesn’t know about. Add the entry to psi-net.yaml.

  • NOT seen by tool — YAML expects a card at a BDF but the on-host tool didn’t report it. Either the card has been removed, has moved to a different BDF, or is in a bad state.

  • serial: expected X, got Y — the wrong card is at this BDF. Either update the YAML (if a swap was intentional and undocumented) or investigate the card identity.

  • spead_hwid: expected 0xXXXX, MAC1 (…) lower-4 is 0xYYYY — the YAML’s spead_hwid does not match the card’s actual MAC. This is usually a YAML transcription error; fix the YAML to match the MAC.
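The first three failure types can be pictured as set comparisons between what the YAML expects and what the on-host tool reports. The sketch below is illustrative only: cross_check and its BDF-to-serial dictionaries are hypothetical, and the deliberately mismatched data is made up from serials in the example output to trigger each message.

```python
def cross_check(host: str, expected: dict[str, str], found: dict[str, str]) -> list[str]:
    """Compare BDF -> serial maps from the YAML (expected) and the tool (found)."""
    failures = []
    # Cards the tool reported that the YAML does not list for this host.
    for bdf in sorted(found.keys() - expected.keys()):
        failures.append(f"{bdf}: BDF not in YAML for host {host}")
    # Cards the YAML lists that the tool did not report.
    for bdf in sorted(expected.keys() - found.keys()):
        failures.append(f"{bdf}: NOT seen by tool")
    # Cards present on both sides whose serials disagree.
    for bdf in sorted(expected.keys() & found.keys()):
        if expected[bdf] != found[bdf]:
            failures.append(f"{bdf}: serial: expected {expected[bdf]}, got {found[bdf]}")
    return failures


# Made-up scenario: one swapped card, one missing card, one unknown card.
expected = {"0000:4f:00.1": "XFL1E35JVJTQ", "0000:52:00.1": "XFL1XCRTUC22"}
found = {"0000:4f:00.1": "XFL1XCRTUC22", "0000:53:00.1": "XFL1VCYSXCL0"}
failures = cross_check("psi-perentie1", expected, found)
for line in failures:
    print("FAIL ", line)
```

A caller would then exit with a non-zero status when the list is non-empty, matching the behaviour described for the audit.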