Low PSI Hardware Audit
The Low PSI network topology — FPGA serials, PCIe BDFs, MAC/SPEAD IDs, P4 switch ports, trunk ports, timing offsets — lives in a single source-of-truth YAML file:
src/ska_low_cbf_integration/data/psi-net.yaml
It is consumed at runtime by ska_low_cbf_integration.low_psi_net (and
re-exported by ska_low_cbf_integration.low_psi for backwards
compatibility) and by four stand-alone tooling scripts:
scripts/psi_net_check_diagram.py - cross-checks the YAML against the low-psi-data-links.drawio.xml diagram. Reports any device or cable in one that isn't in the other. Runs in CI as part of the lint checks.
scripts/psi_net_check_helm.py - cross-checks the YAML against the Helm chart at charts/psi-low.values.yaml. Ensures every FPGA's serial/p4_port pair in the YAML matches the alveo=/port= entries in hardware_connections, and vice versa. Runs in CI as part of the lint checks.
scripts/psi_net_check_lsalveo.py - cross-checks the YAML against the hand-maintained bdf_to_sn_port dict in scripts/lsalveo. Parses the dict via ast (no execution, which avoids lsalveo's kubernetes runtime dependency) and verifies that every (host, BDF, serial, port) quadruple matches the YAML, and that every YAML FPGA on a host that lsalveo tracks is also present.
scripts/psi_fpga_audit.py - cross-checks the YAML against live FPGA hardware over SSH. Manual; documented below.
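The "parse via ast, no execution" approach used for lsalveo can be sketched as follows. This is a minimal illustration, not the code in psi_net_check_lsalveo.py: the helper name extract_dict_literal, the shape of the sample dict, and the "port-a" value are all assumptions made for the example. It handles only a plain module-level assignment.

```python
import ast


def extract_dict_literal(source: str, name: str):
    """Return the literal assigned to `name` at module level.

    The source is parsed, never executed, so its imports are not resolved.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # literal_eval accepts an AST node and only evaluates literals
                    return ast.literal_eval(node.value)
    raise KeyError(f"no module-level assignment to {name!r}")


# Hypothetical stand-in for scripts/lsalveo; the real dict layout may differ.
SAMPLE = '''
import kubernetes  # not imported here: the text is only parsed

bdf_to_sn_port = {
    ("psi-perentie1", "0000:4f:00.1"): ("XFL1E35JVJTQ", "port-a"),
}
'''

mapping = extract_dict_literal(SAMPLE, "bdf_to_sn_port")
```

Because nothing is executed, the sketch works even when the kubernetes package is not installed.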
psi_fpga_audit.py
For a given host, the audit:
SSHes into the host.
Enumerates the FPGAs:
V80 cards via ami_tool overview / ami_tool mfg_info -d <bdf>
U55C cards via xbutil examine (for BDF + MAC) plus the xmc sysfs node /sys/bus/pci/devices/0000:XX:00.1/xmc*/serial_num (for the serial, which xbutil's user-PF does not expose).
Cross-references each card against the YAML and reports:
PCIe BDF: every BDF reported by the tool exists in the YAML for this host (and every YAML entry on this host was reported by the tool).
Serial Number: matches the YAML serial field.
MAC Address 1: its lower four bytes match the YAML spead_hwid field (the SPEAD hardware ID emitted on the wire is the bottom four bytes of the card's MAC).
Exits non-zero if any check fails.
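The MAC-to-hwid relationship in the last check can be illustrated with a few lines of Python. This is a sketch only: the function name mac_lower_four is hypothetical, and it assumes the YAML value can be compared as an integer. The MAC is taken from the example output for psi-perentie1/u55c-10.

```python
def mac_lower_four(mac: str) -> int:
    """Return the bottom four bytes of a colon-separated MAC as an integer."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    # Drop the top two octets and read the remaining four as one hex number
    return int("".join(octets[2:]), 16)


hwid = mac_lower_four("00:0a:35:0b:1a:08")
print(hex(hwid))  # 0x350b1a08
```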
Running it
The script is a self-contained file with a single dependency
(pyyaml). From the repo root:
poetry run python scripts/psi_fpga_audit.py psi-perentie1 # 10x U55C
poetry run python scripts/psi_fpga_audit.py psi-perentie2 # 6x V80
poetry run python scripts/psi_fpga_audit.py seren-08 # 2x V80
You need SSH access to the host (key-based auth — the script uses
BatchMode=yes and will not prompt for a password). The remote host
needs ami_tool (V80) or xbutil (U55C) installed.
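For reference, a non-interactive SSH call of the kind described above might look like the sketch below. The helper names build_ssh_command and run_remote are hypothetical and the audit script's real invocation may pass further options; only the BatchMode=yes setting comes from the text.

```python
import subprocess
from typing import List


def build_ssh_command(host: str, remote_command: str) -> List[str]:
    """Build an ssh argv that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_remote(host: str, remote_command: str, timeout: float = 30.0) -> str:
    """Run a command on the host and return its stdout.

    Raises CalledProcessError if ssh or the remote command exits non-zero.
    """
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


cmd = build_ssh_command("psi-perentie1", "xbutil examine")
```

With BatchMode=yes, a missing or rejected key produces an immediate error rather than a hung password prompt, which is why key-based auth must be set up beforehand.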
Example output
Host: psi-perentie1
YAML expects 10 FPGA(s): Alveo U55C
PASS 0000:4f:00.1 XFL1E35JVJTQ 00:0a:35:0b:1a:08 (psi-perentie1/u55c-10)
PASS 0000:52:00.1 XFL1XCRTUC22 00:0a:35:0b:19:10 (psi-perentie1/u55c-9)
PASS 0000:53:00.1 XFL1VCYSXCL0 00:0a:35:0b:18:e0 (psi-perentie1/u55c-6)
PASS 0000:56:00.1 XFL1ZIN0F4RO 00:0a:35:0b:19:b8 (psi-perentie1/u55c-7)
...
────────────────────────────────────────────────────────────
0 failure(s)
When to run it
This is a manual audit, not part of CI. Run it:
After any physical card swap, to confirm the YAML reflects what is now installed.
When tests fail in ways that suggest the YAML may be stale (wrong serial on a port, unexpected SPEAD hwid in a capture, etc.).
Interpreting failures
BDF not in YAML for host X - a card is physically present that the YAML doesn't know about. Add the entry to psi-net.yaml.
NOT seen by tool - the YAML expects a card at a BDF but the on-host tool didn't report it. Either the card has been removed, has moved to a different BDF, or is in a bad state.
serial: expected X, got Y - the wrong card is at this BDF. Either update the YAML (if a swap was intentional and undocumented) or investigate the card identity.
spead_hwid: expected 0xXXXX, MAC1 (…) lower-4 is 0xYYYY - the YAML's spead_hwid does not match the card's actual MAC. This is usually a YAML transcription error; fix the YAML to match the MAC.
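The four failure kinds map onto a simple comparison of two BDF-keyed tables. The sketch below shows one way to express it; the Card type, the cross_check function, the message wording, and the mismatch scenario are all made up for illustration and are not taken from psi_fpga_audit.py.

```python
from typing import Dict, List, NamedTuple


class Card(NamedTuple):
    serial: str
    spead_hwid: int  # assumed to be stored as an integer


def cross_check(host: str, expected: Dict[str, Card], seen: Dict[str, Card]) -> List[str]:
    """Compare YAML expectations with what the on-host tool reported."""
    failures = []
    # Cards the tool reported that the YAML does not list
    for bdf in sorted(seen.keys() - expected.keys()):
        failures.append(f"{bdf}: BDF not in YAML for host {host}")
    # Cards the YAML lists that the tool did not report
    for bdf in sorted(expected.keys() - seen.keys()):
        failures.append(f"{bdf}: NOT seen by tool")
    # Cards present on both sides: compare identity fields
    for bdf in sorted(expected.keys() & seen.keys()):
        want, got = expected[bdf], seen[bdf]
        if want.serial != got.serial:
            failures.append(f"{bdf}: serial: expected {want.serial}, got {got.serial}")
        if want.spead_hwid != got.spead_hwid:
            failures.append(
                f"{bdf}: spead_hwid: expected {want.spead_hwid:#x}, "
                f"lower-4 is {got.spead_hwid:#x}"
            )
    return failures


# Invented scenario: one card has moved from 0000:52:00.1 to an unlisted BDF.
expected = {
    "0000:4f:00.1": Card("XFL1E35JVJTQ", 0x350B1A08),
    "0000:52:00.1": Card("XFL1XCRTUC22", 0x350B1910),
}
seen = {
    "0000:4f:00.1": Card("XFL1E35JVJTQ", 0x350B1A08),
    "0000:53:00.1": Card("XFL1XCRTUC22", 0x350B1910),
}
failures = cross_check("psi-perentie1", expected, seen)
```

A moved card shows up as a pair of failures (one unlisted BDF, one unseen BDF), which is a useful hint when reading real audit output.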