LowCbfProcessor

LowCbfProcessor is a Tango device server for monitoring and control of registers in the Low.CBF signal processing FPGAs. Processors may be shared by multiple subarrays and may perform correlation or beamforming for several subarrays simultaneously.

The processor device uses the ska-low-cbf-fpga Python package to represent the registers within an FPGA design as “fields”, and groupings of fields as “peripherals”, which it can interact with. The ska-low-cbf-fpga package is a general framework for interacting with registers in Low.CBF FPGA designs. The register definitions for an FPGA are read from the address map file associated with each FPGA design (fpgamap_NNNNNNNN.py).
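To make the field/peripheral idea concrete, here is a minimal sketch using plain dataclasses. It is not the ska-low-cbf-fpga API: the `Field` and `Peripheral` classes, the register addresses, and the dictionary standing in for FPGA register space are all hypothetical.

```python
from dataclasses import dataclass
from dataclasses import field as dc_field
from typing import Dict


@dataclass(frozen=True)
class Field:
    """One named bit-field within a 32-bit register (hypothetical layout)."""
    name: str
    address: int          # byte address of the containing register
    bit_offset: int = 0   # least significant bit of the field
    width: int = 32       # number of bits

    def extract(self, register_word: int) -> int:
        """Pull this field's value out of a raw register word."""
        return (register_word >> self.bit_offset) & ((1 << self.width) - 1)


@dataclass
class Peripheral:
    """A named group of related fields."""
    name: str
    fields: Dict[str, Field] = dc_field(default_factory=dict)

    def add(self, f: Field) -> None:
        self.fields[f.name] = f

    def read(self, name: str, registers: Dict[int, int]) -> int:
        """Read a field from a simulated register space (address -> word)."""
        f = self.fields[name]
        return f.extract(registers.get(f.address, 0))
```

A real design would take field names, addresses, and bit positions from the fpgamap_NNNNNNNN.py address map rather than hard-coding them.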

The processor device repository contains code to control and monitor three specific FPGA design personalities:
  • Low.CBF PST Beamformer FPGAs

  • Low.CBF correlator FPGAs

  • Low.CBF PSS Beamformer FPGAs

Processor communication with Allocator

Processor Tango devices subscribe to Allocator events that represent the desired state of the processor device. On receipt of an event, a processor device will attempt to conform itself to the desired state. It will translate the attribute parameters into appropriate register values for the FPGA design personality. There are two attributes to which a Processor subscribes:

  • “internal_alveo” conveys per-Alveo data

  • “internal_subarray” conveys per-subarray settings that are used globally by Alveos

The Allocator “internal_alveo” attribute conveys information about the internal register settings of every Alveo. The attribute data is a JSON-encoded dictionary, using Alveo serial numbers as the key. If an Alveo card’s serial number is not present then the Alveo is not currently in use by any subarray. Each Alveo uses only the data listed under its own ID. The data includes the FPGA personality the Alveo should run and an abbreviated description of its register settings.
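A minimal sketch of how one Alveo might pick out its own entry, assuming the attribute value arrives as a JSON string. The serial numbers and inner keys (`fw`, `regs`) used when exercising it are hypothetical placeholders, not the real schema.

```python
import json
from typing import Optional


def alveo_settings(internal_alveo_json: str, my_serial: str) -> Optional[dict]:
    """Return this Alveo's entry from an "internal_alveo" value, or None.

    The value is a JSON dictionary keyed by Alveo serial number. Entries for
    other Alveos are ignored; a missing key means this Alveo is unused.
    """
    all_alveos = json.loads(internal_alveo_json)
    return all_alveos.get(my_serial)
```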

The Allocator’s “internal_subarray” attribute conveys information about the currently operating subarrays, including delay polynomial sources. The attribute data is a JSON-encoded dictionary using the subarray ID (1-16) as the key. If a subarray’s key is not present, the subarray is not in use. The values for each subarray provide information about its stations, beams, frequencies, and the destinations for the Alveo’s output products. This information is potentially used by, and common to, every Alveo.
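JSON object keys are always strings, so a consumer of “internal_subarray” has to convert them back to integer subarray IDs. A minimal sketch, assuming the attribute value is a JSON string; it rejects IDs outside 1-16 and treats each subarray's settings as opaque. `active_subarrays` is a hypothetical name.

```python
import json

VALID_SUBARRAY_IDS = range(1, 17)  # subarray IDs 1..16


def active_subarrays(internal_subarray_json: str) -> dict:
    """Map integer subarray ID -> settings for the subarrays in use.

    Subarrays absent from the dictionary are not in use, so they simply
    do not appear in the result.
    """
    raw = json.loads(internal_subarray_json)
    result = {}
    for key, settings in raw.items():
        sub_id = int(key)  # JSON keys are strings, e.g. "3"
        if sub_id not in VALID_SUBARRAY_IDS:
            raise ValueError(f"subarray ID {sub_id} outside 1-16")
        result[sub_id] = settings
    return result
```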

Delay polynomials

The processor subscribes to Tango attributes that provide station-beam delay polynomials (currently 5th order) for each station contributing to a station beam. It examines the time in incoming SPS packets and, from an internal queue of the polynomials it has received, chooses those with an appropriate start-of-validity to use for delay calculations. The “stats_delay” attribute provides information about the delay polynomials in the FPGA registers and whether they are valid (i.e. not being used before start-of-validity or after end-of-validity).
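The selection step can be sketched as follows. This assumes each polynomial carries start and end validity times in a common time base, that coefficients are stored lowest order first, and that the polynomial variable is time measured from start-of-validity; none of these details are confirmed by the text above, and `DelayPoly` and `choose_poly` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DelayPoly:
    start: float              # start of validity (assumed common time base)
    end: float                # end of validity
    coeffs: Sequence[float]   # c0..c5 for a 5th-order polynomial (assumed)

    def evaluate(self, t: float) -> float:
        """Delay at time t, polynomial in (t - start), via Horner's rule."""
        dt = t - self.start
        delay = 0.0
        for c in reversed(self.coeffs):
            delay = delay * dt + c
        return delay


def choose_poly(queue: List[DelayPoly], packet_time: float) -> Optional[DelayPoly]:
    """Pick the most recently started polynomial valid at packet_time."""
    valid = [p for p in queue if p.start <= packet_time < p.end]
    return max(valid, key=lambda p: p.start, default=None)
```

Returning `None` when nothing is valid leaves the caller to decide how to flag it, which is the kind of condition the “stats_delay” attribute reports.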

PST Jones corrections and RCAL

The processor accepts PST calibration coefficients by subscribing to Kafka servers and topics, if they are present in the subarray configuration. It expects to receive all the Jones matrices for a PST beam from a single topic, i.e. one 2x2 complex matrix per station and per SPS channel. Several error messages will be logged if incorrect data is received from Kafka. The Kafka URL provided in the subarray configuration is expected to have the form “kafka://kafka_server_addr:server_port/topic_name”. Jones matrices are applied as soon as possible after they are received and are considered valid for one hour. If they are not updated before the hour expires, the PST beam will contain “invalid jones” flags.
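A sketch of the two checks described above: splitting a “kafka://” URL into server, port, and topic with the standard library, and testing the one-hour validity window. The function names are hypothetical, and the real device may validate URLs and timestamps differently.

```python
import time
from typing import Optional, Tuple
from urllib.parse import urlparse

JONES_VALIDITY_SECS = 3600.0  # Jones matrices are valid for one hour


def parse_kafka_url(url: str) -> Tuple[str, int, str]:
    """Split "kafka://host:port/topic" into (host, port, topic)."""
    parsed = urlparse(url)
    if parsed.scheme != "kafka" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"not a kafka://host:port/topic URL: {url!r}")
    topic = parsed.path.lstrip("/")
    if not topic:
        raise ValueError(f"no topic in {url!r}")
    return parsed.hostname, parsed.port, topic


def jones_expired(received_at: float, now: Optional[float] = None) -> bool:
    """True once more than one hour has passed since the last update."""
    if now is None:
        now = time.time()
    return now - received_at > JONES_VALIDITY_SECS
```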

Design

Processors load an FPGA executable for the particular personality (correlator, beamformer, etc.) they are to run. The executable is downloaded from SKA CAR or GitLab. Because the download takes longer than a Tango command is allowed to execute, firmware download and loading are handled by a separate thread, allowing the Tango thread to return before Tango times it out. Because the FPGA personality can change at any time, the code is designed to accept requests to download a new personality at any time, even while a download is in progress.
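The "return immediately, keep only the newest request" behaviour can be sketched with a worker thread and a `threading.Condition`. This is an illustration under assumptions, not the processor's code: `FirmwareLoader` and the `download_and_load` callback are hypothetical, and an in-progress download is allowed to finish before the newest pending request is started.

```python
import threading
from typing import Callable, Optional


class FirmwareLoader:
    """Background firmware download/load; request() never blocks."""

    def __init__(self, download_and_load: Callable[[str], None]):
        self._work = download_and_load        # slow: download + program FPGA
        self._cond = threading.Condition()
        self._pending: Optional[str] = None   # newest unserviced request
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def request(self, personality: str) -> None:
        """Called from the Tango thread; overwrites any older pending request."""
        with self._cond:
            self._pending = personality
            self._cond.notify()

    def stop(self) -> None:
        with self._cond:
            self._stop = True
            self._cond.notify()
        self._thread.join()

    def _run(self) -> None:
        while True:
            with self._cond:
                while self._pending is None and not self._stop:
                    self._cond.wait()
                if self._stop:
                    return
                personality, self._pending = self._pending, None
            self._work(personality)  # runs without holding the lock
```

Requests that arrive during a download collapse into one, so the FPGA ends up running whichever personality was asked for last.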

Register values in an Alveo are only updated when any subarray that the Processor is handling changes between scanning and not-scanning states, or when incoming packets contain a time that matches the validity of a more recent delay polynomial. Scanning state is derived from “internal_subarray” events, and any event indicating that the scanning state has changed causes the registers to be reprogrammed immediately.

Data from “internal_alveo” events, however, is simply recorded for later use, i.e. when the scanning state changes. The SKA observing state machine ensures that a subarray receives its configuration (affecting “internal_alveo” data) before it begins to scan, so the last recorded “internal_alveo” event data always contains the Alveo register configuration to be applied when an “internal_subarray” event is received.
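A minimal sketch of this update policy, assuming a hypothetical boolean `scanning` key in each subarray's settings and a hypothetical `program` callback that writes the registers. Updates triggered by delay polynomial validity are not modelled.

```python
from typing import Callable, FrozenSet, Optional


class RegisterUpdatePolicy:
    """Store "internal_alveo" data; reprogram only when scanning state changes."""

    def __init__(self, program: Callable[[Optional[dict], FrozenSet[int]], None]):
        self._program = program                        # hypothetical writer
        self._alveo_cfg: Optional[dict] = None         # last "internal_alveo"
        self._scanning: FrozenSet[int] = frozenset()   # IDs now scanning

    def on_internal_alveo(self, cfg: dict) -> None:
        """Record only; no registers are written here."""
        self._alveo_cfg = cfg

    def on_internal_subarray(self, subarrays: dict) -> None:
        """Reprogram immediately if the set of scanning subarrays changed."""
        scanning = frozenset(
            sub_id for sub_id, settings in subarrays.items()
            if settings.get("scanning", False)
        )
        if scanning != self._scanning:
            self._scanning = scanning
            self._program(self._alveo_cfg, scanning)
```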

adminMode

The processor’s adminMode Tango attribute affects the operation of the processor device and FPGA in several ways:
  • The Processor FPGA will only produce output packets when adminMode is ONLINE or ENGINEERING

  • Processor output can be temporarily suspended by setting adminMode to OFFLINE

  • Processor output can be resumed from the temporarily suspended state by setting adminMode back to ONLINE or ENGINEERING
    • adminMode will only accept a transition back to the ONLINE or ENGINEERING state it was suspended from (but the Processor exits the suspended state if the subarray is de-configured while still suspended)

In all other respects, processor adminMode behaves the same as the standard SKA adminMode state machine. The processor code uses the SKA adminMode state machine base class, but adds checks to implement the extra behaviour described above. The effect of these changes is that the usual “offline” state has been split into the three sub-states shown in the diagram below. The sub-states ensure that adminMode only goes back to the state it was suspended from.

state diagram for processor adminMode
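The suspend/resume rule can be sketched as a small state holder. Only ONLINE, ENGINEERING, and OFFLINE are modelled; other SKA adminMode values, the ALLOW_ADMIN_CHANGE_WHILE_USED check, and the `subarray_configured` flag are simplifying assumptions, and `SuspendableAdminMode` is a hypothetical name.

```python
from enum import Enum
from typing import Optional


class AdminMode(Enum):
    ONLINE = "ONLINE"
    ENGINEERING = "ENGINEERING"
    OFFLINE = "OFFLINE"


class SuspendableAdminMode:
    """OFFLINE split into plain, suspended-from-ONLINE, suspended-from-ENGINEERING."""

    def __init__(self, mode: AdminMode = AdminMode.OFFLINE):
        self.mode = mode
        self.suspended_from: Optional[AdminMode] = None  # None: not suspended

    @property
    def output_enabled(self) -> bool:
        """The FPGA only produces output in ONLINE or ENGINEERING."""
        return self.mode in (AdminMode.ONLINE, AdminMode.ENGINEERING)

    def set_mode(self, new: AdminMode, subarray_configured: bool) -> None:
        if new is AdminMode.OFFLINE:
            if self.output_enabled and subarray_configured:
                self.suspended_from = self.mode  # remember where to return
            self.mode = new
            return
        if self.suspended_from is not None and new is not self.suspended_from:
            raise ValueError(
                f"suspended from {self.suspended_from.name}, "
                f"cannot resume as {new.name}"
            )
        self.suspended_from = None
        self.mode = new

    def subarray_deconfigured(self) -> None:
        """De-configuring while suspended exits the suspended state."""
        self.suspended_from = None
```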

Tango attribute/command list

Processor device Tango attributes

Additional Tango attributes, active for Correlator only

processor.processor_device::LowCbfProcessor.stats_total_pkt_counter
32-bit count of all packets received
processor.processor_device::LowCbfProcessor.stats_spead_pkt_counter
32-bit count of SPEAD packets received
processor.processor_device::LowCbfProcessor.stats_spead_unexpected_counter
32-bit count of SPEAD packets that were not expected at this FPGA. Increments in this
counter may indicate routing problems in the switch.
processor.processor_device::LowCbfProcessor.stats_spead_early_or_late_counter
32-bit count of SPEAD packets that were too early or too late (compared to packets
from other stations) to be included in a frame of data. Increments in this counter
may indicate excessive spread between the packet timestamps of SPS stations.
processor.processor_device::LowCbfProcessor.stats_spead_missing_counter
32-bit count of SPEAD packets that were expected but not received in time to be
included in a frame of data. Increments in this counter may indicate that some
stations are not sending data.
processor.processor_device::LowCbfProcessor.stats_bad_eth_packet_counter
32-bit count of "bad" ethernet packets. Increments in this value indicate a bad
optical cable, connector, or module.
processor.processor_device::LowCbfProcessor.stats_ethernet_status
Bit-vector indicating that the ethernet interface is online with the switch. For
U55C there is a single interface that should show as 0x1 when up. Note the interface
will only show "up" when a correlation is configured in the FPGA.
processor.processor_device::LowCbfProcessor.stats_fpga_error_status
Bit vector indicating whether the FPGA has encountered an error. Any bit set indicates
a problem. The meaning of the bits is firmware version dependent. Errors reported by
this attribute will usually stop the FPGA from performing its function. FPGA firmware
designers should be notified so the FPGA design can be fixed.
To clear, either manually reset the FPGA from the Linux command line,
or load different firmware into the FPGA.
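Because the statistics counters are only 32 bits wide, a monitoring client comparing successive readings needs modulo 2^32 arithmetic. The sketch below assumes the counters wrap around (rather than saturate) and are read often enough that at most one wrap occurs between reads; it also shows the U55C link check described for stats_ethernet_status. Both function names are hypothetical.

```python
COUNTER_MODULUS = 1 << 32  # the stats_* counters are 32 bits wide


def counter_increment(previous: int, current: int) -> int:
    """Packets counted between two reads of a 32-bit counter.

    Assumes the counter wraps and at most one wrap occurred between reads.
    """
    return (current - previous) % COUNTER_MODULUS


def u55c_ethernet_up(stats_ethernet_status: int) -> bool:
    """U55C has a single interface: bit 0 set (0x1) means it is up."""
    return bool(stats_ethernet_status & 0x1)
```

A non-zero increment in, say, stats_spead_missing_counter between two polls is then the signal to investigate, rather than the raw counter value itself.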

Processor device Tango commands

Dynamically created Tango attributes

These attributes are created for each Alveo on startup.

Functional Health

LowCbfProcessor.health_function

Indicates the device’s functional health state.

Its value is the worst case of its constituent attributes (function_* below).

Contributes towards healthState attribute.

See also health overview
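The worst-case roll-up can be sketched as a maximum over an ordered enum. The `Health` ordering below (in particular, ranking UNKNOWN below FAILED) and the UNKNOWN result for an empty set are assumptions for illustration, not necessarily the ordering LowCbfProcessor uses.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical severity ordering: larger value means worse."""
    OK = 0
    DEGRADED = 1
    UNKNOWN = 2
    FAILED = 3


def aggregate_health(constituents: dict) -> Health:
    """Worst-case roll-up, e.g. health_function from the function_* values."""
    return max(constituents.values(), default=Health.UNKNOWN)
```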

LowCbfProcessor.function_firmware_loaded

Indicates whether FPGA firmware is loaded into the Alveo card.

Contributes towards health_function attribute.

LowCbfProcessor.function_driver_ok

Indicates whether the FPGA driver is communicating with the Alveo card.

Contributes towards health_function attribute.

Hardware Health

LowCbfProcessor.health_hardware

Indicates the device’s hardware health state.

Its value is the worst case of its constituent attributes (hardware_* below).

Contributes towards healthState attribute.

See also health overview

LowCbfProcessor.hardware_fpga_temperature

Indicates if FPGA temperature is within operating limits.

LowCbfProcessor.hardware_fpga_power

Indicates if FPGA power consumption is within limits.

LowCbfProcessor.hardware_hbm_temperature

Indicates if HBM (high bandwidth memory) temperature is within operating limits.

LowCbfProcessor.hardware_power_supply_12v_voltage

Indicates if 12 V power rail voltage is within operating limits.

LowCbfProcessor.hardware_power_supply_12v_current

Indicates if 12 V power rail current is within operating limits.

LowCbfProcessor.hardware_pcie_12v_voltage

Indicates if PCIe bus 12 V power rail voltage is within operating limits.

LowCbfProcessor.hardware_pcie_12v_current

Indicates if PCIe bus 12 V power rail current is within operating limits.

Process Health

LowCbfProcessor.health_process

Indicates the device’s process health state.

Its value is the worst case of its constituent attributes (process_* below).

Contributes towards healthState attribute.

See also health overview

LowCbfProcessor.process_delay_subscription_ok

Indicates whether delay polynomials are arriving correctly.

LowCbfProcessor.process_delay_poly_valid

Indicates whether delay polynomial values are correct.

Contributes towards health_process attribute.

LowCbfProcessor.process_spead_packets_ok

Indicates SPS SPEAD packets are arriving at FPGA.

Contributes towards health_process attribute.

Test Mode

If Test mode is active, the value of some attributes can be temporarily changed to any desired value for testing.

Continuous Integration

The current CI tests of this device use DeviceTestContext and do not require a full Tango system.

Environment Variables

Some runtime behaviours can be configured through environment variables.

  • INITIAL_ADMINMODE selects the default value of the adminMode Tango attribute on startup.

  • ALLOW_ADMIN_CHANGE_WHILE_USED allows adminMode to be altered at any time if set to true. If false (the default), adminMode can only be changed while the FPGA is not in use by any subarray.

  • FPGA_XRT_TIMEOUT allows the timeout (in milliseconds) for each FPGA register read or write to be extended (default 5 if not set).

  • FPGA_POST_FW_LOAD_DELAY specifies the time, in seconds, after firmware load before register reads will time out. This allows for a busy server CPU after loading an FPGA.

  • CACHE_DIR allows firmware to be cached on-server, saving download bandwidth. Our Helm chart sets this to /app, meaning downloads will be cached in the pod’s ephemeral storage. For best results, override this to something persistent & shared between pods (e.g. a volume mount on the FPGA host server).

  • STN_DELAY_SIGN can be set to “pos” or “neg” to apply station delay polynomials with a positive or negative sign.

  • PST_DELAY_SIGN can be set to “pos” or “neg” to apply PST beam delay polynomials with a positive or negative sign.

  • PSS_DELAY_SIGN can be set to “pos” or “neg” to apply PSS beam delay polynomials with a positive or negative sign.

  • PST_DELAY_FORMAT can be set to “diff” or “full” to describe whether PST delay polynomials are supplied as differences from the station beam, or as full delay polynomials independent of any reference to the station beam. In the latter case the cadence of station beam and PST beam polynomials must be identical.

  • PSS_DELAY_FORMAT can be set to “diff” or “full” to describe whether PSS delay polynomials are supplied as differences from the station beam, or as full delay polynomials independent of any reference to the station beam. In the latter case the cadence of station beam and PSS beam polynomials must be identical.

  • ISOLATED allows the processor to run without an allocator connection (software testing).

  • CLEAR_PST_DELAYS_BEFORE_SCANS, when defined (with any value), ensures that PST delay polynomials from the prior scan are NOT used as the initial delay polynomials for the next scan, even if they are still valid. This may be helpful if a Low.CBF subarray is configured and then scanned multiple times, but the beam pointing directions are changed between scans by manipulating the delay polynomials provided externally to Low.CBF. Note that delay polynomials are always cleared for the first scan after a subarray has been reconfigured, and this setting does not override that behaviour.

  • RIPPLE_COMPENSATION, when set to “18a” or “16d”, applies a filter to compensate for passband ripple in coarse channel data from TPMs. Select the appropriate value for your TPMs (“16d” in PI26; “18a” once TPM firmware is updated in PI27). Ripple compensation is disabled if the environment variable is not present or has an unrecognised value. Log messages confirm which filter is chosen.

  • ALLOCATOR_RECHECK_SECS sets the time between checks for allocator restarts. The processor re-registers with the allocator if a check shows that the allocator has been redeployed or restarted. Defaults to 30 seconds if the environment variable is not present or its value is invalid.

  • JONES_UNITY: if present and containing a JSON string encoding 8 floating point numbers, AND the subarray is configured without a Kafka RCAL address, the processor uses the encoded values for Jones corrections instead of the identity matrix. Values in the JSON string are ‘[Jones[1,1].re, Jones[1,1].im, Jones[1,2].re, Jones[1,2].im, Jones[2,1].re, Jones[2,1].im, Jones[2,2].re, Jones[2,2].im]’. For correct operation, matrix entries should be floating point values between -1.0 and +1.0. (This variable is intended only to assist with testing.)

  • INIT_RESEND_SECS sets the time between periodic resends of SPEAD INIT packets for the correlator; defaults to 30.0 if not defined. Introduced for ADR111.

  • INIT_SCANNING_FACTOR: integer 1 or above, default 10 if undefined. For a value N, scanning subarrays send their periodic INITs at 1/Nth of the INIT_RESEND_SECS interval.

  • INIT_RESEND_DISABLE: if defined, turns off the periodic sending of correlator INIT packets.

  • NO_VIS_DATA_ZERO_SCANID: if defined, turns off sending visibilities when not scanning. Note: this is NOT implemented.
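Several of these variables share the pattern "read, validate, fall back to a default". A sketch of that pattern for ALLOCATOR_RECHECK_SECS, the INIT resend settings, RIPPLE_COMPENSATION, and JONES_UNITY follows. The helper names are hypothetical, and the fallbacks for invalid INIT_RESEND_SECS, INIT_SCANNING_FACTOR, and JONES_UNITY values are assumptions; only the ALLOCATOR_RECHECK_SECS and RIPPLE_COMPENSATION fallbacks are documented above.

```python
import json
import os
from typing import List, Optional

DEFAULT_RECHECK_SECS = 30.0
DEFAULT_INIT_RESEND_SECS = 30.0
DEFAULT_INIT_SCANNING_FACTOR = 10


def _float_env(name: str, default: float) -> float:
    """Read a float variable; use the default if it is absent or invalid."""
    try:
        return float(os.environ[name])
    except (KeyError, ValueError):
        return default


def allocator_recheck_secs() -> float:
    return _float_env("ALLOCATOR_RECHECK_SECS", DEFAULT_RECHECK_SECS)


def init_resend_interval(scanning: bool) -> Optional[float]:
    """Seconds between periodic correlator INITs, or None if disabled."""
    if "INIT_RESEND_DISABLE" in os.environ:
        return None
    interval = _float_env("INIT_RESEND_SECS", DEFAULT_INIT_RESEND_SECS)
    if scanning:
        try:
            factor = int(os.environ.get("INIT_SCANNING_FACTOR",
                                        DEFAULT_INIT_SCANNING_FACTOR))
        except ValueError:
            factor = DEFAULT_INIT_SCANNING_FACTOR  # assumed fallback
        interval /= max(factor, 1)  # scanning: 1/N of the normal interval
    return interval


def ripple_filter() -> Optional[str]:
    """Return "18a" or "16d", or None (compensation disabled)."""
    value = os.environ.get("RIPPLE_COMPENSATION")
    return value if value in ("18a", "16d") else None


def jones_unity() -> Optional[List[List[complex]]]:
    """Parse JONES_UNITY into a 2x2 complex matrix; None: use the identity."""
    raw = os.environ.get("JONES_UNITY")
    if raw is None:
        return None
    try:
        vals = [float(v) for v in json.loads(raw)]
    except (ValueError, TypeError):  # JSONDecodeError is a ValueError
        return None
    if len(vals) != 8:
        return None
    # Pairs are (re, im) in the order J11, J12, J21, J22.
    c = [complex(vals[i], vals[i + 1]) for i in range(0, 8, 2)]
    return [[c[0], c[1]], [c[2], c[3]]]
```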

An example of the Helm chart keys that parent charts would need to override to achieve correct processor operation is given in the test-parent chart’s values.yaml file:

FPGA Environment Variables

These environment variables are used to inform the LowCbfProcessor software about the Alveo FPGA card in use. They are automatically configured by the processor-device.sh script on pod startup.

  • FPGA_BDF - PCIe BDF address, e.g. “86:00.0” or “0000:07:00.0”. Format varies depending on underlying driver in use.

  • FPGA_DRIVER - low-level driver in use, either “AMI” or “XRT”.

  • FPGA_TYPE - model of Alveo card in use, e.g. “u55c”.

  • SERIAL_NUM - Alveo card serial number.

To access High Bandwidth Memory (HBM) on FPGA cards using the AMI driver (i.e. V80), additional environment variables must be set to provide paths to QDMA character devices:

  • FPGA_QDMA_0 - read (c2h) device, e.g. “/dev/qdma86001-MM-0”

  • FPGA_QDMA_1 - write (h2c) device, e.g. “/dev/qdma86001-MM-1”
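A sketch of startup validation for these variables, assuming they are read from a mapping such as `os.environ`. `FpgaEnv`, `read_fpga_env`, and `missing_qdma` are hypothetical names, and the device's actual handling of missing or unexpected values may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

REQUIRED_VARS = ("FPGA_BDF", "FPGA_DRIVER", "FPGA_TYPE", "SERIAL_NUM")


@dataclass(frozen=True)
class FpgaEnv:
    bdf: str                          # e.g. "86:00.0" or "0000:07:00.0"
    driver: str                       # "AMI" or "XRT"
    fpga_type: str                    # e.g. "u55c"
    serial: str
    qdma_read: Optional[str] = None   # FPGA_QDMA_0 (c2h)
    qdma_write: Optional[str] = None  # FPGA_QDMA_1 (h2c)


def read_fpga_env(env: Mapping[str, str] = os.environ) -> FpgaEnv:
    """Collect the FPGA variables, failing early if a required one is absent."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing FPGA environment variables: {missing}")
    if env["FPGA_DRIVER"] not in ("AMI", "XRT"):
        raise RuntimeError(f"unknown FPGA_DRIVER {env['FPGA_DRIVER']!r}")
    return FpgaEnv(
        bdf=env["FPGA_BDF"],
        driver=env["FPGA_DRIVER"],
        fpga_type=env["FPGA_TYPE"],
        serial=env["SERIAL_NUM"],
        qdma_read=env.get("FPGA_QDMA_0"),
        qdma_write=env.get("FPGA_QDMA_1"),
    )


def missing_qdma(cfg: FpgaEnv) -> List[str]:
    """Names of QDMA variables still needed for HBM access (AMI driver only)."""
    if cfg.driver != "AMI":
        return []
    pairs = (("FPGA_QDMA_0", cfg.qdma_read), ("FPGA_QDMA_1", cfg.qdma_write))
    return [name for name, path in pairs if not path]
```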

State diagram for FPGA-usage state machine in cor_state_machine.py

state diagram for Correlator FPGA usage (historical)

Historical record of FPGA usage state machine at February 2026