LowCbfProcessor
LowCbfProcessor is a Tango device server for monitoring and control of registers in the Low.CBF signal processing FPGAs. Processors are shared by and may perform correlation or beamforming for multiple subarrays simultaneously.
The processor device uses the ska-low-cbf-fpga Python package to represent registers within the FPGA design as “fields” and groupings of fields (“peripherals”) it can interact with. The ska-low-cbf-fpga package is a general framework for interacting with registers in Low.CBF FPGA designs. Registers within an FPGA are read from the address map file associated with each FPGA design (fpgamap_NNNNNNNN.py).
The processor device repository contains code to control and monitor the following FPGA design personalities:
- Low.CBF PST Beamformer FPGAs
- Low.CBF correlator FPGAs
- Low.CBF PSS Beamformer FPGAs
Processor communication with Allocator
Processor Tango devices subscribe to Allocator events that represent the desired state of the processor device. On receipt of an event, a processor device will attempt to conform itself to the desired state. It will translate the attribute parameters into appropriate register values for the FPGA design personality. There are two attributes to which a Processor subscribes:
- “internal_alveo” conveys per-Alveo data
- “internal_subarray” conveys per-subarray settings that are used globally by Alveos
The Allocator “internal_alveo” attribute conveys information about the internal register settings of every Alveo. The attribute data is a JSON-encoded dictionary, using Alveo serial numbers as the key. If an Alveo card’s serial number is not present then the Alveo is not currently in use by any subarray. Each Alveo uses only the data listed under its own ID. The data includes the FPGA personality the Alveo should run and an abbreviated description of its register settings.
The Allocator’s “internal_subarray” attribute conveys information about the currently operating subarrays, including delay polynomial sources. The attribute data is a JSON-encoded dictionary using Subarray ID [1-16] as key. If a subarray’s key is not present then the subarray is not in use. Values for each subarray provide information about subarray stations, beams, frequencies, and destinations for the Alveo’s output products. The information is potentially used by, and common to, every Alveo.
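The lookup rule for both attributes can be sketched as follows. This is only a minimal illustration: the payload contents, the key names `personality`, `registers` and `stations`, the serial numbers, and the helper names are all hypothetical and are not taken from the Allocator's actual schema.

```python
import json


def my_alveo_settings(internal_alveo_json, serial_number):
    """Return this Alveo's entry, or None if no subarray is using the card."""
    all_alveos = json.loads(internal_alveo_json)
    # A missing serial number means the Alveo is not currently in use.
    return all_alveos.get(serial_number)


def active_subarrays(internal_subarray_json):
    """Return the sorted IDs of the subarrays present in the attribute data."""
    subarrays = json.loads(internal_subarray_json)
    # JSON object keys are always strings, so convert them to integers.
    return sorted(int(key) for key in subarrays)


# Hypothetical payloads; the real field names may differ.
example_alveo = json.dumps({"XFL1ABC": {"personality": "corr", "registers": {}}})
example_subarray = json.dumps({"3": {"stations": [1, 2]}, "1": {"stations": [5]}})

print(my_alveo_settings(example_alveo, "XFL1ABC"))
print(my_alveo_settings(example_alveo, "XFL9ZZZ"))  # serial not listed
print(active_subarrays(example_subarray))
```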
Delay polynomials
The processor subscribes to Tango attributes that provide station-beam delay polynomials (currently 5th order) for each station contributing to a station beam. It examines the time in incoming SPS packets and, from an internal queue of the polynomials it has received, chooses those with an appropriate start-of-validity to use for delay calculations. The “stats_delay” attribute provides information about the delay polynomials in the FPGA registers and whether the polynomials are valid (i.e. not being used before start-of-validity and not being used after end-of-validity).
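One way to picture the selection and validity check is the sketch below. It is not the processor's implementation: the `DelayPoly` structure, the time base in seconds, the coefficient order (constant term first) and the rule "most recent polynomial whose validity has started" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelayPoly:
    start_validity: float   # seconds (hypothetical time base)
    valid_duration: float   # seconds
    coeffs: List[float]     # assumed order: constant term first, 6 values for 5th order

    def is_valid_at(self, t):
        """True when t is neither before start- nor after end-of-validity."""
        return self.start_validity <= t < self.start_validity + self.valid_duration

    def delay_at(self, t):
        """Evaluate the polynomial at t using Horner's rule."""
        dt = t - self.start_validity
        result = 0.0
        for coeff in reversed(self.coeffs):
            result = result * dt + coeff
        return result


def choose_poly(queue, packet_time):
    """Pick the most recent polynomial whose validity has already started."""
    started = [p for p in queue if p.start_validity <= packet_time]
    if not started:
        return None
    return max(started, key=lambda p: p.start_validity)


queue = [
    DelayPoly(0.0, 10.0, [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]),
    DelayPoly(10.0, 10.0, [6.0, 0.25, 0.0, 0.0, 0.0, 0.0]),
]
poly = choose_poly(queue, packet_time=12.0)
print(poly.delay_at(12.0), poly.is_valid_at(12.0))
```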
PST Jones corrections and RCAL
The processor accepts PST calibration coefficients by subscribing to Kafka servers and topics if they are present in the subarray configuration. It expects to receive all the Jones matrices for a PST beam from a single topic, i.e. one 2x2 complex matrix per station and per SPS channel. Several error messages will be logged if incorrect data is received from Kafka. The Kafka URL provided in the subarray configuration is expected to have the form “kafka://kafka_server_addr:server_port/topic_name”. Jones matrices are applied as soon as possible after they are received and are considered valid for one hour. If they are not updated before the hour expires, the PST beam will contain “invalid jones” flags.
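As an illustration of the URL form and the one-hour validity rule, the sketch below splits a URL with the Python standard library and checks whether a set of matrices has aged out. The function names, the example host and topic, and the use of plain seconds for timestamps are assumptions; the processor's own parsing and error messages may differ.

```python
from urllib.parse import urlsplit

JONES_VALID_SECS = 3600.0  # matrices are considered valid for one hour


def parse_kafka_url(url):
    """Split 'kafka://server:port/topic' into (server, port, topic)."""
    parts = urlsplit(url)
    topic = parts.path.lstrip("/")
    if parts.scheme != "kafka" or not parts.hostname or parts.port is None or not topic:
        raise ValueError(f"unexpected Kafka URL: {url!r}")
    return parts.hostname, parts.port, topic


def jones_expired(received_at, now):
    """True when the last update is more than one hour old."""
    return now - received_at > JONES_VALID_SECS


# Hypothetical server and topic names.
print(parse_kafka_url("kafka://kafka.example.org:9092/pst_jones"))
print(jones_expired(received_at=0.0, now=3601.0))
```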
Design
Processors load an FPGA executable for the particular personality (correlator, beamformer, etc.) they are to run. The executable is downloaded from SKA CAR or GitLab. Since the download takes longer than a Tango command is allowed to execute, firmware download and load are handled by a separate thread, allowing the Tango thread to return before Tango times it out. Because the FPGA personality can change at any time, the code is designed to accept requests to download a new personality at any time, even while a download is in progress.
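The background-thread pattern can be sketched as below. This is a simplified illustration, not the processor's code: the `FirmwareLoader` class, its `download` and `load` callables, and the "newest request wins" rule for handling a request that arrives during a download are assumptions.

```python
import threading


class FirmwareLoader:
    """Run download and load off the calling thread; the newest request wins."""

    def __init__(self, download, load):
        self._download = download   # hypothetical callable: personality -> image
        self._load = load           # hypothetical callable: image -> None
        self._lock = threading.Lock()
        self._latest = 0            # sequence number of the newest request

    def request(self, personality):
        """Return immediately; the work continues in a background thread."""
        with self._lock:
            self._latest += 1
            seq = self._latest
        thread = threading.Thread(
            target=self._run, args=(seq, personality), daemon=True
        )
        thread.start()
        return thread

    def _run(self, seq, personality):
        image = self._download(personality)   # the slow step
        with self._lock:
            # Discard the result if a newer request arrived in the meantime.
            if seq != self._latest:
                return
            self._load(image)


loaded = []
gate = threading.Event()


def fake_download(personality):
    if personality == "corr":
        gate.wait(timeout=5)     # simulate a slow download
    return f"{personality}-image"


loader = FirmwareLoader(fake_download, loaded.append)
first = loader.request("corr")
second = loader.request("pst")   # arrives while "corr" is still downloading
second.join()
gate.set()
first.join()
print(loaded)
```

In this sketch the caller of `request` never blocks on the download, which is the property that lets a Tango command return before its timeout.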
Register values in an Alveo are only updated when any subarray that the Processor is handling changes between scanning and not-scanning states, and when incoming packets contain a time that matches a more recent delay polynomial validity. Scanning state is derived from “internal_subarray” events, and any event that indicates the scanning state has changed causes registers to be reprogrammed immediately.
Data from “internal_alveo” events however is simply recorded for use later, i.e. when scanning state changes. The SKA observing state machine ensures that a subarray receives configuration (affecting “internal_alveo” data) before it begins to scan, so the last recorded version of “internal_alveo” event data always contains the required Alveo register configuration to be applied when an “internal_subarray” event is received.
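The record-then-apply behaviour can be sketched as follows. The class, the `scanning` key inside each subarray entry, and the `program_registers` callable are hypothetical; only the ordering (record “internal_alveo” data, act on “internal_subarray” scanning changes) comes from the description above.

```python
class RegisterUpdater:
    """Record Alveo data on arrival; reprogram only on a scanning-state change."""

    def __init__(self, serial_number, program_registers):
        self.serial_number = serial_number
        self._program = program_registers   # hypothetical callable
        self._alveo_cfg = None              # last recorded "internal_alveo" entry
        self._scanning = set()              # IDs of subarrays currently scanning

    def on_internal_alveo(self, data):
        # Only record the data; nothing is written to the FPGA yet.
        self._alveo_cfg = data.get(self.serial_number)

    def on_internal_subarray(self, data):
        # "scanning" is an assumed key name for the per-subarray scan flag.
        scanning = {sid for sid, cfg in data.items() if cfg.get("scanning")}
        if scanning != self._scanning:
            self._scanning = scanning
            self._program(self._alveo_cfg, data)


writes = []
updater = RegisterUpdater("XFL1ABC", lambda alveo, subarrays: writes.append(alveo))
updater.on_internal_alveo({"XFL1ABC": {"personality": "corr"}})
updater.on_internal_subarray({"1": {"scanning": False}})   # scanning state unchanged
updater.on_internal_subarray({"1": {"scanning": True}})    # scan starts: reprogram
updater.on_internal_subarray({"1": {"scanning": True}})    # nothing changed
print(len(writes))
```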
adminMode
The processor’s adminMode Tango attribute affects the operation of the processor device and FPGA in several ways:
- The Processor FPGA will only produce output packets when adminMode is ONLINE or ENGINEERING
- Processor output can be temporarily suspended by setting adminMode OFFLINE
- Processor output can be resumed from the temporarily suspended state by setting adminMode back to ONLINE or ENGINEERING
- adminMode will only accept a transition back to the ONLINE or ENGINEERING state it was suspended from (but the Processor exits the suspended state if the subarray is de-configured while still suspended)
In all other respects processor adminMode behaves the same as the standard SKA adminMode state machine. The processor code uses the SKA adminMode state machine base class, but has added checks to implement the extra behaviour described above. The effect of changes is that the usual “offline” state has been split into three sub-states shown in the diagram below. The substates ensure that adminMode only goes back to the state it was suspended from.
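The suspend and resume rules can be illustrated with a small stand-alone model. It is not the SKA base class or the processor's code: the class and attribute names are hypothetical, suspension is assumed to apply only while the FPGA is in use, and the interaction with the ALLOW_ADMIN_CHANGE_WHILE_USED environment variable is ignored.

```python
from enum import Enum


class Mode(Enum):
    ONLINE = "ONLINE"
    ENGINEERING = "ENGINEERING"
    OFFLINE = "OFFLINE"


class AdminModeSketch:
    def __init__(self, mode=Mode.OFFLINE):
        self.mode = mode
        self.suspended_from = None   # set only while output is suspended
        self.in_use = False          # True while a subarray is configured (assumed)

    @property
    def output_enabled(self):
        return self.mode in (Mode.ONLINE, Mode.ENGINEERING)

    def set_mode(self, new_mode):
        allowed = (Mode.OFFLINE, self.suspended_from)
        if self.suspended_from is not None and new_mode not in allowed:
            raise ValueError(f"suspended from {self.suspended_from.name}")
        if new_mode is Mode.OFFLINE and self.in_use and self.output_enabled:
            self.suspended_from = self.mode      # remember where to return to
        elif new_mode is not Mode.OFFLINE:
            self.suspended_from = None           # resumed
        self.mode = new_mode

    def deconfigure(self):
        # De-configuring while suspended leaves the suspended state.
        self.in_use = False
        self.suspended_from = None


device = AdminModeSketch(Mode.ENGINEERING)
device.in_use = True
device.set_mode(Mode.OFFLINE)          # suspend output
try:
    device.set_mode(Mode.ONLINE)       # not the mode it was suspended from
    rejected = False
except ValueError:
    rejected = True
device.set_mode(Mode.ENGINEERING)      # resume
print(rejected, device.output_enabled)
```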
Tango attribute/command list
Processor device Tango attributes
Additional Tango attributes, active for Correlator only
- processor.processor_device::LowCbfProcessor.stats_total_pkt_counter
32-bit count of all packets received.
- processor.processor_device::LowCbfProcessor.stats_spead_pkt_counter
32-bit count of SPEAD packets received.
- processor.processor_device::LowCbfProcessor.stats_spead_unexpected_counter
32-bit count of SPEAD packets that were not expected at this FPGA. Increments in this counter may indicate routing problems in the switch.
- processor.processor_device::LowCbfProcessor.stats_spead_early_or_late_counter
32-bit count of SPEAD packets that were too early or too late (compared to packets from other stations) to be included in a frame of data. Increments in this counter may indicate excessive spread between the packet timestamps of SPS stations.
- processor.processor_device::LowCbfProcessor.stats_spead_missing_counter
32-bit count of SPEAD packets that were expected but not received in time to be included in a frame of data. Increments in this counter may indicate that some stations are not sending data.
- processor.processor_device::LowCbfProcessor.stats_bad_eth_packet_counter
32-bit count of "bad" Ethernet packets. Increments in this value indicate a bad optical cable, connector or module.
- processor.processor_device::LowCbfProcessor.stats_ethernet_status
Bit vector indicating that the Ethernet interface is online with the switch. For U55C there is a single interface that should show as 0x1 when up. Note the interface will only show "up" when a correlation is configured in the FPGA.
- processor.processor_device::LowCbfProcessor.stats_fpga_error_status
Bit vector indicating whether the FPGA has encountered an error. Any bit set indicates a problem. The meaning of the bits is firmware version dependent. Errors reported by this attribute will usually stop the FPGA from performing its function. FPGA firmware designers should be notified so the FPGA design can be fixed. To clear, either manually reset the FPGA from the Linux command line, or load different firmware into the FPGA.
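When polling the counters above, increments matter more than absolute values. The sketch below computes the increment between two readings and tests one bit of a status bit vector. It assumes the 32-bit counters wrap around at 2^32 rather than saturating, which is not stated here; the helper names are hypothetical.

```python
MASK_32 = 0xFFFFFFFF


def counter_delta(previous, current):
    """Increment between two readings of a 32-bit counter, allowing one wrap.

    Assumes the counter wraps to zero after 0xFFFFFFFF.
    """
    return (current - previous) & MASK_32


def interface_is_up(ethernet_status, interface=0):
    """True when the bit for the given interface is set in the bit vector."""
    return bool((ethernet_status >> interface) & 1)


print(counter_delta(10, 25))
print(counter_delta(0xFFFFFFF0, 0x10))   # reading taken after a wrap
print(interface_is_up(0x1))
```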
Processor device Tango commands
Dynamically created Tango attributes
These attributes are created for each Alveo on startup.
Functional Health
- LowCbfProcessor.health_function
Indicates the device’s functional health state.
Its value is the worst case scenario of constituent attributes (function_* below).
Contributes towards healthState attribute.
- LowCbfProcessor.function_firmware_loaded
Indicates whether FPGA firmware is loaded into Alveo card.
Contributes towards health_function attribute.
- LowCbfProcessor.function_driver_ok
Indicates whether FPGA driver is communicating with Alveo card.
Contributes towards health_function attribute.
Hardware Health
- LowCbfProcessor.health_hardware
Indicates the device’s hardware health state.
Its value is the worst case scenario of constituent attributes (hardware_* below).
Contributes towards healthState attribute.
- LowCbfProcessor.hardware_fpga_temperature
Indicates if FPGA temperature is within operating limits.
- LowCbfProcessor.hardware_fpga_power
Indicates if FPGA power consumption is within limits.
- LowCbfProcessor.hardware_hbm_temperature
Indicates if HBM (high bandwidth memory) temperature is within operating limits.
- LowCbfProcessor.hardware_power_supply_12v_voltage
Indicates if 12 V power rail voltage is within operating limits.
- LowCbfProcessor.hardware_power_supply_12v_current
Indicates if 12 V power rail current is within operating limits.
- LowCbfProcessor.hardware_pcie_12v_voltage
Indicates if PCIe bus 12 V power rail voltage is within operating limits.
- LowCbfProcessor.hardware_pcie_12v_current
Indicates if PCIe bus 12 V power rail current is within operating limits.
Process Health
- LowCbfProcessor.health_process
Indicates the device’s process health state.
Its value is the worst case scenario of constituent attributes (process_* below).
Contributes towards healthState attribute.
- LowCbfProcessor.process_delay_subscription_ok
Indicates correctness of arrival of delay polynomials.
- LowCbfProcessor.process_delay_poly_valid
Indicates correctness of delay polynomial values.
Contributes towards health_process attribute.
- LowCbfProcessor.process_spead_packets_ok
Indicates SPS SPEAD packets are arriving at FPGA.
Contributes towards health_process attribute.
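The "worst case" roll-up used by health_function, health_hardware and health_process can be sketched as a maximum over an ordered enum. The `Health` enum here is a local stand-in with an assumed ordering. SKA's HealthState also defines UNKNOWN, and how that value ranks in the roll-up is not stated here, so the sketch leaves it out.

```python
from enum import IntEnum


class Health(IntEnum):
    # Local stand-in; a larger value is assumed to mean worse health.
    OK = 0
    DEGRADED = 1
    FAILED = 2


def worst_case(states):
    """Return the worst of the given states (OK when there are none)."""
    return max(states, default=Health.OK)


health_function = worst_case([Health.OK, Health.DEGRADED])
health_hardware = worst_case([Health.OK, Health.OK, Health.OK])
health_process = worst_case([Health.FAILED, Health.OK])
overall = worst_case([health_function, health_hardware, health_process])
print(health_function.name, health_hardware.name, health_process.name, overall.name)
```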
Test Mode
If Test mode is active, the value of some attributes can be temporarily changed to any desired value for testing.
Continuous Integration
The current CI tests of this device use DeviceTestContext and do not require a full Tango system.
Environment Variables
Some runtime behaviours can be configured through environment variables.
- INITIAL_ADMINMODE variable can be used to select the default value of the adminMode Tango attribute on startup.
- ALLOW_ADMIN_CHANGE_WHILE_USED allows adminMode to be altered at any time if set to true. If false (the default), adminMode cannot be changed unless the FPGA is not in use by any subarray.
- FPGA_XRT_TIMEOUT allows the timeout (milliseconds) for each FPGA register read or write to be extended (default 5 if not set).
- FPGA_POST_FW_LOAD_DELAY specifies time in seconds after firmware load before register reads will time out. (Allows for busy server CPU after loading an FPGA.)
- CACHE_DIR allows firmware to be cached on-server, saving download bandwidth. Our Helm chart sets this to /app, meaning downloads will be cached in the pod’s ephemeral storage. For best results, override this to something persistent & shared between pods (e.g. a volume mount on the FPGA host server).
- STN_DELAY_SIGN can be set to “pos” or “neg” to apply station delay polynomials with positive or negative sign.
- PST_DELAY_SIGN can be set to “pos” or “neg” to apply PST beam delay polynomials with positive or negative sign.
- PSS_DELAY_SIGN can be set to “pos” or “neg” to apply PSS beam delay polynomials with positive or negative sign.
- PST_DELAY_FORMAT can be set to “diff” or “full” to describe whether PST delay polynomials are supplied as differences from station beam, or as full delay polynomials independent of any reference to the station beam. In the latter case the cadence of station beam and PST beam polynomials must be identical.
- PSS_DELAY_FORMAT can be set to “diff” or “full” to describe whether PSS delay polynomials are supplied as differences from station beam, or as full delay polynomials independent of any reference to the station beam. In the latter case the cadence of station beam and PSS beam polynomials must be identical.
- ISOLATED allows processor to run without an allocator connection (software testing).
- CLEAR_PST_DELAYS_BEFORE_SCANS: When defined (with any value), ensures that PST delay polynomials from the prior scan are NOT used as the initial delay polynomials for the next scan, even if they are still valid. This may be helpful if a Low.CBF subarray is configured then scanned multiple times, but between scans the beam pointing directions are changed by manipulating the delay polynomials provided externally to Low.CBF. Note that delay polynomials are always cleared for the first scan after a subarray has been reconfigured, and this setting does not override that behaviour.
- RIPPLE_COMPENSATION: When set to “18a” or “16d”, applies a filter to compensate for passband ripple in coarse channel data from TPMs. Select an appropriate value for your TPMs (“16d” in PI26, but after TPM firmware is updated in PI27 use “18a”). Ripple compensation will be disabled if the environment variable is not present or has an unrecognised value. Log messages will confirm which filter is chosen.
- ALLOCATOR_RECHECK_SECS: Time between checks for allocator restarts. Processor will re-register with allocator if a check shows the allocator has redeployed/restarted. Defaults to 30 seconds if the environment variable is not present or the value is invalid.
- JONES_UNITY: If present and contains a JSON string encoding 8 floating point numbers AND the subarray is configured without a Kafka RCAL address, then instead of using the identity matrix for Jones corrections, the processor will use the encoded values. Values in the JSON string are ‘[Jones[1,1].re, Jones[1,1].im, Jones[1,2].re, Jones[1,2].im, Jones[2,1].re, Jones[2,1].im, Jones[2,2].re, Jones[2,2].im]’. For correct operation, matrix entries should be floating point with values between -1.0 and +1.0. (This env var is intended only to assist with testing.)
- INIT_RESEND_SECS: Time between periodic resends of SPEAD INIT packets for correlator, default 30.0 if not defined. Introduced for ADR111.
- INIT_SCANNING_FACTOR: Integer 1 or above, default 10 if undefined. For a value N, scanning subarrays send their periodic INITs at 1/Nth of the “INIT_RESEND_SECS” interval.
- INIT_RESEND_DISABLE: If defined, turns off periodic sending of correlator INIT packets.
- NO_VIS_DATA_ZERO_SCANID: If defined, turns off sending visibilities when not scanning. Note this is NOT implemented.
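A few of these variables have defaults or formats that are easy to get wrong, so the sketch below shows one possible way to interpret them. It is not the processor's parsing code: the function names are hypothetical, treating non-positive or non-finite numbers as "invalid" is an assumption, and returning None for an unusable JONES_UNITY (meaning "fall back to the identity matrix") is also an assumption.

```python
import json
import math


def float_env(env, name, default):
    """Read a positive, finite float; use the default if absent or invalid."""
    try:
        value = float(env[name])
    except (KeyError, ValueError):
        return default
    return value if math.isfinite(value) and value > 0 else default


def init_resend_interval(env, scanning):
    """Seconds between periodic INIT packets, or None when disabled."""
    if "INIT_RESEND_DISABLE" in env:
        return None
    base = float_env(env, "INIT_RESEND_SECS", 30.0)
    if not scanning:
        return base
    try:
        factor = int(env.get("INIT_SCANNING_FACTOR", 10))
    except ValueError:
        factor = 10
    if factor < 1:
        factor = 10
    return base / factor   # scanning subarrays resend N times as often


def jones_unity(env):
    """Decode JONES_UNITY into a 2x2 complex matrix, or None if unusable."""
    raw = env.get("JONES_UNITY")
    if raw is None:
        return None
    try:
        values = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(values, list) or len(values) != 8:
        return None
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            return None
        if not -1.0 <= v <= 1.0:
            return None
    # Pairs are (re, im) in the order [1,1], [1,2], [2,1], [2,2].
    entries = [complex(values[i], values[i + 1]) for i in range(0, 8, 2)]
    return [[entries[0], entries[1]], [entries[2], entries[3]]]


env = {"ALLOCATOR_RECHECK_SECS": "abc", "JONES_UNITY": "[0.5, 0, 0, 0, 0, 0, 0.5, 0]"}
print(float_env(env, "ALLOCATOR_RECHECK_SECS", 30.0))
print(init_resend_interval(env, scanning=True))
print(jones_unity(env))
```

In real use the `env` argument would be `os.environ`; a plain dictionary is used here so the example does not depend on the surrounding process.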
An example of the Helm chart keys that parent charts would need to override to achieve correct processor operation is given in the test-parent chart’s values.yaml file:
FPGA Environment Variables
These environment variables are used to inform the LowCbfProcessor software about
the Alveo FPGA card in use. They are automatically configured by the
processor-device.sh script on pod startup.
- FPGA_BDF - PCIe BDF address, e.g. “86:00.0” or “0000:07:00.0”. Format varies depending on underlying driver in use.
- FPGA_DRIVER - low-level driver in use, either “AMI” or “XRT”.
- FPGA_TYPE - model of Alveo card in use, e.g. “u55c”.
- SERIAL_NUM - Alveo card serial number.
To access High Bandwidth Memory (HBM) on FPGA cards using the AMI driver (i.e. V80), additional environment variables must be set to provide paths to QDMA character devices:
- FPGA_QDMA_0 - read (c2h) device, e.g. “/dev/qdma86001-MM-0”
- FPGA_QDMA_1 - write (h2c) device, e.g. “/dev/qdma86001-MM-1”
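A minimal sketch of collecting these variables and checking that the QDMA paths accompany the AMI driver is given below. The `FpgaEnv` dataclass and both helper functions are hypothetical, the serial number and the “v80” type string are made-up example values, and the check reflects only the rule stated above, not the behaviour of processor-device.sh.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FpgaEnv:
    bdf: str
    driver: str
    fpga_type: str
    serial_num: str
    qdma_read: Optional[str] = None    # FPGA_QDMA_0 (c2h)
    qdma_write: Optional[str] = None   # FPGA_QDMA_1 (h2c)


def read_fpga_env(env):
    """Collect the FPGA variables from a mapping such as os.environ."""
    driver = env["FPGA_DRIVER"]
    if driver not in ("AMI", "XRT"):
        raise ValueError(f"unknown FPGA_DRIVER: {driver!r}")
    return FpgaEnv(
        bdf=env["FPGA_BDF"],
        driver=driver,
        fpga_type=env["FPGA_TYPE"],
        serial_num=env["SERIAL_NUM"],
        qdma_read=env.get("FPGA_QDMA_0"),
        qdma_write=env.get("FPGA_QDMA_1"),
    )


def missing_hbm_variables(cfg):
    """Names of the QDMA variables still needed for HBM access with AMI."""
    if cfg.driver != "AMI":
        return []
    missing = []
    if not cfg.qdma_read:
        missing.append("FPGA_QDMA_0")
    if not cfg.qdma_write:
        missing.append("FPGA_QDMA_1")
    return missing


example = {
    "FPGA_BDF": "86:00.0",
    "FPGA_DRIVER": "AMI",
    "FPGA_TYPE": "v80",
    "SERIAL_NUM": "XFL1ABC",
    "FPGA_QDMA_0": "/dev/qdma86001-MM-0",
}
cfg = read_fpga_env(example)
print(missing_hbm_variables(cfg))
```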
State diagram for FPGA-usage state machine in cor_state_machine.py
Image (../diagrams/cor-fpga-usage-statemachine.png): state diagram for Correlator FPGA usage (historical)
Historical record of FPGA usage state machine at February 2026