Power management in MCCS

Hardware power modes

SKA hardware may support up to three power modes:

  • ON: the hardware is powered on and fully operational. This mode is supported by all SKA hardware.

  • OFF: the hardware is powered off. We expect almost all hardware to support being turned off, but there may be special cases where it does not. For example, an externally managed cluster can itself be turned off, but the MCCS interface to it might only allow jobs to be submitted and monitored, so from the MCCS point of view it cannot be turned off.

  • STANDBY: the hardware is in a low-power standby mode. Such a mode is important in two cases:

    • when powering up a subsystem with many devices, where it is important to limit the inrush current. This is achieved by powering up devices into a standby mode that uses no more than 5% of their nominal power, then carefully orchestrating the transitions to full power.

    • when powering up a device from off could take a long time (perhaps several minutes). Such devices may instead be powered up into standby mode, in which power consumption is low but the time to fully power on the hardware is short (a couple of seconds).

    Standby mode is not supported by all hardware; indeed there may be very few hardware devices that support it.
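The idea that each piece of hardware supports only a subset of the three modes can be sketched in Python as follows. This is a minimal illustration, not MCCS code: the PowerMode enum, the Hardware class and the two example devices are hypothetical names introduced here, and the assumption that a TPM supports OFF but not STANDBY is for illustration only.

```python
from enum import Enum


class PowerMode(Enum):
    """The three hardware power modes described above."""

    OFF = "off"
    STANDBY = "standby"
    ON = "on"


class Hardware:
    """Hypothetical model of hardware that records its supported modes."""

    def __init__(self, name, supported_modes):
        self.name = name
        # ON is supported by all SKA hardware, so it is always included.
        self.supported_modes = frozenset(supported_modes) | {PowerMode.ON}

    def supports(self, mode):
        """Return True if this hardware can be put into the given mode."""
        return mode in self.supported_modes


# An externally managed cluster: MCCS can only submit and monitor jobs,
# so from the MCCS point of view it supports neither OFF nor STANDBY.
cluster = Hardware("external cluster", [])

# A TPM, assumed here (for illustration only) to support OFF but not STANDBY.
tpm = Hardware("tpm", [PowerMode.OFF])

print(cluster.supports(PowerMode.OFF))  # False
print(tpm.supports(PowerMode.OFF))      # True
```

Recording the supported modes explicitly lets a caller check before requesting a transition, rather than discovering at command time that the hardware cannot honour it.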

Power mode breakdown

Generally speaking, one cannot tell a hardware device to turn itself off, because once it is off it can no longer turn itself on again. Instead, power to a device is controlled by some upstream device. For example, power to a TPM is controlled by the subrack in which that TPM is installed. Standby mode is generally not managed by an individual device but by an ensemble of devices (SpsStation, FieldStation), which selectively turns off the devices that consume the most power. Thus, implementation of the three power modes breaks down into:

  • OFF: tell the upstream device (e.g. subrack) to deny power to the device (e.g. TPM)

  • STANDBY: selectively turn ON some sub-devices, and turn OFF or leave OFF the TPMs (in the SPS cabinet) and the antennas and Front End modules (in the Field station)

  • ON: tell the upstream device to supply power to the device, then tell the device itself to go fully operational
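The breakdown above can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not the MCCS implementation: the Tpm, Subrack and Station classes and their method names are hypothetical, and the station sets the subrack's power flag directly, whereas in the real system the subrack is itself fed by an upstream cabinet PDU.

```python
class Tpm:
    """Hypothetical stand-in for a TPM; it cannot switch its own power."""

    def __init__(self):
        self.powered = False
        self.operational = False

    def initialise(self):
        """Go fully operational; only possible once power is supplied."""
        if not self.powered:
            raise RuntimeError("cannot initialise an unpowered TPM")
        self.operational = True


class Subrack:
    """Hypothetical upstream device that supplies or denies power to TPMs."""

    def __init__(self, tpms):
        self.tpms = tpms
        self.powered = False

    def set_tpm_power(self, index, powered):
        if powered and not self.powered:
            raise RuntimeError("subrack must be on before powering a TPM")
        tpm = self.tpms[index]
        tpm.powered = powered
        if not powered:
            tpm.operational = False


class Station:
    """Hypothetical ensemble that implements OFF, STANDBY and ON."""

    def __init__(self, subrack):
        self.subrack = subrack

    def _deny_tpm_power(self):
        for index in range(len(self.subrack.tpms)):
            self.subrack.set_tpm_power(index, False)

    def off(self):
        # OFF: the upstream device denies power to every TPM.
        self._deny_tpm_power()
        self.subrack.powered = False  # simplification: no PDU is modelled

    def standby(self):
        # STANDBY: turn on the subrack, turn off (or leave off) the TPMs.
        self.subrack.powered = True
        self._deny_tpm_power()

    def on(self):
        # ON: upstream supplies power, then each device goes operational.
        self.subrack.powered = True
        for index, tpm in enumerate(self.subrack.tpms):
            self.subrack.set_tpm_power(index, True)
            tpm.initialise()


station = Station(Subrack([Tpm(), Tpm()]))
station.standby()
print([tpm.powered for tpm in station.subrack.tpms])      # [False, False]
station.on()
print([tpm.operational for tpm in station.subrack.tpms])  # [True, True]
```

Note that the Tpm class has no method for switching its own power; every power change goes through the Subrack, mirroring the upstream-control rule described above.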

Power flow

Map

The activity diagram below shows the flow of power through the MCCS system; essentially, the cabling. The (/) points are switch points at which the power can be turned on/off. These switch points are annotated with the Tango device commands that drive the switch (where implemented).

Note: this diagram will evolve over time.

@startuml
|Power supply|
start
split
-> 1..4;
|MCCS Cabinet|
:MCCS
cabinet
PDU;
split
:/;
:1 Gb switch;
detach
split again
:/;
-> 1..2;
:100 Gb switch;
detach
split again
:/;
-> 1..17-18;
:MCCS node;
detach
end split
split again
|SPS Cabinet|
:/;
-> 1..256;
:SPS
cabinet
PDU;
split
:/;
note right
MccsSubrack.On()
end note
-> 1..4;
:SPS
subrack;
:/;
note right
MccsSubrack.PowerOnTpm()
end note
-> 1..8;
:TPM;
detach
split again
:/;
-> 1;
:1G
switch;
detach
split again
:/;
-> 1..2;
:40G
switch;
detach
split again
:/;
-> 1;
:White
rabbit;
end split
detach
split again
|Field equipment|
:/;
-> 1..256;
:MccsFNDH;
:/;
note right
MccsSmartBox.On(n)
end note
-> 1..24;
:MccsSmartBox;
:/;
note right
MccsFieldStation.PowerOnAntenna()
end note
-> 1..12;
:Antenna;
detach
end split
@enduml
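Because power flows through a tree of switch points, powering a leaf device means closing every switch on the path from the supply down to it, upstream first. The Python sketch below encodes only the annotated branches of the diagram above. The POWER_TREE dictionary, its key names and the switch_commands function are hypothetical; the command strings are the annotations from the diagram, and switch points without an annotation are recorded as None.

```python
# Each entry maps a device to (upstream device, command that closes the
# switch feeding it). None marks a switch point with no annotated command.
POWER_TREE = {
    "SPS cabinet PDU": ("Power supply", None),
    "SPS subrack": ("SPS cabinet PDU", "MccsSubrack.On()"),
    "TPM": ("SPS subrack", "MccsSubrack.PowerOnTpm()"),
    "FNDH": ("Power supply", None),
    "Smartbox": ("FNDH", "MccsSmartBox.On(n)"),
    "Antenna": ("Smartbox", "MccsFieldStation.PowerOnAntenna()"),
}


def switch_commands(device):
    """Return the annotated commands needed to power `device`, upstream first."""
    commands = []
    # Walk from the device up towards the power supply.
    while device in POWER_TREE:
        upstream, command = POWER_TREE[device]
        if command is not None:
            commands.append(command)
        device = upstream
    # The walk collected commands leaf-first; power must flow root-first.
    commands.reverse()
    return commands


print(switch_commands("TPM"))
# ['MccsSubrack.On()', 'MccsSubrack.PowerOnTpm()']
```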

Startup sequence

Boot-up

When power is first applied to MCCS, the following minimal boot-up sequence is followed:

  1. Power is applied to all cabinets. All the cabinet management boards come on, as they are the primary control points for the cabinet subsystems. Switches and subelements for the SPS cabinets are configured to remain off, as are the subelements for all but one of the MCCS cabinets.

  2. Power is applied to the Field Node Distribution Hubs (FNDH) in the field nodes. All the smartboxes and antennas are configured to remain off.

  3. The cabinet Power Distribution Unit (PDU) for the MCCS cabinet that houses the MCCS controller node is configured to start up the cabinet’s 1Gb network switch and the MCCS controller node.

  4. The 1Gb network switch powers up.

  5. The MCCS controller node boots up.

  6. The Kubernetes cluster is started.

  7. A minimum chart is deployed, containing just the Tango subsystem and the MCCS Controller Tango device.

Power-on

When TM sends the MCCS Controller the Startup command, the MCCS Controller must start up:

  1. the rest of MCCS

  2. the SPS subracks

  3. the SPS TPMs

  4. the field equipment

A possible implementation of the startup sequence, limited to the SPS and Field Station equipment, is shown in the sequence diagram below. Elements in red are not yet implemented. In this example the power up operation is performed in two stages. First the system is placed in STANDBY mode, which can occur in a short time. Then individual elements are progressively turned ON. Power ON is staggered in time to avoid excessive power ramping. Finally, the whole array is synchronised and acquisition is started.

@startuml
participant "Power\nsupply\nto MCCS" as Supply
participant "Telescope\nManager\n(TM)" as TM
participant Controller
participant MccsStation
box "SPS Cabinet"
participant SpsStation
participant "SPS Cabinet\nPDU" as Cabinet #pink
participant "White\nRabbit" as WR #pink
participant "Switch" as Switch #pink
participant "SPS Subrack" as Subrack
participant "TPM" as TPM
end box
box "Field node"
participant "Field station" as FN
participant "FNDH"  as pasd
participant "Smartbox"
participant "Antenna" as Antenna
end box
Supply --> Cabinet: <power>
Supply --> pasd: <power>
TM --> Controller: StartUp()
Controller -> MccsStation: Standby()
MccsStation -> FN: Standby()
FN -> pasd: On()
pasd --> Smartbox: <power>
hnote over pasd: ON
FN -> Smartbox: On()
hnote over Smartbox: ON
MccsStation -> SpsStation: Standby()
SpsStation -> Cabinet: TurnOn(switch)
Cabinet --> Switch: <power>
SpsStation -> Cabinet: TurnOn(WR)
Cabinet --> WR: <power>
hnote over FN: STANDBY
FN --> MccsStation: PowerState.STANDBY
SpsStation -> Subrack: On()
Subrack -> Cabinet: TurnOn(subrack)
Cabinet --> Subrack: <power>
hnote over Subrack: ON
Subrack --> SpsStation: PowerState.ON
hnote over SpsStation: STANDBY
SpsStation --> MccsStation: PowerState.STANDBY
hnote over MccsStation: STANDBY
Controller -> MccsStation: On()
MccsStation -> FN: On()
FN -> Smartbox: TurnAntennaOn()
Smartbox --> Antenna: <power>
hnote over FN: ON
MccsStation -> SpsStation: On()
FN --> MccsStation: PowerState.ON
SpsStation -> TPM: On()
TPM -> Subrack: TurnOnTpm()
Subrack --> TPM: <power>
TPM -> TPM: Initialise()
hnote over TPM: ON
TPM --> SpsStation: PowerState.ON
SpsStation -> TPM: InitialiseStation()
hnote over SpsStation: ON
MccsStation <-- SpsStation: PowerState.ON
hnote over MccsStation: ON
Controller -> MccsStation: StartAcquisition()
MccsStation -> SpsStation: StartAcquisition()
SpsStation -> TPM: StartAcquisition()
hnote over TPM: Synchronised
@enduml
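The staggering of power-on in the second stage of the sequence above could be sketched as below. This is an illustrative assumption about how staggering might be done, not the MCCS implementation: the staggered_power_on function, its batch_size and delay_s parameters, and the use of a fixed delay between batches are all hypothetical.

```python
import time


def staggered_power_on(devices, batch_size, delay_s, power_on, sleep=time.sleep):
    """Power on `devices` in batches, pausing `delay_s` seconds between batches.

    `power_on` is called once per device. `sleep` is injectable so that the
    pause can be replaced in tests. Returns the list of batches used.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [
        devices[start:start + batch_size]
        for start in range(0, len(devices), batch_size)
    ]
    for number, batch in enumerate(batches):
        if number > 0:
            # Wait before each batch after the first to limit power ramping.
            sleep(delay_s)
        for device in batch:
            power_on(device)
    return batches


# Example: eight hypothetical TPM indices, three at a time, 10 ms apart.
powered = []
batches = staggered_power_on(
    list(range(8)), batch_size=3, delay_s=0.01, power_on=powered.append
)
print(batches)  # [[0, 1, 2], [3, 4, 5], [6, 7]]
print(powered)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

A fixed delay is the simplest policy. A real system might instead wait until each batch reports that it is ON before starting the next.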

Prototype status

In the current prototype implementation, all of MCCS is deployed immediately on startup, so that when TM sends the MCCS Controller the Startup command, it need only start up the SPS cabinets and field equipment.