Power management in MCCS

Hardware power modes

SKA hardware may support up to three power modes:

  • ON: the hardware is powered on and fully operational. This mode is supported by all SKA hardware.

  • OFF: the hardware is powered off. Generally we would expect all hardware to be able to be turned off. There may be special cases, however, where this is not supported. For example, an externally managed cluster can be turned off, but the MCCS interface to it might only allow for submission and monitoring of jobs. Thus it cannot be turned off from the MCCS point of view.

  • STANDBY: the hardware is in a low-power standby mode. Such a mode is important in two cases:

    • where powering up a subsystem with many devices, it is important to limit the inrush current. This is achieved by powering up devices into a standby mode that uses no more than 5% of their nominal power; then carefully orchestrating transitions to full power.

    • where powering up a device from off could take a long time (perhaps several minutes). Such devices may instead be powered up into standby mode, in which power consumption is low, but the time to fully power on the hardware is short (a couple of seconds).

    Standby mode is not supported by all hardware; indeed there may be very few hardware devices that support it.

Power mode breakdown

Generally speaking, one cannot tell a hardware device to turn itself off; for once it is off, it loses the ability to turn itself on again. Instead, power to a device is controlled by some upstream device. For example, power to a TPM is controlled by the subrack in which that TPM is installed. Standby mode, however, is controlled by the device itself. Thus, implementation of the three power modes breaks down into:

  • OFF: tell the upstream device (e.g. subrack) to deny power to the device (e.g. TPM)

  • STANDBY: tell the upstream device to supply power to the device, then tell the device itself to go into standby mode

  • ON: tell the upstream device to supply power to the device, then tell the device itself to go fully operational

Power flow

Map

The activity diagram below shows the flow of power through the MCCS system; i.e. cabling, essentially. The (/) points are switch points at which the power can be turned on/off. These switch points are annotated with the Tango device commands that drive the switch.

Note: this diagram will evolve over time.

|Power supply|
start
split
-> 1..4;
|MCCS Cabinet|
:MCCS
cabinet
management
board;
split
:/;
:1 Gb switch;
detach
split again
:/;
-> 1..2;
:100 Gb switch;
detach
split again
:/;
-> 1..17-18;
:MCCS node;
detach
end split
split again
|SPS Cabinet|
-> 1..256;
:SPS
cabinet
management
board;
split
:/;
note right
MccsCabinet.TurnSwitchOn()
end note
-> 1..2;
:switch;
detach
split again
:/;
note right
MccsCabinet.TurnSubrackOn()
end note
-> 1..4;
:SPS
subrack
management
board;
:/;
note right
MccsSubrack.TurnTPMOn()
end note
-> 1..8;
:TPM;
end split
detach
split again
|Field equipment|
-> 1..256;
:/;
note right
MccsAPIU.On()
end note
-> 1;
:APIU;
:/;
note right
MccsAPIU.TurnAntennaOn(n)
end note
-> 1..64;
:Antenna;
detach
end split

Startup sequence

Boot-up

When power is first applied to MCCS, the following minimal bootup sequence is followed:

  1. Power is applied to all cabinets. All the cabinet management boards come on, as they are the primary control points for the cabinet subsystems. Switches and subelements for the SPS cabinets are configured to remain off, as are the subelements for all but one of the MCCS cabinets.

  2. Power is applied to the APIUs in the field nodes. All the antennas are configured to remain off.

  3. The cabinet management board for the MCCS cabinet that houses the MCCS controller node is configured to start up the cabinet’s 1Gb network switch and the MCCS controller node.

  4. The 1Gb network switch powers up

  5. The MCCS controller node boots up.

  6. The kubernetes cluster is started.

  7. A minimum chart is deployed, containing just the tango subsystem and the MCCS Controller Tango device.

Power-on

When TM sends the MCCS Controller the Startup command, the MCCS Controller must start up:

  1. the rest of MCCS

  2. the SPS subrack management boards and switches

  3. the SPS TPMs

  4. the field equipment

Prototype status

In the current prototype implementation, all of MCCS is deployed immediately on startup, so that when TM sends the MCCS Controller the Startup command, it need only start up the SPS cabinets and field equipment.

participant "Power\nsupply\nto MCCS" as Supply
participant "Telescope\nManager\n(TM)" as TM
participant Controller
participant "SPS Cabinet\nControl\nBoard <c>" as Cabinet
participant "Switch <s>" as Switch
participant "SPS Subrack\nControl\nBoard <r>" as Subrack
participant "TPM <t>" as TPM
participant "APIU <p>" as APIU
participant "Antenna <a>" as Antenna
Supply --> Cabinet: <power>
TM --> Controller: StartUp()
Controller -> Cabinet: On()
Cabinet --> Switch: <power>
Cabinet -> Switch: On()
Cabinet --> Subrack: <power>
Cabinet -> Subrack: On()
Subrack --> TPM: <power>
Subrack -> TPM: On()
Controller -> APIU: On()
APIU --> Antenna: <power>