Alarm Handler
This notebook demonstrates the warnings and alarms that SKA PST has added to the PST.BEAM TANGO device.
This notebook should be used in conjunction with three things: an Elettra Alarm Handler deployment; the CSP.LMC or a simulator notebook such as perform-scan-external-cbf.ipynb; and a correlator beam former (CBF) or the cbf-simulator.py notebook.
Deploy the latest version of ska-pst
Before running this notebook, launch the test-parent Helm chart. Run the following commands in a terminal that has access to the same Kubernetes cluster as this notebook (e.g. psi-head in the Low PSI):
git clone --recursive git@gitlab.com:ska-telescope/pst/ska-pst.git
cd ska-pst
make k8s-install-chart KUBE_NAMESPACE=pst K8S_CHART=test-parent
make k8s-wait KUBE_NAMESPACE=pst KUBE_APP=ska-pst
When you have finished running this demo, please remember to uninstall the chart:
make k8s-uninstall-chart KUBE_NAMESPACE=pst K8S_CHART=ska-pst
Set up imports for the notebook
[1]:
import enum
import logging
import os
import sys
import tango
from tango import DeviceProxy
Set up logging
This ensures that the utility classes log to the cell outputs. IPython logs to stderr by default, but the cell output needs to go to stdout. Without this, we would have to put print statements in the utility classes, which is not good development practice.
[2]:
# override the format here for more or less logging information
# also update the logging level for a different level of verbosity
logging.basicConfig(
format="%(asctime)s | %(levelname)s : %(message)s",
level=logging.INFO,
stream=sys.stdout,
)
logger = logging.getLogger()
Set the TANGO_HOST environment variable
If the TANGO_HOST environment variable is already set to something other than the default, then the following code assumes that it has been set correctly (e.g. in the environment variables of the image running notebook-test) and the value is not modified.
Otherwise, the following code sets TANGO_HOST to the Tango database server in the pst namespace.
If a different namespace was used to deploy the test-parent chart, then set the kube_namespace variable accordingly.
Using the notebook against a k8s cluster
If you are using this notebook against a k8s cluster (such as minikube) to which you have admin access, run:
$ kubectl get -n <namespace> svc
This should produce output like the following. Find the EXTERNAL-IP of the databaseds-tango-base-test service:
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)           AGE
databaseds-tango-base-test  LoadBalancer   10.109.225.112   192.168.49.97   10000:30553/TCP   50m
Ensure that you can reach the external IP and port. On a Linux environment you should be able to do:
nc -v <external-ip> 10000
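If `nc` is not available, or you want to run the same check from inside the notebook, you can attempt a plain TCP connection. The following is a minimal sketch that uses only the Python standard library. The helper names `can_reach` and `tango_host_reachable` are hypothetical and not part of ska-pst.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, DNS resolution failures and timeouts
        return False


def tango_host_reachable(tango_host: str, timeout: float = 3.0) -> bool:
    """Check a TANGO_HOST value of the form 'host:port', e.g. '192.168.49.97:10000'."""
    host, _, port = tango_host.rpartition(":")
    return can_reach(host, int(port), timeout)
```

For example, `tango_host_reachable(os.environ["TANGO_HOST"])` can be called after TANGO_HOST has been set in the cell below. This only confirms that the port accepts connections. It does not confirm that a Tango database server is listening on that port.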
Using the notebook in Docker Desktop on Windows
If this notebook is run in a container using Docker Desktop on Windows, then uncomment the line:
os.environ["TANGO_HOST"] = "host.docker.internal:10000"
[3]:
default_tango_host = "tango-databaseds.staging:10000"
tango_host = os.environ.get("TANGO_HOST", default_tango_host)
# namespace used to deploy the test-parent chart (KUBE_NAMESPACE in the make commands above)
kube_namespace = "pst"
if tango_host in [default_tango_host, ""]:
    tango_host = f"databaseds-tango-base-test.{kube_namespace}:10000"
os.environ["TANGO_HOST"] = tango_host
# If running the ska-pst-jupyterlab pod within the same k8s namespace, keep this line; otherwise comment it out
os.environ["TANGO_HOST"] = "databaseds-tango-base-test:10000"
# If using k8s and you have the "EXTERNAL-IP" of databaseds-tango-base-test, uncomment this and set the IP
# os.environ["TANGO_HOST"] = "192.168.49.97:10000"
# Uncomment this line if the notebook is running in a container hosted by Docker Desktop
# os.environ["TANGO_HOST"] = "host.docker.internal:10000"
logger.info(f"TANGO_HOST={os.environ['TANGO_HOST']}")
2024-10-07 22:01:59,567 | INFO : TANGO_HOST=databaseds-tango-base-test:10000
Get a TANGO DeviceProxy to the PST BEAM device.
[4]:
beam = DeviceProxy("low-pst/beam/01")
Get a TANGO DeviceProxy to the Elettra Alarm Handler.
[5]:
ah = DeviceProxy("alarm/handler/01")
ah.eventSummary
[5]:
()
Set up alarm handler rules.
This code checks the attributes of the PST BEAM TANGO device for any warning or alarm levels that have been set. It uses that information to generate alarm rules and loads them into the Elettra Alarm Handler. If a rule cannot be loaded, for example because it already exists, the code modifies the existing rule instead.
[6]:
class _AlarmLevel(enum.IntEnum):
    MIN_ALARM = 0
    MIN_WARNING = 1
    MAX_WARNING = 2
    MAX_ALARM = 3


def _create_rule(attr: tango.AttributeInfoEx, alarm_level: _AlarmLevel) -> None:
    attr_alarms = attr.alarms
    attr_fqdn = f"{beam.name()}/{attr.name}".lower()
    # tag-friendly name, e.g. low-pst/beam/01/x -> lowpst_beam_01_x
    alarm_attr_str = attr_fqdn.replace("/", "_").replace("-", "")

    if alarm_level == _AlarmLevel.MIN_ALARM:
        alarm_value = attr_alarms.min_alarm
        prefix = "alarm"
        suffix = "min"
        rule = f"{attr_fqdn} <= {alarm_value} || {attr_fqdn}.quality == ATTR_ALARM"
        message = f"{attr.name} is too low"
    elif alarm_level == _AlarmLevel.MIN_WARNING:
        alarm_value = attr_alarms.min_warning
        prefix = "warning"
        suffix = "min"
        rule = f"{attr_fqdn} <= {alarm_value} || {attr_fqdn}.quality == ATTR_WARNING"
        message = f"{attr.name} is getting too low"
    elif alarm_level == _AlarmLevel.MAX_WARNING:
        alarm_value = attr_alarms.max_warning
        prefix = "warning"
        suffix = "max"
        rule = f"{attr_fqdn} >= {alarm_value} || {attr_fqdn}.quality == ATTR_WARNING"
        message = f"{attr.name} is getting too high"
    else:
        assert alarm_level == _AlarmLevel.MAX_ALARM
        alarm_value = attr_alarms.max_alarm
        prefix = "alarm"
        suffix = "max"
        # an alarm rule must check for ATTR_ALARM quality, not ATTR_WARNING
        rule = f"{attr_fqdn} >= {alarm_value} || {attr_fqdn}.quality == ATTR_ALARM"
        message = f"{attr.name} is too high"

    # Tango reports unset thresholds as "Not specified"; skip those levels
    if alarm_value == "Not specified":
        return

    rule_str = f"tag={prefix}_{alarm_attr_str}_{suffix}; formula=({rule}); priority=log; message={message}"
    try:
        ah.Load(rule_str)
    except tango.DevFailed:
        # Load was rejected (e.g. the tag already exists), so update the existing rule instead
        ah.Modify(rule_str)


for attr in beam.attribute_list_query_ex():
    for alarm_level in _AlarmLevel:
        _create_rule(attr, alarm_level)
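The cell above needs live Tango devices. To see the exact rule strings it sends without a live deployment, you can factor the string construction into a pure function. This is a minimal sketch that mirrors the branches of `_create_rule`. The function `build_rule`, the level keys, and the threshold `90.0` used below are hypothetical and chosen only for illustration.

```python
from typing import Optional

# (prefix, suffix, comparison, quality, message template) for each level,
# mirroring the branches of _create_rule above
_LEVELS = {
    "min_alarm": ("alarm", "min", "<=", "ATTR_ALARM", "{name} is too low"),
    "min_warning": ("warning", "min", "<=", "ATTR_WARNING", "{name} is getting too low"),
    "max_warning": ("warning", "max", ">=", "ATTR_WARNING", "{name} is getting too high"),
    "max_alarm": ("alarm", "max", ">=", "ATTR_ALARM", "{name} is too high"),
}


def build_rule(device: str, attr_name: str, level: str, threshold: str) -> Optional[str]:
    """Build an Elettra Alarm Handler rule string, or None if the threshold is unset."""
    if threshold == "Not specified":
        return None
    prefix, suffix, op, quality, template = _LEVELS[level]
    attr_fqdn = f"{device}/{attr_name}".lower()
    tag = f"{prefix}_{attr_fqdn.replace('/', '_').replace('-', '')}_{suffix}"
    formula = f"{attr_fqdn} {op} {threshold} || {attr_fqdn}.quality == {quality}"
    message = template.format(name=attr_name)
    return f"tag={tag}; formula=({formula}); priority=log; message={message}"
```

For example, `build_rule("low-pst/beam/01", "ringBufferUtilisation", "max_alarm", "90.0")` produces a rule tagged `alarm_lowpst_beam_01_ringbufferutilisation_max` that fires when the value is at least 90.0 or the attribute quality is ATTR_ALARM.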
Check the current event summary on the Alarm Handler
[7]:
ah.eventSummary
[7]:
('event=low-pst/beam/01/availablerecordingtime;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/datadroprate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/misorderedpacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/malformedpacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/misdirectedpacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/checksumfailurepacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/timestampsyncerrorpacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/seqnumbersyncerrorpacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/novalidpolarisationcorrectionpacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/novalidstationbeampacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/novalidpstbeampacketrate;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/ringbufferutilisation;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolavariancefreqavg;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolanumclippedsamples;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolavariancefreqavg;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolanumclippedsamples;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolavariancefreqavgrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolanumclippedsamplesrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolavariancefreqavgrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolanumclippedsamplesrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolbvariancefreqavg;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolbnumclippedsamples;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolbvariancefreqavg;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolbnumclippedsamples;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolbvariancefreqavgrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/realpolbnumclippedsamplesrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolbvariancefreqavgrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;',
'event=low-pst/beam/01/imagpolbnumclippedsamplesrfiexcised;time=2024-10-07 22:02:02;values=[];exception={"Reason":"NOT_connected","Desc":"Attribute not subscribed","Origin":"..."};quality=ATTR_INVALID;')
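Each eventSummary entry above is a list of `key=value;` pairs. Parsing the entries makes it easier to filter them, for example to list the attributes the Alarm Handler has not yet subscribed to. This is a minimal sketch that assumes no field value contains a `;`, which holds for the output shown above. The helpers `parse_event` and `invalid_attributes` are hypothetical.

```python
def parse_event(summary: str) -> dict:
    """Split one 'key=value;key=value;' eventSummary entry into a dict."""
    fields = {}
    for part in summary.split(";"):
        if not part:
            continue  # the trailing ';' leaves an empty final part
        # split on the first '=' only, so values may themselves contain '='
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def invalid_attributes(summaries) -> list:
    """Attribute names (last path component) whose event quality is ATTR_INVALID."""
    names = []
    for summary in summaries:
        event = parse_event(summary)
        if event.get("quality") == "ATTR_INVALID":
            names.append(event["event"].rsplit("/", 1)[-1])
    return names
```

In the notebook this could be used as `invalid_attributes(ah.eventSummary)`.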
The following image shows the alarms that were added in a testing environment after applying the above code.
Perform scans to generate alarms
The following subsections explain how to test the different rules.
Signal Statistics Warnings
The following warnings are based on statistics of the input data received from the CBF:
realPolAVarianceFreqAvg
realPolANumClippedSamples
imagPolAVarianceFreqAvg
imagPolANumClippedSamples
realPolAVarianceFreqAvgRfiExcised
realPolANumClippedSamplesRfiExcised
imagPolAVarianceFreqAvgRfiExcised
imagPolANumClippedSamplesRfiExcised
realPolBVarianceFreqAvg
realPolBNumClippedSamples
imagPolBVarianceFreqAvg
imagPolBNumClippedSamples
realPolBVarianceFreqAvgRfiExcised
realPolBNumClippedSamplesRfiExcised
imagPolBVarianceFreqAvgRfiExcised
imagPolBNumClippedSamplesRfiExcised
For the FreqAvg attributes, the easiest way to trigger the alarm is to make the CBF send only zeros. In the CBF Simulator notebook, you can do this by using the Gaussian Noise generator with a standard deviation of 0.0.
For the ClippedSamples attributes, the easiest way to trigger the alarm is to give the data a variance (standard deviation) large enough that clipping occurs. For NBIT=16, a standard deviation of ~11,000 ensures that some clipping occurs. For systems where NBIT=8, a value of ~45 suffices. These values are calculated from \(\sigma \approx 2^{\mathrm{NBIT}-1} / 3\).
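The rule of thumb \(\sigma \approx 2^{\mathrm{NBIT}-1}/3\) places the full-scale value at about \(3\sigma\). For zero-mean Gaussian noise, a small fraction of samples (about 0.27%) then exceeds full scale and is clipped. The sketch below computes the suggested standard deviation and the expected clipped fraction. It approximates the signed integer range as \(\pm 2^{\mathrm{NBIT}-1}\). The function names are hypothetical.

```python
import math


def clipping_sigma(nbit: int, k: float = 3.0) -> float:
    """Standard deviation at which k-sigma excursions reach full scale, 2**(nbit - 1)."""
    return 2 ** (nbit - 1) / k


def expected_clipped_fraction(sigma: float, nbit: int) -> float:
    """Fraction of zero-mean Gaussian samples (sigma > 0) with |x| beyond 2**(nbit - 1)."""
    full_scale = 2 ** (nbit - 1)
    # For X ~ N(0, sigma^2): P(|X| > a) = erfc(a / (sigma * sqrt(2)))
    return math.erfc(full_scale / (sigma * math.sqrt(2)))


for nbit in (8, 16):
    sigma = clipping_sigma(nbit)
    print(f"NBIT={nbit}: sigma ~ {sigma:.0f}, clipped fraction ~ {expected_clipped_fraction(sigma, nbit):.4f}")
```

Increasing the standard deviation beyond these values increases the fraction of clipped samples, which makes the alarm easier to trigger.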
Available Recording Time Warning & Alarm
These alarms occur when the available disk space on the local filesystem (LFS) in the digital signal processing (DSP) application is less than \(t \times \text{bytes\_per\_sec}\) for a given scan, where \(t\) is the time remaining in the scan.
Simulating this requires filling up the LFS within the pod. You can do this by making sure the SEND pod stops cleaning up files. The default test-parent deployment of PST provides only 10 GB of file storage each for the LFS and for the disk shared with the SDP data product dashboard. A scan of around 10 minutes using the CBF simulator notebook can fill this storage relatively quickly.
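The condition above is equivalent to comparing an available recording time, \(\text{free\_bytes} / \text{bytes\_per\_sec}\), with the time left in the scan. This is a minimal sketch of that comparison under that assumption. The real PST.BEAM calculation may differ in detail. The helper names are hypothetical. `shutil.disk_usage` is the standard-library call for querying free space on the filesystem that holds a path.

```python
import shutil


def available_recording_time(free_bytes: int, bytes_per_sec: float) -> float:
    """Seconds of data that still fit on the filesystem at the given data rate."""
    return free_bytes / bytes_per_sec


def recording_space_low(free_bytes: int, bytes_per_sec: float, time_left: float) -> bool:
    """True when free space < time_left * bytes_per_sec, i.e. the scan cannot finish."""
    return available_recording_time(free_bytes, bytes_per_sec) < time_left


def free_bytes_at(path: str) -> int:
    """Free bytes on the filesystem holding path (e.g. the LFS mount)."""
    return shutil.disk_usage(path).free
```

For example, with 10 GB free and a hypothetical data rate of 20 MB/s, `available_recording_time(10e9, 20e6)` is 500 s. Any scan with more than 500 s remaining would then satisfy the alarm condition.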
After performing such a simulation, clean up the LFS and SDP data mounts within the SEND application.
Use kubectl to open a shell in the pod:
kubectl exec -it ska-pst-core-send -- bash
Remove files from the LFS and DLM mounts:
rm -fr /mnt/pst/data/product/eb-*
rm -fr /mnt/pst/data/staging/eb-*
rm -fr /mnt/pst/dlm/product/eb-*
Ring Buffer Utilisation Warning and Alarm
This alarm is very hard to test. When everything is working correctly, the DSP pipeline reads data from the ring buffer fast enough that the ring buffer never fills up. There is no way to stop the DSP processing from performing this task.
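A toy producer/consumer model shows why utilisation stays low. This is not the PST ring buffer implementation; the capacity and per-step rates are hypothetical. Utilisation only grows when data is written faster than it is read. As long as the DSP reader keeps up, the buffer never approaches a warning threshold.

```python
def simulate_utilisation(capacity: int, write_per_step: int, read_per_step: int, steps: int) -> list:
    """Fraction of a ring buffer in use after each step of a toy producer/consumer model."""
    used = 0
    history = []
    for _ in range(steps):
        # the producer (e.g. UDP receive) writes, limited by the free space
        used = min(capacity, used + write_per_step)
        # the consumer (e.g. DSP) reads whatever is available, up to its rate
        used = max(0, used - read_per_step)
        history.append(used / capacity)
    return history


# reader keeps up: utilisation stays at zero
print(max(simulate_utilisation(100, 10, 10, 50)))
# reader is slower than the writer: utilisation climbs until the buffer is nearly full
print(simulate_utilisation(100, 10, 5, 50)[-1])
```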
UDP Packet errors and validity alarms
You can test the following attributes with the CBF Simulator by enabling Induce Errors and setting the percentage of packets that should contain errors. They can be tested individually or together.
dataDropRate
misorderedPacketRate
malformedPacketRate
misdirectedPacketRate
checksumFailurePacketRate
timestampSyncErrorPacketRate
seqNumberSyncErrorPacketRate
noValidPolarisationCorrectionPacketRate
noValidStationBeamPacketRate
noValidPstBeamPacketRate