Monitoring & Logging Reference
Log format standards, field reference, and service URLs.
SKA Log Message Format
All processes running in containers must log to stdout.
Log messages must conform to this format for ingestion:
SKA-LOGMSG = VERSION "|" TIMESTAMP "|" SEVERITY "|" [THREAD-ID] "|" [FUNCTION] "|" [LINE-LOC] "|" [TAGS] "|" MESSAGE LF
Field definitions:
| Field | Required | Description |
|---|---|---|
| VERSION | Yes | Version of SKA log standard (1-2 digits, starts at 1) |
| TIMESTAMP | Yes | ISO8601 timestamp in UTC (e.g., |
| SEVERITY | Yes | Log level: DEBUG, INFO, WARNING, ERROR, or CRITICAL |
| THREAD-ID | No | Thread identifier (e.g., “MainThread”, “Thread-1”) |
| FUNCTION | No | Full namespace of function (e.g., |
| LINE-LOC | No | Filename and line number (e.g., |
| TAGS | No | Comma-separated key:value pairs (e.g., |
| MESSAGE | Yes | UTF-8 encoded message content |
Format examples
1|2019-12-31T23:12:37.526Z|INFO||testpackage.testmodule.TestDevice.test_fn|test.py#1|tango-device:my/dev/name| Regular information logged here
1|2019-12-31T23:45:42.328Z|DEBUG||testpackage.testmodule.TestDevice.test_fn|test.py#150|| x = 67, y = 24
1|2019-12-31T23:49:53.543Z|WARNING||testpackage.testmodule.TestDevice.test_fn|test.py#16|| z is unspecified, defaulting to 0!
1|2019-12-31T23:50:17.124Z|ERROR||testpackage.testmodule.TestDevice.test_fn|test.py#165|site:Element| Could not connect to database!
1|2019-12-31T23:51:23.036Z|CRITICAL||testpackage.testmodule.TestDevice.test_fn|test.py#16|| Invalid operation. Cannot continue.
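As an illustration of how a Python service might emit lines in this layout, here is a minimal sketch built only on the standard `logging` module. The names `SkaFormatter`, `make_logger`, and the `tags` extra attribute are hypothetical, introduced for this example. It is a simplification: `%(funcName)s` yields only the bare function name rather than the full namespace, and the version is hard-coded to 1.

```python
import logging
import sys
import time

# Assumed field order: VERSION|TIMESTAMP|SEVERITY|THREAD-ID|FUNCTION|LINE-LOC|TAGS|MESSAGE
SKA_FORMAT = (
    "1|%(asctime)s.%(msecs)03dZ|%(levelname)s|%(threadName)s|"
    "%(funcName)s|%(filename)s#%(lineno)d|%(tags)s|%(message)s"
)


class SkaFormatter(logging.Formatter):
    """Hypothetical formatter producing pipe-delimited lines with UTC timestamps."""

    converter = time.gmtime  # render asctime in UTC instead of local time

    def __init__(self):
        super().__init__(fmt=SKA_FORMAT, datefmt="%Y-%m-%dT%H:%M:%S")

    def format(self, record):
        # TAGS is optional: default to an empty field when no tags were supplied.
        if not hasattr(record, "tags"):
            record.tags = ""
        return super().format(record)


def make_logger(stream=sys.stdout, name="ska_example"):
    """Return a logger that writes formatted lines to the given stream (stdout by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SkaFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


log = make_logger()
log.info("Regular information logged here", extra={"tags": "tango-device:my/dev/name"})
log.debug("x = 67, y = 24")
```

Tags are passed per call through `extra`, so a line without tags simply leaves that field empty, as in the examples above.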
Logging levels
Map Python logging levels to RFC5424 (syslog):
| Python | RFC5424 | Numerical Code |
|---|---|---|
| DEBUG | Debug | 7 |
| INFO | Informational | 6 |
| WARNING | Warning | 4 |
| ERROR | Error | 3 |
| CRITICAL | Critical | 2 |
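The mapping above can be written as a small lookup. This is a sketch only; `PYTHON_TO_SYSLOG` and `syslog_severity` are hypothetical names, not part of any SKA library.

```python
import logging

# RFC5424 (syslog) severity codes keyed by Python level name, taken from the table above.
PYTHON_TO_SYSLOG = {
    "DEBUG": 7,
    "INFO": 6,
    "WARNING": 4,
    "ERROR": 3,
    "CRITICAL": 2,
}


def syslog_severity(level):
    """Accept a level name ("ERROR") or a numeric logging level (logging.ERROR)."""
    name = logging.getLevelName(level) if isinstance(level, int) else level.upper()
    return PYTHON_TO_SYSLOG[name]  # raises KeyError for levels outside the table


print(syslog_severity("warning"))
print(syslog_severity(logging.CRITICAL))
```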
Parsing strategies
Split by delimiter:
log_line = "1|2019-12-31T23:50:17.124Z|ERROR||testpackage.testmodule.TestDevice.test_fn|test.py#165|site:Element| Could not connect to database!"
# Split at most 7 times so that any "|" inside MESSAGE stays in the last field
structured_log = log_line.split('|', 7)
log_level = structured_log[2]  # "ERROR"
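Building on the split approach, the eight fields can be given names and the TAGS field broken into key:value pairs. The following is a sketch under the assumption that a tag key never contains a colon; `SkaLogRecord`, `parse_tags`, and `parse_ska_log` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SkaLogRecord:
    version: str
    timestamp: str
    severity: str
    thread_id: str
    function: str
    line_loc: str
    tags: dict
    message: str


def parse_tags(raw):
    """Turn "k1:v1,k2:v2" into a dict, splitting each pair on its first colon."""
    tags = {}
    for pair in filter(None, raw.split(",")):
        key, _, value = pair.partition(":")
        tags[key] = value
    return tags


def parse_ska_log(line):
    """Split one log line into named fields; MESSAGE may itself contain "|"."""
    parts = line.rstrip("\n").split("|", 7)
    if len(parts) != 8:
        raise ValueError(f"expected 8 pipe-delimited fields, got {len(parts)}")
    version, timestamp, severity, thread_id, function, line_loc, raw_tags, message = parts
    return SkaLogRecord(version, timestamp, severity, thread_id, function,
                        line_loc, parse_tags(raw_tags), message)


record = parse_ska_log(
    "1|2019-12-31T23:50:17.124Z|ERROR||testpackage.testmodule.TestDevice.test_fn"
    "|test.py#165|site:Element| Could not connect to database!"
)
print(record.severity, record.tags, record.message)
```

Optional fields that were left empty come back as empty strings, and the message keeps the leading space seen in the format examples.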
Regex with named capture:
^(?<version>\d+)[|](?<timestamp>[0-9TZ\-:.]+)[|](?<level>[\w\s]+)[|](?<thread>[\w-]*)[|](?<function>[\w\-.]*)[|](?<lineloc>[\w\s.#]*)[|](?<tags>[\w\:,-]*)[|](?<message>.*)$
Test at: https://rubular.com/r/e0njVOGCN59mtA
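The pattern above uses `(?<name>...)` named groups, which Python's `re` module does not accept; Python spells them `(?P<name>...)`. A possible Python adaptation is sketched below. The name `SKA_LOG_PATTERN` is hypothetical, and the tags character class here also admits `/`, which the first format example (`tango-device:my/dev/name`) requires.

```python
import re

# One named group per field; optional fields use "*" so they may be empty.
SKA_LOG_PATTERN = re.compile(
    r"^(?P<version>\d+)[|]"
    r"(?P<timestamp>[0-9TZ\-:.]+)[|]"
    r"(?P<level>[\w\s]+)[|]"
    r"(?P<thread>[\w-]*)[|]"
    r"(?P<function>[\w\-.]*)[|]"
    r"(?P<lineloc>[\w\s.#]*)[|]"
    r"(?P<tags>[\w:,/-]*)[|]"  # "/" added as an assumption, see lead-in
    r"(?P<message>.*)$"
)

line = (
    "1|2019-12-31T23:12:37.526Z|INFO||testpackage.testmodule.TestDevice.test_fn"
    "|test.py#1|tango-device:my/dev/name| Regular information logged here"
)
match = SKA_LOG_PATTERN.match(line)
if match:
    fields = match.groupdict()
    print(fields["level"], fields["tags"])
```

Unlike a plain split, the regex also rejects lines whose fields contain unexpected characters, so a failed match is a useful signal that a line does not follow the format.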
Log field reference
Use these fields to filter logs in Kibana or Elasticsearch.
Kubernetes fields:
| Field | Description |
|---|---|
| | Kubernetes namespace |
| | Pod name |
| | StatefulSet name (useful for Device Servers) |
| | Container name |
| | Node name |
SKA infrastructure fields:
| Field | Description |
|---|---|
| | Datacentre (e.g., stfc-techops, mid-itf) |
| | Environment (e.g., production) |
| | Log source (syslog, journald, docker, podman, kubernetes) |
SKA CI/CD fields (prefix with kubernetes.labels. or kubernetes.namespace_labels.):
| Field | Description |
|---|---|
| | GitLab project ID |
| | GitLab project name |
| | Sanitised GitLab project path |
| | Author name |
| | Author GitLab ID |
| | SKA team (from People’s database) |
| | Commit SHA |
| | Branch name |
| | GitLab pipeline ID |
| | GitLab job ID |
| | GitLab job name |
| | Merge request ID (if applicable) |
| | GitLab environment tier |
| | Pipeline trigger source |
SKA custom log fields:
| Field | Description |
|---|---|
| | Log severity level |
| | Dynamic log message tags |
Service URLs
Logging services:
| Datacentre | Kibana | Elasticsearch |
|---|---|---|
| stfc-techops (cicd) | | |
| stfc-dp (cicd) | | |
| aws-* | | |
| mid-itf/low-itf | | |
| mid-aa | | |
| low-aa | | |
Central logging filter values:
| Datacentre | ska.datacentre | ska.environment |
|---|---|---|
| stfc-techops (cicd) | stfc-techops | production |
| stfc-dp (cicd) | stfc-dp | production |
| mid-itf | mid-itf | production |
| low-itf | low-itf | production |
| psi-mid | psi-mid | production |
| digital-signal-psi | digital-signal-psi | production |
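To show how these filter values might be used, the sketch below builds an Elasticsearch query body and an equivalent Kibana (KQL) query string from a `ska.datacentre` and `ska.environment` pair. The function names are hypothetical, and the `term` filters assume both fields are indexed as exact-match keyword fields, which this page does not state.

```python
import json


def build_log_filter(datacentre, environment="production"):
    """Return an Elasticsearch query body restricted to one datacentre and environment."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumption: both fields are keyword-mapped, so "term" matches exactly.
                    {"term": {"ska.datacentre": datacentre}},
                    {"term": {"ska.environment": environment}},
                ]
            }
        }
    }


def build_kql(datacentre, environment="production"):
    """Return the same restriction as a KQL string for the Kibana search bar."""
    return f'ska.datacentre: "{datacentre}" and ska.environment: "{environment}"'


print(json.dumps(build_log_filter("mid-itf"), indent=2))
print(build_kql("stfc-techops"))
```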
Prometheus Alert Manager (VPN required):