Monitoring & Logging Reference#

Log format standards, field reference, and service URLs.

SKA Log Message Format#

All processes running in containers must log to stdout.

Log messages must conform to this format for ingestion:

SKA-LOGMSG = VERSION "|" TIMESTAMP "|" SEVERITY "|" [THREAD-ID] "|" [FUNCTION] "|" [LINE-LOC] "|" [TAGS] "|" MESSAGE LF

Field definitions:

| Field | Required | Description |
|---|---|---|
| VERSION | Yes | Version of SKA log standard (1-2 digits, starts at 1) |
| TIMESTAMP | Yes | ISO8601 timestamp in UTC (e.g., 2019-12-31T23:12:37.526Z) |
| SEVERITY | Yes | Log level: DEBUG, INFO, WARNING, ERROR, or CRITICAL |
| THREAD-ID | No | Thread identifier (e.g., “MainThread”, “Thread-1”) |
| FUNCTION | No | Full namespace of function (e.g., package.module.Class.method) |
| LINE-LOC | No | Filename and line number (e.g., test.py#150) |
| TAGS | No | Comma-separated key:value pairs (e.g., tango-device:my/dev/name) |
| MESSAGE | Yes | UTF-8 encoded message content |

Format examples#

1|2019-12-31T23:12:37.526Z|INFO||testpackage.testmodule.TestDevice.test_fn|test.py#1|tango-device:my/dev/name| Regular information logged here
1|2019-12-31T23:45:42.328Z|DEBUG||testpackage.testmodule.TestDevice.test_fn|test.py#150|| x = 67, y = 24
1|2019-12-31T23:49:53.543Z|WARNING||testpackage.testmodule.TestDevice.test_fn|test.py#16|| z is unspecified, defaulting to 0!
1|2019-12-31T23:50:17.124Z|ERROR||testpackage.testmodule.TestDevice.test_fn|test.py#165|site:Element| Could not connect to database!
1|2019-12-31T23:51:23.036Z|CRITICAL||testpackage.testmodule.TestDevice.test_fn|test.py#16|| Invalid operation. Cannot continue.
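One way to produce lines in this layout from Python is a custom logging.Formatter. The following is a minimal sketch, not an official SKA implementation: the class name SkaFormatter and its tags parameter are hypothetical, and FUNCTION is approximated as logger name plus function name because the standard LogRecord does not carry the enclosing class.

```python
import logging
import sys
import time


class SkaFormatter(logging.Formatter):
    """Hypothetical formatter emitting the pipe-delimited SKA layout."""

    converter = time.gmtime  # TIMESTAMP must be in UTC

    def __init__(self, tags=""):
        super().__init__()
        self.tags = tags  # pre-rendered "key:value,key:value" string

    def formatTime(self, record, datefmt=None):
        # ISO 8601 with millisecond precision and a trailing "Z"
        base = time.strftime("%Y-%m-%dT%H:%M:%S", self.converter(record.created))
        return f"{base}.{int(record.msecs):03d}Z"

    def format(self, record):
        fields = [
            "1",                                   # VERSION
            self.formatTime(record),               # TIMESTAMP
            record.levelname,                      # SEVERITY
            record.threadName or "",               # THREAD-ID (optional)
            f"{record.name}.{record.funcName}",    # FUNCTION (approximation)
            f"{record.filename}#{record.lineno}",  # LINE-LOC
            self.tags,                             # TAGS (optional)
            record.getMessage(),                   # MESSAGE
        ]
        return "|".join(fields)


# Containers must log to stdout, so attach the formatter to a stdout handler.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(SkaFormatter(tags="tango-device:my/dev/name"))
logger = logging.getLogger("testpackage.testmodule")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.info("Regular information logged here")
```

The sketch does not escape "|" or line feeds inside the message; a production formatter would need a policy for both.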

Logging levels#

Map Python logging levels to RFC5424 (syslog):

| Python | RFC5424 | Numerical Code |
|---|---|---|
| DEBUG | Debug | 7 |
| INFO | Informational | 6 |
| WARNING | Warning | 4 |
| ERROR | Error | 3 |
| CRITICAL | Critical | 2 |
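The mapping can be held in a small lookup table. This is an illustrative sketch; the names SYSLOG_SEVERITY and syslog_severity are hypothetical and only the five levels listed above are covered.

```python
# SKA / Python level name -> RFC 5424 numerical severity (lower is more severe)
SYSLOG_SEVERITY = {
    "DEBUG": 7,
    "INFO": 6,
    "WARNING": 4,
    "ERROR": 3,
    "CRITICAL": 2,
}


def syslog_severity(level: str) -> int:
    """Return the RFC 5424 code for a level name, ignoring case and padding."""
    try:
        return SYSLOG_SEVERITY[level.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown severity: {level!r}") from None
```

Note that syslog codes 5 (Notice), 1 (Alert), and 0 (Emergency) have no counterpart among these five levels.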

Parsing strategies#

Split by delimiter:

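log_line = "1|2019-12-31T23:50:17.124Z|ERROR||testpackage.testmodule.TestDevice.test_fn|test.py#165|site:Element| Could not connect to database!"
# Split on the first 7 delimiters only, so a "|" inside MESSAGE is preserved
structured_log = log_line.split('|', 7)
log_level = structured_log[2]  # SEVERITY is the third field
message = structured_log[7]    # MESSAGE is everything after the seventh "|"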

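Regex with named capture (the (?<name>...) group syntax is accepted by Ruby and PCRE; Python's re module requires (?P<name>...) instead). The tags group accepts "/" so that values such as my/dev/name match:

^(?<version>\d+)[|](?<timestamp>[0-9TZ\-:.]+)[|](?<level>[\w\s]+)[|](?<thread>[\w-]*)[|](?<function>[\w\-.]*)[|](?<lineloc>[\w\s.#]*)[|](?<tags>[\w\/\:,-]*)[|](?<message>.*)$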

Test at: https://rubular.com/r/e0njVOGCN59mtA
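For Python, the same pattern can be written with (?P<name>...) groups. This is a sketch under two assumptions: the helper name parse_ska_log is hypothetical, and the tags group is widened to [^|]* so that tag values containing "/" (such as my/dev/name) are accepted.

```python
import re

SKA_LOG_RE = re.compile(
    r"^(?P<version>\d+)[|]"
    r"(?P<timestamp>[0-9TZ\-:.]+)[|]"
    r"(?P<level>[\w\s]+)[|]"
    r"(?P<thread>[\w-]*)[|]"
    r"(?P<function>[\w\-.]*)[|]"
    r"(?P<lineloc>[\w\s.#]*)[|]"
    r"(?P<tags>[^|]*)[|]"      # assumption: any character except the delimiter
    r"(?P<message>.*)$"
)


def parse_ska_log(line: str):
    """Return a dict of named fields, or None if the line does not conform."""
    match = SKA_LOG_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


record = parse_ska_log(
    "1|2019-12-31T23:12:37.526Z|INFO||testpackage.testmodule.TestDevice.test_fn"
    "|test.py#1|tango-device:my/dev/name| Regular information logged here"
)
print(record["level"], record["tags"])
```

Optional fields that are left empty in the log line come back as empty strings rather than None.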

Log field reference#

Use these fields to filter logs in Kibana or Elasticsearch.

Kubernetes fields:

| Field | Description |
|---|---|
| kubernetes.namespace | Kubernetes namespace |
| kubernetes.pod.name | Pod name |
| kubernetes.statefulset.name | StatefulSet name (useful for Device Servers) |
| kubernetes.container.name | Container name |
| kubernetes.node.name | Node name |

SKA infrastructure fields:

| Field | Description |
|---|---|
| ska.datacentre | Datacentre (e.g., stfc-techops, mid-itf) |
| ska.environment | Environment (e.g., production) |
| ska.application | Log source (syslog, journald, docker, podman, kubernetes) |

SKA CI/CD fields (prefix with kubernetes.labels. or kubernetes.namespace_labels.):

| Field | Description |
|---|---|
| cicd_skao_int/projectId | GitLab project ID |
| cicd_skao_int/project | GitLab project name |
| cicd_skao_int/projectPath | Sanitised GitLab project path |
| cicd_skao_int/author | Author name |
| cicd_skao_int/authorId | Author GitLab ID |
| cicd_skao_int/team | SKA team (from People’s database) |
| cicd_skao_int/commit | Commit SHA |
| cicd_skao_int/branch | Branch name |
| cicd_skao_int/pipelineId | GitLab pipeline ID |
| cicd_skao_int/jobId | GitLab job ID |
| cicd_skao_int/job | GitLab job name |
| cicd_skao_int/mrId | Merge request ID (if applicable) |
| cicd_skao_int/environmentTier | GitLab environment tier |
| cicd_skao_int/pipelineSource | Pipeline trigger source |

SKA custom log fields:

| Field | Description |
|---|---|
| ska_severity | Log severity level |
| ska_tags_field.<tag> | Dynamic log message tags |
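As an illustration of filtering on these fields, the sketch below builds an Elasticsearch Query DSL body as a plain dictionary. The function name build_log_query and the namespace value are hypothetical, and the term clauses assume the fields are indexed as exact-match (keyword) values, which should be checked against the actual index mapping.

```python
import json


def build_log_query(namespace, datacentre, severity, environment="production"):
    """Build a bool/filter query combining Kubernetes, SKA, and severity fields."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"kubernetes.namespace": namespace}},
                    {"term": {"ska.datacentre": datacentre}},
                    {"term": {"ska.environment": environment}},
                    {"term": {"ska_severity": severity}},
                ]
            }
        }
    }


# "my-namespace" is a placeholder, not a real deployment namespace.
query = build_log_query("my-namespace", "mid-itf", "ERROR")
print(json.dumps(query, indent=2))
```

The resulting JSON can be pasted into a search request body; filter clauses do not affect scoring, which suits exact-match log filtering.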

Standard tags#

Use these tags in log messages for filtering:

deviceName:

TANGO device name in format <facility>/<family>/<device>.

Example: MID-D0125/rx/controller

  • MID-D0125 — Dish serial number

  • rx — Dish Single Pixel Feed Receiver (SPFRx)

  • controller — Dish SPFRx controller

subSystem:

For non-TANGO software, the telescope sub-system name.

Valid values: CSP, Dish, INAU, INSA, LFAA, SDP, SaDT, TM
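A small helper can render these tags into the comma-separated key:value form used by the TAGS field and check subSystem against the valid values. This is a sketch only; format_tags and parse_tags are hypothetical names, and rejecting "|", "," and line feeds in tag text is an assumption made to keep the log line parseable.

```python
VALID_SUBSYSTEMS = {"CSP", "Dish", "INAU", "INSA", "LFAA", "SDP", "SaDT", "TM"}


def format_tags(tags: dict) -> str:
    """Render a dict as 'key:value,key:value' for the TAGS field."""
    subsystem = tags.get("subSystem")
    if subsystem is not None and subsystem not in VALID_SUBSYSTEMS:
        raise ValueError(f"invalid subSystem: {subsystem!r}")
    for key, value in tags.items():
        # Assumption: characters that would break field or tag splitting are rejected
        if any(ch in "|,\n" for ch in f"{key}{value}"):
            raise ValueError(f"illegal character in tag {key!r}")
    return ",".join(f"{key}:{value}" for key, value in tags.items())


def parse_tags(field: str) -> dict:
    """Split a TAGS field back into a dict; the first ':' separates key and value."""
    result = {}
    for item in filter(None, field.split(",")):
        key, _, value = item.partition(":")
        result[key] = value
    return result


print(format_tags({"deviceName": "MID-D0125/rx/controller"}))
print(format_tags({"subSystem": "SDP"}))
```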

Service URLs#

Logging services:

Central logging filter values:

| Datacentre | ska.datacentre | ska.environment |
|---|---|---|
| stfc-techops (cicd) | stfc-techops | production |
| stfc-dp (cicd) | stfc-dp | production |
| mid-itf | mid-itf | production |
| low-itf | low-itf | production |
| psi-mid | psi-mid | production |
| digital-signal-psi | digital-signal-psi | production |
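These values can be turned into a Kibana Query Language (KQL) filter string. The sketch below copies the table into a lookup keyed by the ska.datacentre value; the names CENTRAL_LOGGING_FILTERS and kql_filter are hypothetical.

```python
# ska.datacentre value -> ska.environment value, copied from the table above
CENTRAL_LOGGING_FILTERS = {
    "stfc-techops": "production",
    "stfc-dp": "production",
    "mid-itf": "production",
    "low-itf": "production",
    "psi-mid": "production",
    "digital-signal-psi": "production",
}


def kql_filter(datacentre: str) -> str:
    """Return a KQL expression selecting logs from one datacentre."""
    environment = CENTRAL_LOGGING_FILTERS[datacentre]  # KeyError if unknown
    return (
        f'ska.datacentre : "{datacentre}" '
        f'and ska.environment : "{environment}"'
    )


print(kql_filter("mid-itf"))
```

The returned string can be pasted into the Kibana query bar and extended with further clauses such as a namespace or severity filter.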

Prometheus Alert Manager (VPN required):

External documentation#