Logging Solution

Logging is one of the components in a "developer-centred" tooling set, facilitating the analysis of infrastructure and application behaviour. Please refer to the centralised monitoring and logging documentation to understand which solutions are available and how they integrate.

Logging in SKA is handled with Elasticsearch, bundled with Kibana as a frontend. This frontend is better suited to creating visualisations than to actually searching logs.

Note

The examples refer and link to the central logging URLs. If your datacentre is a production one (i.e., low-aa, mid-aa, etc.), please refer to the list of URLs and use the corresponding URL instead.

Set up Kibana/Elasticsearch access

If you simply want to browse Kibana, you can go to Kibana and Continue as Guest. This lets you see logs and dashboards, but every other operation is limited.

If you need to access logs locally or need other kinds of permissions, we advise you to create your own account. To do so, follow these steps:

  1. Create an STS ticket

  2. Log in with your:
    • Username: “Username” in your JIRA Profile

    • Password: Password provided by ST in response to your STS ticket

  3. Create API Key
    • On the left pane, navigate to Stack Management -> API keys

    Create API Key in Kibana

  4. Test API Key (an alternative status-code check is shown after this list)

    $ curl -k -H "Authorization: ApiKey <your API Key>" https://logging.stfc.skao.int:9200/_cat/health
    
  5. Keep your password and API Key safe
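
A quick way to verify the key from step 4 without reading the cluster health output is to check just the HTTP status code (a minimal sketch using standard curl options; 200 means the key was accepted, 401 means it was not):

$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Authorization: ApiKey <your API Key>" https://logging.stfc.skao.int:9200/_cat/health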

Filter logs in Kibana

Kibana is not the best tool for working with logs, as its strength lies more in visualisations based on the data present in logs. Nonetheless, we search logs in Kibana in the same manner we search in Elasticsearch, using log metadata fields.

Discover logs

In Kibana we search logs in the Discover view, selectable on the left pane. There we can use several top-level controls:

  1. Data view - Change which data view to use (comprising one or more index patterns)

  2. Date selection - Select start and end time to look for logs

  3. Filters - Filter logs based on fields, allowing for and/or operations with is/is one of/exists operators and their negative counterparts

  4. Search Bar - Kibana Query Language (KQL) expressions to filter logs. These get ANDed with the filters

  5. Field List - List of fields present in the selected logs. We can search field names and select them (using '+') to be displayed

  6. Document - List of documents matching the search criteria. You can sort them, further filter on a field value of interest (using '+'), or exclude similar results (using '-')

To know more about it, please refer to the official documentation.

Kibana discover view

This is a useful page for understanding, and quickly searching, which logs and fields are available, usually aided by auto-complete. Another useful feature is expanding a document, where we can see all the available metadata and fields. As an example, let's look for a Device server log in the staging-ska-tango-examples namespace in the STFC CICD (stfc-techops datacentre) cluster. Later in this page we will go over some of the built-in and custom fields you can filter with:

Kibana filter

Although you can do everything with KQL, we suggest you do most of your filtering with filters and only use KQL to search for string values that are not exact matches (i.e., lines containing a substring). As we use StatefulSets to run Device Servers, we can try to find some fields to help us filter logs. For each filter, we get a very handy list of the most frequently occurring values (within the selected timeframe):

Kibana field values

We can include (using '+') or exclude (using '-') other documents where this field has the same value. Inspecting a document's JSON content, we can see the whole set of fields present. Note that not every document has all of these fields, but most documents related to Kubernetes logs do:

{
  "_index": ".ds-filebeat-8.17.4-2025.05.09-006885",
  "_id": "ayiMtZYB3UmOeKG2FrhB",
  "_version": 1,
  "_source": {
    "container": {
      "image": {
        "name": "artefact.skao.int/ska-tango-images-tango-db:11.0.2"
      },
      "runtime": "containerd",
      "id": "38ebd034e936c4df7368442be632c2455e2f3c5f92da0f290a8c0cd18394d206"
    },
    "input": {
      "type": "container"
    },
    "kubernetes": {
      "pod": {
        "uid": "cdefc1fe-7a2b-4bba-b232-2bd421797308",
        "ip": "10.10.192.42",
        "name": "databaseds-tangodb-tango-databaseds-0"
      },
      "statefulset": {
        "name": "databaseds-tangodb-tango-databaseds"
      },
      "namespace": "staging-ska-tango-examples",
      "namespace_uid": "9e932d52-5a61-45d7-9a03-579bdb874204",
      "namespace_labels": {
        "cicd_skao_int/project": "ska-tango-examples",
        "cicd_skao_int/author": "matteo1981",
        "cicd_skao_int/jobId": "9974477502",
        "cicd_skao_int/projectPath": "ska-telescope-ska-tango-examples",
        "kubernetes_io/metadata_name": "staging-ska-tango-examples",
        "cicd_skao_int/pipelineId": "1805215409",
        "cicd_skao_int/mrId": "",
        "cicd_skao_int/projectId": "9673989",
        "cicd_skao_int/job": "deploy-staging",
        "cicd_skao_int/environmentTier": "staging",
        "cicd_skao_int/branch": "master",
        "cicd_skao_int/authorId": "3003086",
        "cicd_skao_int/team": "system",
        "cicd_skao_int/commit": "7cabaa1f4d5a697e89e87980dfb359fe375105b9",
        "cicd_skao_int/pipelineSource": "web"
      },
      "labels": {
        "cicd_skao_int/project": "ska-tango-examples",
        "cicd_skao_int/author": "matteo1981",
        "cicd_skao_int/jobId": "9960905992",
        "apps_kubernetes_io/pod-index": "0",
        "controller-revision-hash": "databaseds-tangodb-tango-databaseds-5555d574fd",
        "app_kubernetes_io/managed-by": "DatabaseDSController",
        "cicd_skao_int/pipelineId": "1805215409",
        "cicd_skao_int/projectId": "9673989",
        "app_kubernetes_io/name": "databaseds-tangodb",
        "cicd_skao_int/job": "deploy-staging",
        "app_kubernetes_io/instance": "tango-databaseds",
        "cicd_skao_int/branch": "master",
        "cicd_skao_int/team": "system",
        "cicd_skao_int/authorId": "3003086",
        "cicd_skao_int/commit": "7cabaa1f4d5a697e89e87980dfb359fe375105b9",
        "statefulset_kubernetes_io/pod-name": "databaseds-tangodb-tango-databaseds-0"
      }
    },
    "stream": "stderr",
    "host": {
      "name": "stfc-techops-production-cicd-md-0-2nwqg-gd5g8"
    },
    "ska": {
      "datacentre": "stfc-techops",
      "environment": "production",
      "prometheus_datacentre": "stfc-ska-monitor",
      "application": "kubernetes",
      "service": "clusterapi"
    },
    "message": "2025-05-09 14:56:16 58199 [Warning] Aborted connection 58199 to db: 'unconnected' user: 'unauthenticated' host: '10.100.1.251' (This connection closed     normally without authentication)"
  }
}

Note

Some data was omitted from the JSON document for brevity

You can use any combination of these fields to filter your logs and pinpoint the time window you are looking for. The timeline in Kibana is particularly useful for that, as it shows an aggregated count of documents that helps narrow down a time window. For an effective log search we need to know which fields we can use. To narrow that down, we suggest:

  • Kubernetes fields, of which we highlight:

    • kubernetes.namespace -> Kubernetes namespace

    • kubernetes.pod.name -> Kubernetes Pod name

    • kubernetes.statefulset.name -> Kubernetes Statefulset name, useful for Device Servers

    • kubernetes.node.name -> Kubernetes node name

  • SKA infrastructure fields:

    • ska.datacentre -> Datacentre the log came from

    • ska.environment -> Environment in the datacentre the log came from

    • ska.application -> The log source (one of syslog, journald, docker, podman or kubernetes)

  • SKA standard fields (not in widespread use, unfortunately):

    • kubernetes.labels.domain -> Application domain

    • kubernetes.labels.function -> Application function

    • kubernetes.labels.system -> System application is part of

    • kubernetes.labels.subsystem -> Subsystem application is part of

    • kubernetes.labels.telescope -> Telescope application is part of

  • SKA CICD fields (prefixed with kubernetes.labels or kubernetes.namespace_labels):

    • cicd_skao_int/projectId -> Gitlab project id

    • cicd_skao_int/project -> Gitlab project name

    • cicd_skao_int/projectPath -> Sanitized Gitlab project path (“/” replaced with “-“)

    • cicd_skao_int/authorId -> Author Gitlab id

    • cicd_skao_int/author -> Author name

    • cicd_skao_int/team -> Author’s SKA Team (might not always be available, depends on the contents of the People’s database)

    • cicd_skao_int/commit -> Commit

    • cicd_skao_int/branch -> Branch

    • cicd_skao_int/pipelineId -> Gitlab pipeline id

    • cicd_skao_int/jobId -> Gitlab job id

    • cicd_skao_int/job -> Gitlab job name

    • cicd_skao_int/mrId -> Gitlab merge request id (if applicable)

    • cicd_skao_int/environmentTier -> Gitlab environment tier

    • cicd_skao_int/pipelineSource -> Gitlab pipeline (trigger) source

It becomes very easy to track down your logs using fields like kubernetes.namespace and kubernetes.labels.cicd_skao_int/jobId. As this is so specific, we were able to include prebuilt URLs in the pipeline logs to make it easier for developers to find relevant logs.
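
For example, a KQL expression along these lines (a sketch reusing the namespace and message from the sample document above; adapt the values to your own deployment) combines an exact field match with a substring search on the message:

kubernetes.namespace : "staging-ska-tango-examples" and message : *Aborted*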

Filter logs on your machine

To query and parse logs locally on our machines, which is often the preferred and most efficient way to work, we need access to Elasticsearch itself rather than Kibana. We can craft our queries in Kibana using ES|QL, the newer query language supported by both Kibana and Elasticsearch. Optionally, we can also use KQL.

curl

All we need is a curl command against Elasticsearch's native query API. For this scenario, we give preference to ES|QL as it is more readable and easier to structure.

As an example, let's query the ska-ser-namespace-manager REST API logs in the CICD cluster (stfc-techops-production). We don't know upfront how to filter for the REST API pod logs, so let's use Discover to find that out. Starting with what we know, we can open up a document whose log looks like it is coming from the REST API and inspect it:

{
  ...
  "kubernetes": {
    "container": {
      "name": "api"
    },
    ...
    "namespace": "ska-ser-namespace-manager",
    "namespace_uid": "d4639c15-52ef-4541-99f0-03b4c432bea7",
    "replicaset": {
      "name": "ska-ser-namespace-manager-api-7899b46c86"
    },
    "namespace_labels": {
      "kubernetes_io/metadata_name": "ska-ser-namespace-manager",
      "name": "ska-ser-namespace-manager"
    },
    "labels": {
      "app_kubernetes_io/managed-by": "Helm",
      "helm_sh/chart": "ska-ser-namespace-manager-0.1.4",
      "pod-template-hash": "7899b46c86",
      "app_kubernetes_io/version": "0.1.4",
      "app_kubernetes_io/part-of": "ska-ser-namespace-manager",
      "app_kubernetes_io/component": "api",
      "app_kubernetes_io/instance": "ska-ser-namespace-manager"
    }
  }
  ...
}

Clearly, we could use either kubernetes.container.name or kubernetes.labels.app_kubernetes_io/component. Converting this to ES|QL, we get:

Note

In ES|QL, field names containing forward slashes ("/") need to be escaped with backticks, i.e. written as `<field>`

$ API_KEY=<your api key>
$ curl -qk -X POST "https://logging.stfc.skao.int:9200/_query?format=json&pretty" -H "Authorization: ApiKey $API_KEY" -H 'Content-Type: application/json' \
  -d'
  {
    "query": "FROM filebeat-8* | WHERE ska.datacentre == \"stfc-techops\" AND ska.environment == \"production\" AND kubernetes.namespace == \"ska-ser-namespace-manager\" AND `kubernetes.labels.app_kubernetes_io/component` == \"api\" | KEEP message | LIMIT 100"
  }
  ' 2>/dev/null | jq -r ".values[][]"
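
If you prefer to skip the jq post-processing, the _query endpoint also accepts other response formats; as a sketch, requesting format=txt should return the same results as a plain-text table:

$ curl -qk -X POST "https://logging.stfc.skao.int:9200/_query?format=txt" -H "Authorization: ApiKey $API_KEY" -H 'Content-Type: application/json' \
  -d'
  {
    "query": "FROM filebeat-8* | WHERE ska.datacentre == \"stfc-techops\" AND ska.environment == \"production\" AND kubernetes.namespace == \"ska-ser-namespace-manager\" AND `kubernetes.labels.app_kubernetes_io/component` == \"api\" | KEEP message | LIMIT 100"
  }
  ' 2>/dev/null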

Another common use-case is seeing the logs from a namespace deployed by a Gitlab job, where we know its id and the target cluster:

$ API_KEY=<your api key>
$ curl -qk -X POST "https://logging.stfc.skao.int:9200/_query?format=json&pretty" -H "Authorization: ApiKey $API_KEY" -H 'Content-Type: application/json' \
  -d'
  {
    "query": "FROM filebeat-8* | WHERE ska.datacentre == \"stfc-techops\" AND ska.environment == \"production\" AND `kubernetes.labels.cicd_skao_int/jobId` == \"10006002600\" | KEEP message | LIMIT 1000"
  }
  ' 2>/dev/null | jq -r ".values[][]"

To view the same logs but starting from the oldest, we add SORT @timestamp ASC:

$ API_KEY=<your api key>
$ curl -qk -X POST "https://logging.stfc.skao.int:9200/_query?format=json&pretty" -H "Authorization: ApiKey $API_KEY" -H 'Content-Type: application/json' \
  -d'
  {
    "query": "FROM filebeat-8* | WHERE ska.datacentre == \"stfc-techops\" AND ska.environment == \"production\" AND `kubernetes.labels.cicd_skao_int/jobId` == \"10006002600\" | SORT @timestamp ASC | KEEP message | LIMIT 1000"
  }
  ' 2>/dev/null | jq -r ".values[][]"

To scope it to a specific timeframe we have multiple options, like adding WHERE @timestamp > TO_DATETIME("2025-05-13T05:00:00Z") AND @timestamp <= NOW():

$ API_KEY=<your api key>
$ curl -qk -X POST "https://logging.stfc.skao.int:9200/_query?format=json&pretty" -H "Authorization: ApiKey $API_KEY" -H 'Content-Type: application/json' \
  -d'
  {
    "query": "FROM filebeat-8* | WHERE ska.datacentre == \"stfc-techops\" AND ska.environment == \"production\" | WHERE @timestamp > TO_DATETIME(\"2025-05-13T05:00:00Z\") AND @timestamp <= NOW() | SORT @timestamp ASC | KEEP message | LIMIT 1000"
  }
  ' 2>/dev/null | jq -r ".values[][]"

Which outputs:

Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
May 13 05:00:00 stfc-techops-production-storage-mon-i02 bash[384977]: cluster 2025-05-13T05:00:00.000117+0000 mon.stfc-techops-production-storage-mon-i00 (mon.0) 421986 : cluster [INF] overall HEALTH_OK
May 13 05:00:00 stfc-techops-production-storage-mon-i02 bash[384977]: cluster 2025-05-13T05:00:00.135745+0000 mgr.stfc-techops-production-storage-mon-i01.wpgxxy (mgr.56847325) 51399 : cluster [DBG] pgmap v49339: 321 pgs: 321 active+clean; 794 GiB data, 2.3 TiB used, 2.4 TiB / 4.8 TiB avail; 14 KiB/s rd, 1.2 MiB/s wr, 125 op/s
2025-05-13T05:00:00.622Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
{"script": "/scripts-23023437-10014411580/get_sources"}
I0513 05:00:00.639560       1 eventhandlers.go:186] "Add event for scheduled pod" pod="ska-ser-namespace-manager/check-namespace-6f0004f3-29118540-x29fh"
I0513 05:00:00.639115       1 eventhandlers.go:186] "Add event for scheduled pod" pod="ska-ser-namespace-manager/check-namespace-6f0004f3-29118540-x29fh"
10.100.1.65 - - [13/May/2025:05:00:00 +0000] "GET / HTTP/1.1" 200 540 "-" "kube-probe/1.32"
I0513 05:00:00.652638       1 eventhandlers.go:206] "Update event for scheduled pod" pod="ska-ser-namespace-manager/check-namespace-6f0004f3-29118540-x29fh"
I0513 05:00:00.652656       1 httplog.go:132] "HTTP" verb="GET" URI="/healthz" latency="97.192µs" userAgent="kube-probe/1.32" audit-ID="" srcIP="10.100.2.197:50468" resp=200
I0513 05:00:00.653696       1 eventhandlers.go:206] "Update event for scheduled pod" pod="ska-ser-namespace-manager/check-namespace-6f0004f3-29118540-x29fh"
[backend] | 2025-05-13T05:00:00.658Z info: [main] Fetched gitlab.com/ska-telescope/ska-low-sps-smm/ska-low-sps-smm-kernel in 0.87s
[backend] |
[backend] | 2025-05-13T05:00:00.658Z info: [main] Indexing gitlab.com/ska-telescope/ska-low-sps-smm/ska-low-sps-smm-kernel.

If we want just the runner logs, we add WHERE kubernetes.container.name LIKE "test-runner*":

$ API_KEY=<your api key>
$ curl -qk -X POST "https://logging.stfc.skao.int:9200/_query?format=json&pretty" -H "Authorization: ApiKey $API_KEY" -H 'Content-Type: application/json' \
  -d'
  {
    "query": "FROM filebeat-8* | WHERE ska.datacentre == \"stfc-techops\" AND ska.environment == \"production\" AND `kubernetes.labels.cicd_skao_int/jobId` == \"10006002600\" | WHERE kubernetes.container.name LIKE \"test-runner*\" | KEEP message"
  }
  ' 2>/dev/null | jq -r ".values[][]"

Note that the most efficient way to build the relevant ES|QL queries is directly in Kibana, where you can simultaneously build the query and inspect the returned documents. After you've crafted your queries, you can convert them to curl commands, or even create a script (for example, in Python) to do this and automate your debugging workflow.

Kibana ESQL toggle
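
After you have settled on a query, a small wrapper makes it easy to rerun variations of it from the command line. Below is a minimal shell sketch (it could equally be written in Python); it assumes jq is installed, that ES_API_KEY holds your API key, and the esql_messages helper name is just illustrative:

#!/usr/bin/env bash
# Sketch: run an ES|QL query against the central logging Elasticsearch
# and print the "message" column of every matching document.
# Assumes ES_API_KEY is exported in the environment and jq is installed.
ES_URL="${ES_URL:-https://logging.stfc.skao.int:9200}"

esql_messages() {
  local query="$1"
  # Build the JSON body with jq so that quotes inside the ES|QL query are escaped correctly
  jq -n --arg q "$query" '{query: $q}' |
    curl -qks -X POST "${ES_URL}/_query?format=json" \
      -H "Authorization: ApiKey ${ES_API_KEY}" \
      -H 'Content-Type: application/json' \
      -d @- | jq -r '.values[][]'
}

# Example usage: last 100 messages from the ska-ser-namespace-manager namespace in the CICD cluster
esql_messages 'FROM filebeat-8* | WHERE ska.datacentre == "stfc-techops" AND kubernetes.namespace == "ska-ser-namespace-manager" | KEEP message | LIMIT 100'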

elktail

elktail is a command-line interface for querying Elasticsearch that provides a basic search and templated-output analogue of Kibana.

Installation

Binaries are provided for Linux, MacOS, and Windows - see the instructions.

Configuration

elktail can be fully driven from the command line, but a configuration file simplifies its use. Because Elasticsearch uses TLS encryption, it is necessary to obtain an up-to-date sample configuration file from the Systems Team, or to run the following snippet to generate a template for the central logging facility that supports the pipeline machinery and the ITF environments (contact Platform Support for access to each of the telescope-specific production Elasticsearch instances): https://gitlab.com/ska-telescope/ska-snippets/-/snippets/4842521. Don't forget to replace the APIKey value in the resulting sample-config.yaml with your own.

How To Query

elktail uses the same KQL query syntax used by Kibana. Full details on the options and usage are available at https://gitlab.com/piersharding/elktail#queries. Queries are essentially filters that enable targeted log extraction based on metadata in the JSON documents in Elasticsearch. The following example finds the last (-n 1) log entry where the application is syslog, the datacentre is mid-itf, and the message body contains dnsmasq somewhere:

$ elktail -n 1 ska.application: syslog AND ska.datacentre: mid-itf AND message: dnsmasq
[2025-05-13T05:38:28.404Z] [l:] za-itf-gateway :: May 13 07:38:28 za-itf-gateway dnsmasq[3370]: config error is REFUSED (EDE: not ready)

Note

To help with determining what metadata is available to use in a query, elktail provides the -r (raw dump) and -p (pretty print) switches to inspect the entire JSON document associated with a log message stored in Elasticsearch:

$ elktail -p -n 1 ska.application: syslog AND ska.datacentre: mid-itf AND message: dnsmasq
[
   {
      "@timestamp": "2025-05-13T05:47:54.201Z",
      "_Id": "-o8vyJYBJIvwjgX0cO4o",
      "agent": {
            "ephemeral_id": "2b3e2cb4-18a8-4f09-89ab-0f8487ddb992",
            "id": "76fbd991-4dcb-4f24-9176-bf20b9ab8feb",
            "name": "za-itf-gateway",
            "type": "filebeat",
            "version": "8.4.3"
      },
      "ecs": {
            "version": "8.0.0"
      },
      "host": {
            "name": "za-itf-gateway"
      },
      "input": {
            "type": "log"
      },
      "log": {
            "file": {
               "path": "/var/log/syslog"
            },
            "offset": 138122570
      },
      "message": "May 13 07:47:53 za-itf-gateway dnsmasq[3370]: query[A] hpiers.obspm.fr.svc.miditf.internal.skao.int from 10.20.0.21",
      "ska": {
            "application": "syslog",
            "datacentre": "mid-itf",
            "environment": "production",
            "service": "k8s"
      }
   }
]

Looking at this sample JSON document dump, we can see values that can, for instance, be used to select the host name associated with log messages, e.g. host.name: za-itf-gateway.
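
For instance, a sketch of a query scoped to that host, mirroring the fields used in the earlier examples (adjust -n to the number of entries you want):

$ elktail -n 5 "ska.application: syslog AND ska.datacentre: mid-itf AND host.name: za-itf-gateway"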

Looking for Container Logs

There are key metadata elements that associate logs with a cluster, namespace and container. The core values are:

  • input.type -> Log source (syslog vs. containers). Example values: log, container

  • ska.datacentre -> Identifies the datacentre the logs come from. Example values: stfc-techops, stfc-dp, mid-itf, low-itf

  • kubernetes.namespace -> Namespace. Example: staging-dish-lmc-ska100

  • kubernetes.statefulset.name -> StatefulSet name. Example: ds-dish-logger-100

  • kubernetes.container.name -> Container name. Example: deviceserver

  • ska_severity -> SKA custom log field for log severity. Example values: INFO, ERROR

  • ska_tags_field -> SKA custom dynamic log message tags. Example: ska_tags_field.tango-device: ska100/spfrxpu/controller

Note

There are more SKA custom fields available; for more details on these, see https://confluence.skatelescope.org/display/SWSI/SKA+Log+Message+Format. More details on the built-in Kubernetes metadata are available at https://www.elastic.co/docs/reference/beats/filebeat/exported-fields-kubernetes-processor.

Putting this all together, we can create a query like this:

$ elktail -n 1  "ska.datacentre: mid-itf AND kubernetes.namespace: staging-dish-lmc-ska100 AND kubernetes.statefulset.name: ds-dish-logger-100 AND kubernetes.container.name: deviceserver AND ska_tags_field.tango-device: ska100/spfrxpu/controller"
[2025-05-13T06:59:28.867Z] [l:INFO] za-itf-cloud03 :: 1|2025-05-13T06:59:28.866Z|INFO|unknown_thread||unknown_file#0|tango-device:ska100/spfrxpu/controller|[/usr/local/src/ska-mid-spfrx-controller-ds/src/SkaMidSpfrxControllerDs.cpp:8205] SkaMidSpfrxControllerDs::monitor_ping(): Ping received from client

Adjust Template Output

Once records are filtered for output, the output format can be adjusted by providing a template referencing fields:

$ elktail -n 1 -F "%ska_log_timestamp :: SEV: [%ska_severity] TAGS: %ska_tags :: MSG -> %ska_message"  "ska.datacentre: mid-itf AND kubernetes.namespace: staging-dish-lmc-ska100 AND kubernetes.statefulset.name: ds-dish-logger-100 AND kubernetes.container.name: deviceserver AND ska_tags_field.tango-device: ska100/spfrxpu/controller"
2025-05-13T07:23:09.607Z :: SEV: [INFO] TAGS: tango-device:ska100/spfrxpu/controller :: MSG -> [/usr/local/src/ska-mid-spfrx-controller-ds/src/SkaMidSpfrxControllerDs.cpp:8205] SkaMidSpfrxControllerDs::monitor_ping(): Ping received from client

More details are available at https://gitlab.com/piersharding/elktail#format-output.

Monitoring & Logging Dashboards

To facilitate correlating application state with resource usage and the underlying infrastructure, we also provide the same logs in several Grafana dashboards. Please refer to how to use the monitoring solution for a deep dive into the capabilities of these dashboards and how they should be used.

Grafana pod logs

Note that the timeframe of the logs is automatically adjusted to the timeframe selected in the Grafana dashboard. For these logs, we get the same fields we do in Kibana (and we can filter on them in the Grafana dashboard itself):

Grafana document fields

This provides a unique view into the application’s status and how it affects its own resource usage as well as the underlying infrastructure.

Logging Service URLs

Datacentre            Kibana                                     Elasticsearch
stfc-techops (cicd)   https://k8s.stfc.skao.int/kibana           https://logging.stfc.skao.int:9200
stfc-dp (cicd)        https://k8s.stfc.skao.int/kibana           https://logging.stfc.skao.int:9200
aws-*                 https://k8s.stfc.skao.int/kibana           https://logging.stfc.skao.int:9200
mid-itf/low-itf       https://k8s.stfc.skao.int/kibana           https://logging.stfc.skao.int:9200
mid-aa                https://k8s.mid.internal.skao.int/kibana   https://logging.mid.internal.skao.int:9200
low-aa                https://k8s.low.internal.skao.int/kibana   https://logging.low.internal.skao.int:9200

For the environments in central logging, you can filter with:

Datacentre            ska.datacentre        ska.environment
stfc-techops (cicd)   stfc-techops          production
stfc-dp (cicd)        stfc-dp               production
aws-*                 N/A                   N/A
mid-itf               mid-itf               production
low-itf               low-itf               production
psi-mid               psi-mid               production
digital-signal-psi    digital-signal-psi    production