Understanding CI/CD at SKAO#

Learn why SKAO uses CI/CD and how the pipeline architecture works.


Why CI/CD?#

Continuous Integration and Continuous Deployment (CI/CD) ensures that SKAO software is:

  • Consistently built — Every project follows the same pipeline stages

  • Automatically tested — Tests run on every commit

  • Reliably deployed — Pipelines publish artefacts to the Central Artefact Repository

  • Traceable — Every build is linked to a specific git commit

CI/CD reduces manual errors, speeds up delivery, and maintains code quality across the distributed SKAO development teams.


How SKAO CI/CD Works#

SKAO uses GitLab CI/CD with custom runners hosted on SKAO infrastructure.

The pipeline flow:

Developer pushes code
      ↓
GitLab detects .gitlab-ci.yml
      ↓
Pipeline triggered
      ↓
Jobs run on SKAO runners
      ↓
Build → Lint → Test → Scan → Publish → Pages
      ↓
Central Artefact Repository stores artefacts
      ↓
System updates metrics and badges

The Templates Repository#

To standardise pipelines across all projects, SKAO maintains a templates-repository.

Benefits:

  • Consistency — All projects use the same job definitions

  • Maintainability — Updates to templates propagate to all projects

  • Simplicity — Developers include templates instead of writing jobs from scratch

  • Flexibility — Make targets allow local customisation

How it works:

Each template calls standardised make targets. This means the same commands work locally and in the pipeline:

# In the template (GitLab CI job definition)
script:
  - make python-test

# In the project Makefile: developers customise the target via variables
PYTHON_VARS_AFTER_PYTEST = -m 'not post_deployment' --forked

Note

You can also set these variables directly in .gitlab-ci.yml, but this is not recommended: values set there do not apply when you run make locally, so local development and pipeline behaviour diverge.


Kubernetes-Based Runners#

SKAO CI/CD runners operate on Kubernetes clusters, providing:

  • Auto-scaling — Runners scale based on demand

  • Isolation — Each job runs in its own container

  • Shared cache — Speeds up job times across runners

  • Docker support — Docker-in-Docker available for container builds

Architecture:

  • Runners run on nodes labelled for CI/CD jobs

  • A dedicated Docker daemon runs on nodes (not Docker-in-Docker) for security

  • BuildX worker pools support multi-platform container builds

Note

Kubernetes runners do not support docker-compose.


GPU Pipelines#

For machine learning and compute-intensive workloads, SKAO provides GPU runners.

Available clusters:

  • techops — Main CI/CD cluster with limited GPU support (primarily for building GPU-enabled artefacts)

  • dp — Data Processing cluster with more GPUs for running actual workloads

Using GPUs:

  1. Tag your job with a GPU runner tag (e.g., ska-gpu-a100)

  2. Set GPU resource limits in the job's container configuration so that the job claims GPU instances

GPU runners follow the same Kubernetes architecture as standard runners.


Multi-Platform Builds with BuildX#

SKAO supports building container images for multiple architectures (AMD64, ARM) using Docker BuildX.

How it works:

  • Dedicated BuildX worker pools run on ska-buildx tagged runners

  • QEMU emulation enables cross-platform builds

  • The OCI build template provides oci-image-build-armv5 and oci-image-build-armv8 jobs

When to use:

  • Deploying to ARM-based edge devices

  • Supporting multiple processor architectures

  • Building universal container images


CI Health Metrics and Badges#

SKAO tracks code health across all projects through automated metrics collection.

Required metrics:

  • Unit tests — Number of tests, errors, failures

  • Linting — Static analysis results

  • Coverage — Percentage of code covered by tests
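The unit test counts above can be derived from a test report. The following is a minimal sketch, assuming the report is in the common JUnit-style XML layout (a `testsuite` element carrying `tests`, `errors` and `failures` attributes); the function name and sample report are illustrative and are not taken from the SKAO templates.

```python
import xml.etree.ElementTree as ET


def summarise_junit(xml_text):
    """Return total tests, errors and failures from a JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    # A report may have a single <testsuite> root or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "errors": 0, "failures": 0}
    for suite in suites:
        for key in totals:
            # Missing attributes are treated as zero.
            totals[key] += int(suite.get(key, 0))
    return totals


# Hypothetical sample report, for illustration only.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="unit" tests="12" errors="1" failures="2"/>
  <testsuite name="integration" tests="3" errors="0" failures="0"/>
</testsuites>"""

print(summarise_junit(SAMPLE_REPORT))
# prints {'tests': 15, 'errors': 1, 'failures': 2}
```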

Badges:

Badges on each repository display these metrics for the default branch, giving a quick view of project health.

Why metrics matter:

  • Track quality trends over time

  • Identify projects needing attention

  • Ensure compliance with SKAO standards

  • Support release decisions


The Central Artefact Repository#

The Central Artefact Repository (CAR) stores all published artefacts and provides:

  • Native tool access — Use pip, docker, helm natively

  • Versioning — Semantic versioning for all artefacts

  • Metadata — Extensible metadata for lifecycle management

  • Security — Vulnerability scanning, access control, provenance

  • Integration — APIs for DevSecOps processes

See Central Artefact Repository for full details.


Rate Limiting#

GitLab implements rate limiting to protect against denial-of-service attacks and ensure fair resource usage.

What happens:

  • Requests over the limit receive an HTTP 429 (Too Many Requests) response

  • Scripts must wait before retrying the request

Best practices:

  • Implement retry logic with delays

  • Cache results where possible

  • Avoid tight loops making API calls
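The retry advice above could be sketched as follows. This is a minimal illustration using only the Python standard library, not an official SKAO or GitLab client: the function names, the default delays and the choice of exponential backoff are assumptions. It waits for the number of seconds given in a `Retry-After` header when the server sends one, and otherwise doubles the delay on each attempt.

```python
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # Honour the server's hint when it is a plain number of seconds.
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to exponential backoff
    return min(base * (2 ** attempt), cap)


def fetch(url):
    """Perform one GET and return (status, headers, body) without raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.headers, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.headers, err.read()


def get_with_retry(url, max_attempts=5, fetch_once=fetch, sleep=time.sleep):
    """GET `url`, waiting and retrying while the server answers HTTP 429."""
    for attempt in range(max_attempts):
        status, headers, body = fetch_once(url)
        if status != 429:
            # Any other status (success or error) is returned to the caller.
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```

A script would call `get_with_retry(url)` in place of a bare request and handle the returned status itself. The `fetch_once` and `sleep` parameters exist only so the waiting logic can be exercised without network access or real delays.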


See Also#