Understanding CI/CD at SKAO#
Learn why SKAO uses CI/CD and how the pipeline architecture works.
Why CI/CD?#
Continuous Integration and Continuous Deployment (CI/CD) ensures that SKAO software is:
Consistently built — Every project follows the same pipeline stages
Automatically tested — Tests run on every commit
Reliably deployed — Pipelines publish artefacts to the Central Artefact Repository
Traceable — Every build is linked to a specific git commit
CI/CD reduces manual errors, speeds up delivery, and maintains code quality across the distributed SKAO development teams.
How SKAO CI/CD Works#
SKAO uses GitLab CI/CD with custom runners hosted on SKAO infrastructure.
The pipeline flow:
Developer pushes code
↓
GitLab detects .gitlab-ci.yml
↓
Pipeline triggered
↓
Jobs run on SKA runners
↓
Build → Lint → Test → Scan → Publish → Pages
↓
Central Artefact Repository stores artefacts
↓
System updates metrics and badges
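The stage order in the flow above can be modelled in a few lines. This is only an illustrative sketch, not how GitLab is implemented: `run_pipeline` and `run_stage` are hypothetical names, and it assumes the common default in which a failed stage prevents later stages from running.

```python
# Stage names taken from the flow above, lower-cased for illustration.
STAGES = ["build", "lint", "test", "scan", "publish", "pages"]


def run_pipeline(run_stage, stages=STAGES):
    """Run stages in order and stop at the first one that fails.

    run_stage is a caller-supplied function that returns True on success.
    Returns a tuple (completed_stages, failed_stage_or_None).
    """
    completed = []
    for stage in stages:
        if not run_stage(stage):
            return completed, stage
        completed.append(stage)
    return completed, None
```

For example, calling `run_pipeline(lambda stage: stage != "test")` reports that build and lint completed and that the test stage failed, so scan, publish and pages never ran.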
The Templates Repository#
To standardise pipelines across all projects, SKAO maintains a templates-repository.
Benefits:
Consistency — All projects use the same job definitions
Maintainability — Updates to templates propagate to all projects
Simplicity — Developers include templates instead of writing jobs from scratch
Flexibility — Make targets allow local customisation
How it works:
Each template calls standardised make targets. This means the same commands
work locally and in the pipeline:
# In the template
script:
  - make python-test
# Developers can customise via Makefile variables
PYTHON_VARS_AFTER_PYTEST = -m 'not post_deployment' --forked
Note
You can provide variables directly in .gitlab-ci.yml, but this
is not recommended because it makes local development and pipeline behaviour diverge.
Kubernetes-Based Runners#
SKAO CI/CD runners operate on Kubernetes clusters, providing:
Auto-scaling — Runners scale based on demand
Isolation — Each job runs in its own container
Shared cache — Speeds up job times across runners
Docker support — Docker-in-Docker available for container builds
Architecture:
Runners run on nodes labelled for CI/CD jobs
A dedicated Docker daemon runs on nodes (not Docker-in-Docker) for security
BuildX worker pools support multi-platform container builds
Note
Kubernetes runners do not support docker-compose.
GPU Pipelines#
For machine learning and compute-intensive workloads, SKAO provides GPU runners.
Available clusters:
techops — Main CI/CD cluster with limited GPU support (primarily for building GPU-enabled artefacts)
dp — Data Processing cluster with more GPUs for running actual workloads
Using GPUs:
Tag your job with a GPU runner tag (e.g., ska-gpu-a100)
Set resource limits in your container configuration to claim GPU instances
GPU runners follow the same Kubernetes architecture as standard runners.
Multi-Platform Builds with BuildX#
SKAO supports building container images for multiple architectures (AMD64, ARM) using Docker BuildX.
How it works:
Dedicated BuildX worker pools run on ska-buildx tagged runners
QEMU emulation enables cross-platform builds
The OCI build template provides oci-image-build-armv5 and oci-image-build-armv8 jobs
When to use:
Deploying to ARM-based edge devices
Supporting multiple processor architectures
Building universal container images
CI Health Metrics and Badges#
SKAO tracks code health across all projects through automated metrics collection.
Required metrics:
Unit tests — Number of tests, errors, failures
Linting — Static analysis results
Coverage — Percentage of code covered by tests
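As a rough illustration of how the unit test counts could be derived, the sketch below sums the `tests`, `errors` and `failures` attributes from a JUnit-style XML report. The report format, the sample data and the function name `summarise_junit` are assumptions made for this example; the actual SKAO metrics collection may work differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report; a real one is written by the test runner.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="unit" tests="5" errors="1" failures="1" skipped="0"/>
  <testsuite name="more" tests="3" errors="0" failures="0" skipped="1"/>
</testsuites>"""


def summarise_junit(xml_text):
    """Return total tests, errors and failures across all <testsuite> elements.

    Assumes suites are not nested; nested suites would be counted twice.
    """
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "errors": 0, "failures": 0}
    # iter() also yields the root itself, so a lone <testsuite> root works too.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals
```

With the sample report, `summarise_junit(SAMPLE_REPORT)` gives 8 tests, 1 error and 1 failure.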
Badges:
Badges display metrics on each repository, showing the default branch status. This provides quick visibility into project health.
Why metrics matter:
Track quality trends over time
Identify projects needing attention
Ensure compliance with SKAO standards
Support release decisions
The Central Artefact Repository#
The Central Artefact Repository (CAR) stores all published artefacts and provides:
Native tool access — Use pip, docker, helm natively
Versioning — Semantic versioning for all artefacts
Metadata — Extensible metadata for lifecycle management
Security — Vulnerability scanning, access control, provenance
Integration — APIs for DevSecOps processes
See Central Artefact Repository for full details.
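To make the versioning point concrete, here is a minimal sketch that checks whether a string looks like a semantic version before an artefact is published. The pattern is a simplified form of the one in the Semantic Versioning 2.0.0 specification, `parse_semver` is a hypothetical helper, and whether the CAR enforces exactly this pattern is an assumption.

```python
import re

# Simplified semantic version pattern: MAJOR.MINOR.PATCH with optional
# "-prerelease" and "+build" parts. Leading zeros are rejected in the
# three numeric fields only.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
)


def parse_semver(text):
    """Return (major, minor, patch, prerelease, build), or None if invalid."""
    match = SEMVER.fullmatch(text)
    if match is None:
        return None
    major, minor, patch, prerelease, build = match.groups()
    return int(major), int(minor), int(patch), prerelease, build
```

A release script could call `parse_semver` on the version it is about to publish and stop early when the result is `None`, for instance for `1.4` or `01.2.3`.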
Rate Limiting#
GitLab implements rate limiting to protect against denial-of-service attacks and ensure fair resource usage.
What happens:
Excessive requests receive HTTP 429 responses
Scripts must wait before retrying
Best practices:
Implement retry logic with delays
Cache results where possible
Avoid tight loops making API calls
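The retry advice above can be sketched as follows. This is one possible approach, not an SKAO-provided helper: `RateLimited` and `call_with_retry` are hypothetical names, and the caller is assumed to supply a `request` function that raises `RateLimited` when it receives an HTTP 429, optionally passing along a server-suggested wait in seconds.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied request function on an HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited (HTTP 429)")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_retry(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it succeeds, waiting longer after each 429."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            if exc.retry_after is not None:
                delay = exc.retry_after  # prefer the server's hint
            else:
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            sleep(delay)
```

The `sleep` parameter exists so the waiting can be replaced in tests. Doubling the delay keeps a script from hammering the API in a tight loop while still recovering quickly from a short burst of 429 responses.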