How containers work#

Background concepts for understanding containerisation and orchestration at SKAO.


What is containerisation?#

Containerisation provides lightweight virtualisation that runs directly on the host Linux kernel. Unlike virtual machines, which each boot their own guest kernel, containers share the host kernel while remaining isolated from one another.

A Container Runtime (such as containerd, which most Kubernetes clusters use, Docker, or Podman) launches containers using key Linux kernel features:

Namespaces create isolation by switching the container’s init process (PID 1) into separate kernel namespaces for processes, network stacks, and mount tables. This isolates the container from all other running processes.

Cgroups (control groups) manage resource allocation — memory, CPU, network, and I/O quotas, limits, and priorities.

Capabilities split the traditional root user’s privileges into fine-grained units, so a container can be given just the permissions it needs.

Seccomp (secure computing mode, syscall filtering) allows container runtimes to filter and block dangerous syscalls.

File-system magic (pivot_root and bind mounting) recasts the root filesystem for the container to the container image directory tree, and enables sharing host resources into containers.
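As an illustration of the namespace and cgroup features above, the following minimal sketch inspects them from Python. It assumes a Linux host with cgroup v2; the helper names (`parse_ns_link`, `read_namespaces`, `isolated_kinds`, `parse_cpu_max`) are invented for this example and are not part of any container runtime.

```python
import os
import re
from typing import Dict, Optional, Set, Tuple

# Targets of /proc/<pid>/ns/* symlinks look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(link: str) -> Tuple[str, int]:
    """Split a namespace symlink target into (kind, inode number)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match["kind"], int(match["inode"])


def read_namespaces(pid: str = "self") -> Dict[str, int]:
    """Return {entry name: namespace inode} for a process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


def isolated_kinds(a: Dict[str, int], b: Dict[str, int]) -> Set[str]:
    """Namespace kinds in which two processes do NOT share a namespace."""
    return {kind for kind in a.keys() & b.keys() if a[kind] != b[kind]}


def parse_cpu_max(text: str) -> Optional[float]:
    """Convert cgroup v2 cpu.max content ("<quota> <period>") into CPUs.

    Returns None when the quota is "max", meaning no CPU limit is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)
```

Two processes are in the same namespace when the inode numbers match, so comparing `read_namespaces("1")` with `read_namespaces()` from inside a container would typically show which kinds the runtime has isolated.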

The container image#

The container image encapsulates all dependencies: executables, libraries, configuration, and sometimes data. The OCI Image Specification defines how images are constructed.

Images consist of stacked layers, starting with a minimal OS as the base layer (ranging from scratch and distroless to alpine or ubuntu), with successive layers adding libraries, configuration, and application code. At container launch, these layers stack using a Union File System to create a complete read-only filesystem view. A final read/write layer sits on top for runtime changes.

This layered approach means:

  • Shared base images reduce storage and download time

  • The Container Runtime only downloads layers it does not already have

  • Running containers share read-only layers, saving memory
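The layering behaviour described above can be modelled in a few lines. This is a conceptual sketch only: real runtimes use a union filesystem such as overlayfs, not Python dictionaries, and the layer contents and digests below are made up for illustration.

```python
from collections import ChainMap

# Hypothetical image layers, lowest first: each maps a path to file content.
base = {"/etc/os-release": "alpine", "/bin/sh": "busybox"}
libs = {"/usr/lib/libssl.so": "openssl"}
app = {"/app/main.py": "print('hello')", "/etc/os-release": "alpine (patched)"}


def mount(layers: list) -> ChainMap:
    """Stack read-only layers under a fresh writable layer."""
    writable = {}
    # ChainMap searches left to right, so the top-most layer comes first.
    # Writes always land in the first mapping, leaving the others untouched.
    return ChainMap(writable, *reversed(layers))


def layers_to_pull(image_digests: list, local_cache: set) -> list:
    """Return only the layer digests that are not already stored locally."""
    return [digest for digest in image_digests if digest not in local_cache]


fs = mount([base, libs, app])
fs["/tmp/run.pid"] = "42"  # a runtime change goes to the writable layer only
```

Because the read-only dictionaries are never modified, a second call to `mount` with the same layers yields an independent container view that shares them, which mirrors how running containers share image layers.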

Why containers are immutable#

Containers follow the principle of immutability — you can destroy and recreate them with no side effects. This enables:

  • Reproducibility: The same image produces the same behaviour everywhere

  • Upgradeability: Replace containers rather than patching them

  • Scalability: Spin up identical replicas instantly

  • Recoverability: Restart failed containers without manual intervention

Store any persistent state outside the container using mounted volumes. Never store application state inside a container.

Structuring containerised applications#

Each container runs one discrete application. Ask these questions:

  • Does it have a single executable entry point?

  • Does the running process fulfil a single purpose?

  • Can you maintain and upgrade the process independently?

  • Can you scale the process independently?

For example, running iperf or apache2 as separate containers is correct. Putting NGINX and PostgreSQL in the same container is wrong — they require independent maintenance, upgrades, and scaling.

Avoid multi-process init systems such as supervisord. If your design needs it, reconsider — each application controlled by the init process belongs in its own container.

Why SKAO uses Kubernetes#

Kubernetes is the container orchestration platform at the core of SKAO’s Cloud Native architecture. It provides:

Resource abstraction: A cluster presents an opaque pool of compute, network, and storage resources, addressed through a consistent REST API rather than as individual machines. Pluggable drivers add advanced capabilities and can be adapted to a particular cluster’s constraints.

Declarative configuration: Specify the desired state (“run 3 replicas of this container with this storage”), and Kubernetes controllers continuously reconcile the cluster toward that state.

Auto-healing: When applications or nodes fail, Kubernetes automatically restarts or reschedules workloads.

Horizontal scaling: Scale applications by replicating Pods across the cluster.

Portability: The same configuration works across development, CI/CD, and production clusters, even when their nodes run different OS distributions and configurations.

Multi-cluster connectivity: Cross-cluster networking lets workloads in geographically distributed clusters reach each other as if they were on one flat network.
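The declarative configuration and auto-healing behaviour above both rest on a reconciliation step: compare desired state with observed state and emit the actions that close the gap. A minimal sketch follows; the `reconcile` function and its action tuples are hypothetical and far simpler than a real Kubernetes controller.

```python
def reconcile(desired_replicas: int, running: list) -> list:
    """Return the actions needed to move observed state toward desired state.

    `running` is the list of replica names currently observed, oldest first.
    """
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few replicas: ask for new ones (names are assigned elsewhere).
        return [("start", None)] * diff
    if diff < 0:
        # Too many replicas: stop the surplus, taking the newest ones.
        return [("stop", name) for name in running[diff:]]
    return []  # Observed state already matches desired state.
```

Running the same function again after a replica crashes produces a new `start` action, which is why no separate "repair" logic is needed in this model.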

Cloud Native benefits for SKAO#

Cloud Native practices support SKAO’s distributed development model:

  • Codified standards through CI/CD testing gates ensure consistency

  • Automation enables rebuilding for security patches at minimal cost

  • Portability of both code and infrastructure definitions

  • Common standards for integration with SKA Regional Centres and other projects

  • Social coding platform enabling broad engagement and collaboration

Nodes and the control plane#

A Kubernetes cluster has two parts: a control plane that decides what runs where, and a set of nodes that run the workloads.

The control plane runs the cluster’s brain:

  • kube-apiserver is the only front door. Every component — users, controllers, and nodes — talks to the cluster through it.

  • etcd is the cluster’s state store. All desired and observed state lives here.

  • kube-scheduler watches for unscheduled Pods and picks a node for each, based on resource requests, affinity rules, and constraints.

  • kube-controller-manager runs the reconciliation loops that keep the cluster matching its desired state — Deployments, ReplicaSets, Jobs, and others.

Each node runs three components:

  • kubelet is the node agent. It watches the API server for Pods assigned to its node and drives the container runtime to start and stop them.

  • Container runtime — usually containerd or CRI-O — pulls images and runs containers, implementing the Container Runtime Interface (CRI).

  • kube-proxy programmes the node’s network rules (iptables, IPVS, or eBPF) so that Service virtual IPs route to the right Pods.

Users never talk to nodes directly. Every action — deploying an application, reading a log, scaling a workload — goes through the API server, which records the intent in etcd. Controllers and the scheduler react, and kubelets on the nodes make it happen.
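To make the scheduler's role concrete, here is a simplified sketch of choosing a node for a Pod: first filter out nodes that cannot satisfy the resource requests, then score the remainder. The `Node` class and the "most free resources wins" scoring rule are assumptions made for this example; the real kube-scheduler applies many more filters and scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float      # unallocated CPU cores
    free_mem_mib: int    # unallocated memory in MiB


def pick_node(nodes: list, cpu_request: float, mem_request_mib: int):
    """Return the name of the chosen node, or None if no node fits."""
    # Filter: keep only nodes with enough free CPU and memory.
    feasible = [
        node for node in nodes
        if node.free_cpu >= cpu_request and node.free_mem_mib >= mem_request_mib
    ]
    if not feasible:
        return None  # In Kubernetes terms, the Pod would stay Pending.
    # Score: prefer the node left with the most free CPU, then memory.
    best = max(feasible, key=lambda node: (node.free_cpu, node.free_mem_mib))
    return best.name
```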

Workload types#

Kubernetes runs workloads through several resource types:

Pod: The smallest deployable unit, consisting of one or more containers that share a network namespace. Containers in a Pod communicate via localhost and can share storage volumes.

Deployment: A replicated Pod set with rolling update support. Use for stateless applications.

StatefulSet: Like Deployment but with guarantees about naming, ordering, and persistent storage. Use for databases and message queues.

DaemonSet: Ensures exactly one Pod instance runs on each designated node. Use for monitoring agents or log collectors.

Job/CronJob: Run-to-completion workloads. Use for batch processing or scheduled tasks.

Container patterns in Pods#

Multi-container Pods follow established patterns:

Sidecar: Extends the primary container with supporting functionality. Native sidecar containers have been enabled by default since Kubernetes v1.29, and are guaranteed to start before, and stop after, the regular containers.

Ambassador: Outbound proxy for the primary container — handles connections to external services.

Adapter: Inbound proxy — transforms inputs to match the primary container’s expected format.

All containers in a Pod share the same network namespace (communicate via localhost) and can share storage volumes.

Pod networking#

Kubernetes assigns every Pod its own IP address. Any Pod can reach any other Pod directly, with no NAT between them — the cluster is a flat network from the application’s point of view. Containers inside the same Pod share a network namespace and communicate over localhost.

Pod IPs are ephemeral: when a Pod restarts or reschedules, its IP changes. To give applications a stable endpoint, Kubernetes uses Services. A Service provides a fixed virtual IP and DNS name that load-balances across the Pods matching its label selector:

  • ClusterIP exposes the Service inside the cluster (the default).

  • NodePort exposes it on every node’s IP at a fixed port.

  • LoadBalancer provisions an external load balancer (cloud or MetalLB).

  • Ingress routes external HTTP and HTTPS traffic to Services by host or path.
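The way a Service finds its backends can be sketched as label matching followed by load balancing. The Pod records, label values, and IP addresses below are invented for illustration, and round-robin selection is a simplification of what the cluster's data plane actually does.

```python
from itertools import cycle


def matches(selector: dict, labels: dict) -> bool:
    """A Pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict, pods: list) -> list:
    """Return the IPs of all Pods selected by the Service's label selector."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical cluster state.
pods = [
    {"ip": "10.0.0.1", "labels": {"app": "tangodb", "tier": "db"}},
    {"ip": "10.0.0.2", "labels": {"app": "webui"}},
    {"ip": "10.0.0.3", "labels": {"app": "tangodb", "tier": "db"}},
]

backends = endpoints({"app": "tangodb"}, pods)
next_backend = cycle(backends)  # each next() call yields the following Pod IP
```

Clients only ever see the Service's stable name and virtual IP; the list of backends is recomputed whenever Pods come and go.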

The networking layer itself is pluggable. A CNI (Container Network Interface) plugin, such as Calico or Cilium, implements Pod-to-Pod connectivity, IP address management, and, optionally, network policy enforcement.

Modern CNIs such as Cilium go well beyond basic Pod connectivity. Built on eBPF, they programme the Linux kernel directly rather than relying on iptables, which improves throughput and scales to large clusters. On top of that, they provide identity-aware network policy that can filter traffic at L3/L4 and at L7 (HTTP paths, gRPC methods, Kafka topics), transparent encryption between nodes (WireGuard or IPsec), a kube-proxy replacement that implements Services in eBPF, and multi-cluster networking for connecting workloads across clusters. Cilium also offers a built-in service mesh — mTLS, traffic routing, and observability without requiring a sidecar proxy in every Pod.

Service discovery with DNS#

Kubernetes provides DNS-based service discovery through CoreDNS. Applications resolve dependent services by name:

tangodb              # Same namespace
tangodb.test         # Explicit namespace
tangodb.test.svc.cluster.local  # Fully qualified

Design containerised applications to:

  • Resolve dependent services via DNS

  • Expose their own services over DNS

  • Avoid hardcoded IP addresses or hostnames

This ensures seamless integration when applications move between environments.
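The three name forms shown above follow a fixed pattern, which the sketch below reproduces. The function names are hypothetical, and `cluster.local` is assumed as the cluster domain because it appears in the example above; a cluster may be configured with a different one.

```python
import socket


def service_dns_names(service: str, namespace: str,
                      cluster_domain: str = "cluster.local") -> list:
    """Return the short, namespaced, and fully qualified Service names."""
    return [
        service,                                          # same namespace
        f"{service}.{namespace}",                         # explicit namespace
        f"{service}.{namespace}.svc.{cluster_domain}",    # fully qualified
    ]


def resolve(host: str, port: int) -> list:
    """Resolve a Service name to IP addresses at connection time.

    Only meaningful inside a cluster, or wherever `host` is resolvable.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```

Passing the Service name to the application through configuration, and resolving it only when connecting, keeps the same image usable in every namespace and cluster.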

Persistent storage#

Containers are ephemeral: any state written to a container’s own filesystem disappears when the container is replaced. Persistent data must live outside the container, on storage that survives restarts and reschedules.

In a cluster, this is harder than on a single machine. A Pod can be scheduled onto any node, and may move between nodes over its lifetime, so the storage must follow the workload rather than being tied to a specific host. Kubernetes orchestrates the whole lifecycle: attaching volumes to the right node, mounting them into the containers that need them, detaching them cleanly when a Pod terminates, and re-attaching elsewhere when the workload reschedules.

Kubernetes also abstracts the storage backend so applications do not need to know what sits underneath. Through the Container Storage Interface (CSI, https://github.com/container-storage-interface/spec), any storage system (cloud block storage, network filesystems, distributed block storage, or object-backed filesystems) plugs in as a driver and exposes its capabilities uniformly. Applications request storage by size, performance class, and access mode (single-writer, multi-reader, or shared read-write), and the cluster selects, provisions, and attaches a matching volume.
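The request-and-match model described above can be sketched as follows. The `Volume` and `Claim` classes, the class names "fast" and "bulk", and the rule of picking the smallest sufficient volume are all assumptions for this example; the access mode strings follow the Kubernetes names ReadWriteOnce, ReadOnlyMany, and ReadWriteMany.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    size_gib: int
    storage_class: str
    access_modes: frozenset  # e.g. {"ReadWriteOnce", "ReadOnlyMany"}


@dataclass(frozen=True)
class Claim:
    size_gib: int
    storage_class: str
    access_mode: str


def bind(claim: Claim, volumes: list):
    """Return the smallest volume that satisfies the claim, or None."""
    candidates = [
        volume for volume in volumes
        if volume.storage_class == claim.storage_class
        and volume.size_gib >= claim.size_gib
        and claim.access_mode in volume.access_modes
    ]
    # Choosing the smallest fit avoids wasting capacity (this sketch's choice).
    return min(candidates, key=lambda volume: volume.size_gib, default=None)
```

The application only states what it needs; which backend provides the volume is decided by the cluster.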

Logging in containers#

SKAO uses the SKA Log Message Format standard for all software.

Containerised applications must:

  • Log to stdout/stderr (captured by the Container Engine)

  • Support switching to syslog via configuration

  • Use JSON-structured log messages for semantic parsing

The infrastructure logging solution (ElasticStack) decorates logs with additional context. By logging to stdout/stderr, applications automatically integrate with cluster-wide log aggregation.
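As an illustration of logging JSON-structured messages to stdout, here is a minimal sketch using Python's standard `logging` module. The field names (`timestamp`, `level`, `logger`, `message`) are assumptions chosen for this example and are not taken from the SKA Log Message Format standard, which should be consulted for the required fields.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Field names are illustrative, not the SKA standard's.
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # do not also emit through the root logger
    return logger
```

Because the handler writes to stdout, nothing in the application needs to know where the logs are collected; switching the destination is a matter of swapping the handler through configuration.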
