OCI Cache Priming - SKAO OCI Daemon

SKAO OCI Daemon is a tool for securely priming OCI images into compute nodes and OCI registries. This page describes its features and how it is used to enhance and secure the software supply chain.

For a more extensive overview, refer to the project’s README. For detailed configuration information, see the configuration classes.

Feature Flags

This project has feature flags that can be configured with:

enabled: <true | false>
url: "https://gitlab.com/api/v4/feature_flags/unleash/67009360"
instance_id: <gitlab instance_id for the repository feature flags>
environment: <production|development|ci>

Note that you can disable feature flags completely, or point url at any Unleash server.

Registry Configuration

The registry configuration describes an OCI registry: how to authenticate against it, how to discover images, how to verify signatures and which images to look for:

registries:
  - host: <registry-host>
    ...
    skip_image_discovery: <true | false; If true, assumes image names are literal>
    ...
    signature_verify:
      mode: <strict | permissive | skip>
      trustStores:
        - name: <trust store name>
          store_type: <ca | signingAuthority | tsa>
          cert_b64: |
            <certificate encoded in base64>
    authentication:
      type: <token | basic>
      username: <username>
      password: <password>
    period: <polling-period-seconds>
    images:
      - name: <image-name-regex>
        include_tags:
          - "1.0.0"
          - "^2.0.0"
        exclude_tags:
          - ">2.0.0"
          - ">=2.0.0"
          - "<=2.0.0"
          - "<2.0.0"
          - ">2.0.0&&<2.1.0"
    mirrors:
      - host: <mirror-host>
        authentication:
          type: <token | basic>
          username: <username>
          password: <password>
        insecure: <true | false>
        tls_verify: <true | false>
        ...

For special scenarios where image discovery is not possible with the OCI API, we can use native image discovery (currently only Nexus is supported):

registries:
  - host: <host>
    native_image_discovery:
      enabled: true
      type: nexus
      host: <api host>
      repository: <repository in the target registry>

Note that image tags can be matched precisely with the following operators:

<, >, <= and >=
^ - matching anything with the same major and minor versions

Entries in include_tags and exclude_tags are or’ed together, but any expression can be and’ed with && as shown in the example.

Depending on where it is used, a registry configuration serves a different purpose:

Upstream registry (registries under registries)
Cache/Target registry (registries under caches)
Mirror registry (registries under an upstream registry’s mirrors)

Therefore, the following variables have no effect if under mirror or cache registries:

mirrors
images
period
referrers_verify
signature_verify

Also, the following variables have no effect if under mirror registries:

native_image_discovery
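The role rules above can be captured in a small lookup, for example to warn about settings that will be silently ignored. This is a hedged sketch: the role names and the ineffective_keys helper are hypothetical, and the key for native discovery is spelled as in the configuration example (native_image_discovery).

```python
# Keys that have no effect, per registry role, as listed in the text.
_NO_EFFECT_IN_CACHE_OR_MIRROR = {
    "mirrors",
    "images",
    "period",
    "referrers_verify",
    "signature_verify",
}

IGNORED_KEYS = {
    "upstream": set(),
    "cache": set(_NO_EFFECT_IN_CACHE_OR_MIRROR),
    "mirror": _NO_EFFECT_IN_CACHE_OR_MIRROR | {"native_image_discovery"},
}


def ineffective_keys(role, registry_config):
    """Return the sorted keys of registry_config that have no effect for role."""
    if role not in IGNORED_KEYS:
        raise ValueError(f"unknown registry role: {role!r}")
    return sorted(IGNORED_KEYS[role] & set(registry_config))
```

A configuration loader could call ineffective_keys("mirror", mirror_config) for each mirror and log the result, rather than failing, since the keys are harmless.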

Cache Mode

As mentioned, the cache mode is meant to prime OCI registries that act as caches (or any other registry, for that matter). In terms of configuration, items in caches support the same settings as items in registries.

caches:
- host: some-cache:8080
  authentication:
    type: <token | basic>
    username: <username>
    password: <password>
  insecure: <true | false>
  tls_verify: <true | false>
  ...
- host: another-cache:8080
  ...
registries:
- host: artefact.skao.int
  ...
  images:
    - name: ska-python
      include_cache/exclude_cache:
        - some-cache:8080

From a single instance of the OCI Daemon we can prime many caches, provided we can reach them. Note that caches can be included or excluded from particular image configurations for any registry.
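One way to picture the per-image cache selection is the sketch below. It rests on assumptions that the source does not confirm: an image with neither key targets every configured cache, include_cache restricts the set to the listed hosts, and exclude_cache removes hosts from it. The caches_for_image helper is hypothetical.

```python
def caches_for_image(all_caches, image_config):
    """Return the cache hosts an image would be primed into.

    all_caches: list of cache hosts from the top-level caches section.
    image_config: one entry of a registry's images list, as a dict.
    """
    include = image_config.get("include_cache")
    exclude = image_config.get("exclude_cache") or []
    if include is None:
        # Assumption: no include list means all caches are candidates.
        selected = list(all_caches)
    else:
        selected = [host for host in all_caches if host in include]
    return [host for host in selected if host not in exclude]
```

With the two caches from the example, an image that lists some-cache:8080 under include_cache would be primed only there, while the same host under exclude_cache would leave only another-cache:8080.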

Node Mode

The node mode is meant to prime the OCI engines’ internal registries so that images are already available whenever a deployed container needs them. Even when using a cache, the pull-and-load operations take time, so this dramatically improves deployment speed.

registries:
- host: artefact.skao.int
  load_in_engines:
    containerd: true
    docker: true
    podman: false
  images:
    - name: ska-python
    - name: ska-ser-oci-daemon
      load_in_engines:
        containerd: false
        docker: true
        podman: false

Currently we can prime the containerd, docker and podman internal registries, provided the respective sockets are available to the OCI Daemon. Note that each image configuration can override which engines the image is loaded into.
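The override behaviour can be sketched as a merge of the registry-level and image-level load_in_engines maps. This is an illustration under assumptions: the effective_engines helper is hypothetical, engines that are not mentioned are assumed to default to false, and the image-level map is assumed to override per engine rather than replace the whole map.

```python
ENGINES = ("containerd", "docker", "podman")


def effective_engines(registry_config, image_config):
    """Resolve which engines an image is loaded into."""
    # Assumption: engines not mentioned anywhere are disabled.
    resolved = {engine: False for engine in ENGINES}
    # Registry-level settings apply to every image of the registry...
    resolved.update(registry_config.get("load_in_engines") or {})
    # ...and image-level settings take precedence over them.
    resolved.update(image_config.get("load_in_engines") or {})
    return resolved
```

Applied to the example above, ska-python inherits the registry settings (containerd and docker), while ska-ser-oci-daemon is loaded into docker only.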

Using a container

To deploy using a container, simply do:

$ mkdir -p /opt/oci-daemon
$ mkdir -p /opt/oci-daemon/data # Optionally mount a volume here
$ vi /opt/oci-daemon/config.yaml # Create oci daemon configuration
$ vi /opt/oci-daemon/unleash_config.yaml # Create unleash configuration
$ docker run --name oci-daemon \
  -v /opt/oci-daemon/config.yaml:/opt/oci-daemon/config.yaml:ro \
  -v /opt/oci-daemon/unleash_config.yaml:/opt/oci-daemon/unleash_config.yaml:ro \
  -v /opt/oci-daemon/data:/opt/oci-daemon/data \
  -v /run/containerd/containerd.sock:/run/containerd/containerd.sock:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  artefact.skao.int/ska-ser-oci-daemon:<grab latest tag from harbor> \
  --mode <cache/node>

Note that we mount the config files, the data directory and one or more OCI engine sockets. These sockets are only needed when running in node mode, and you only need the sockets for the engines you will be using.
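The mounting rules can be made explicit by assembling the command programmatically. The sketch below only builds the argument list and does not run anything. The docker_run_args helper and its parameters are hypothetical, the paths are the ones used in the command above, and only the containerd and docker sockets are covered because those are the only socket paths given here.

```python
# Socket paths taken from the docker run example in this document.
SOCKETS = {
    "containerd": "/run/containerd/containerd.sock",
    "docker": "/var/run/docker.sock",
}


def docker_run_args(mode, tag, engines=()):
    """Build the docker run argument list for the given mode.

    mode: "cache" or "node". tag: image tag to run (caller supplied).
    engines: engines whose sockets should be mounted; ignored in cache mode.
    """
    if mode not in ("cache", "node"):
        raise ValueError(f"unknown mode: {mode!r}")
    base = "/opt/oci-daemon"
    args = [
        "docker", "run", "--name", "oci-daemon",
        "-v", f"{base}/config.yaml:{base}/config.yaml:ro",
        "-v", f"{base}/unleash_config.yaml:{base}/unleash_config.yaml:ro",
        "-v", f"{base}/data:{base}/data",
    ]
    if mode == "node":
        # Sockets are only needed in node mode, one per engine in use.
        for engine in engines:
            socket_path = SOCKETS[engine]
            args += ["-v", f"{socket_path}:{socket_path}:ro"]
    args += [f"artefact.skao.int/ska-ser-oci-daemon:{tag}", "--mode", mode]
    return args
```

The resulting list can be passed to subprocess.run, which avoids shell quoting issues and makes it easy to leave out sockets for engines that are not installed.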

Using the Helm Chart

In the Helm Chart we can provide a simple values.yml file to define its configuration. Each mode - cache and node - has separate configurations:

image:
  pullPolicy: IfNotPresent
  repository: artefact.skao.int/ska-ser-oci-daemon
  tag: null

cache:
  enabled: false

node:
  enabled: true
  config: <oci daemon node configuration>
  storage:
    data:
      size: 50Gi
  tolerations:
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Equal
    value: "true"

By default, the cache mode is disabled in the cluster, as it is deployed as a container next to the registry it is priming. Nonetheless, it can be enabled and configured:

cache:
  enabled: true
  config: <oci daemon cache configuration>
  storage:
    data:
      size: 50Gi

This chart also integrates with VSO (Vault Secrets Operator) to automatically update the configuration and rotate the OCI Daemon instances. To use it, set:

cache/node:
  externalConfig:
    enabled: true
    mount: <engine in Vault>
    path: <path to secret in Vault>
    refreshAfter: 30s
    vaultAuthRef: <Vault auth ref in the cluster, leave null for default>