OCI Cache Priming - SKAO OCI Daemon
SKAO OCI Daemon is a tool for securely priming OCI images into compute nodes and OCI registries. Below you can learn how it is used to enhance and secure the software supply chain, and what its features are.
For a more extensive overview, please refer to the project’s README and configuration classes for detailed configuration information.
Feature Flags
This project has feature flags that can be configured with:
```yaml
enabled: <true | false>
url: "https://gitlab.com/api/v4/feature_flags/unleash/67009360"
instance_id: <gitlab instance_id for the repository feature flags>
environment: <production | development | ci>
```
Note that you can disable the feature flags completely or configure any Unleash server.
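For illustration, the effect of the `enabled` toggle could be modelled like this (a minimal sketch; `resolve_flags` and its return shape are hypothetical, not the daemon's actual code):

```python
# Hypothetical interpretation of the feature-flag settings above.
# `resolve_flags` is illustrative only, not the daemon's implementation.

def resolve_flags(config: dict) -> dict:
    """Return effective Unleash settings, or disable flags entirely."""
    if not config.get("enabled", False):
        # Feature flags disabled completely: no Unleash server is contacted
        return {"use_unleash": False}
    return {
        "use_unleash": True,
        "url": config["url"],
        "instance_id": config["instance_id"],
        "environment": config.get("environment", "production"),
    }

settings = resolve_flags({
    "enabled": True,
    "url": "https://gitlab.com/api/v4/feature_flags/unleash/67009360",
    "instance_id": "example-instance-id",  # hypothetical value
    "environment": "development",
})
```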
Registry Configuration
For each OCI registry, the registry configuration specifies how to authenticate, how to discover images, how to verify signatures, and which images to look for:
```yaml
registries:
  - host: <registry-host>
    ...
    skip_image_discovery: <true | false; if true, assumes image names are literal>
    ...
    signature_verify:
      mode: <strict | permissive | skip>
      trustStores:
        - name: <trust store name>
          store_type: <ca | signingAuthority | tsa>
          cert_b64: |
            <certificate encoded in base64>
    authentication:
      type: <token | basic>
      username: <username>
      password: <password>
    period: <polling-period-seconds>
    images:
      - name: <image-name-regex>
        include_tags:
          - "1.0.0"
          - "^2.0.0"
        exclude_tags:
          - ">2.0.0"
          - ">=2.0.0"
          - "<=2.0.0"
          - "<2.0.0"
          - ">2.0.0&&<2.1.0"
    mirrors:
      - host: <mirror-host>
        authentication:
          type: <token | basic>
          username: <username>
          password: <password>
        insecure: <true | false>
        tls_verify: <true | false>
    ...
```
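For illustration, the tag-matching rules used in `include_tags`/`exclude_tags` (entries are or'ed, `&&` and's comparisons within an entry, and `^` matches tags with the same major and minor versions) can be sketched in Python; `matches` and its helpers are hypothetical, not the daemon's implementation:

```python
# Hypothetical re-implementation of the tag-matching rules, for illustration.
import operator

# ">=" / "<=" must be tried before ">" / "<" (dict preserves insertion order)
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

def _parse(tag: str) -> tuple:
    return tuple(int(part) for part in tag.split("."))

def _match_one(expr: str, tag: str) -> bool:
    if expr.startswith("^"):
        # "^" matches anything with the same major and minor versions
        return _parse(expr[1:])[:2] == _parse(tag)[:2]
    for symbol, op in _OPS.items():
        if expr.startswith(symbol):
            return op(_parse(tag), _parse(expr[len(symbol):]))
    return expr == tag  # plain entries match literally

def matches(entries: list, tag: str) -> bool:
    """OR over entries; '&&' AND's the comparisons inside one entry."""
    return any(
        all(_match_one(part, tag) for part in entry.split("&&"))
        for entry in entries
    )
```

For example, `matches([">2.0.0&&<2.1.0"], "2.0.5")` holds because both comparisons in the entry are satisfied.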
For special scenarios, where image discovery is not possible through the OCI API, we can use native image discovery (currently only Nexus is supported):
```yaml
registries:
  - host: <host>
    native_image_discovery:
      enabled: true
      type: nexus
      host: <api host>
      repository: <repository in the target registry>
```
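As a sketch, native discovery against Nexus could page through its components REST API; the endpoint shown is Nexus 3's documented API, but `discover_images` and the injected `fetch` callable are illustrative assumptions, not the daemon's code:

```python
# Hypothetical native image discovery against a Nexus registry.
# The HTTP call is abstracted as `fetch(url) -> dict` so the pagination
# logic can be shown without a live server.

def discover_images(fetch, api_host: str, repository: str) -> set:
    """Collect image names from all pages of the Nexus components API."""
    images, token = set(), None
    while True:
        url = f"https://{api_host}/service/rest/v1/components?repository={repository}"
        if token:
            url += f"&continuationToken={token}"
        page = fetch(url)
        images.update(item["name"] for item in page["items"])
        token = page.get("continuationToken")
        if not token:
            return images
```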
Note that we can do very intricate image tag matching with the following operators:

- `<`, `>`, `<=` and `>=`
- `^` - matching anything with the same major and minor versions

Every entry in `include_tags` or `exclude_tags` is or'ed, but expressions within an entry can be and'ed with `&&`, as shown in the example. Depending on where it is used, a registry configuration serves a different purpose:
- Upstream registry (registries under `registries`)
- Cache/Target registry (registries under `caches`)
- Mirror registry (registries under an upstream registry's `mirrors`)
Therefore, the following variables have no effect under mirror or cache registries:

- `mirrors`
- `images`
- `period`
- `referrers_verify`
- `signature_verify`
Also, the following variables have no effect under mirror registries:

- `native_image_discovery`
Cache Mode
As mentioned, the cache mode is meant to prime OCI registries that act as caches (or any other registry, for that matter). In terms of configuration, items in caches support the same settings as items in registries.
```yaml
caches:
  - host: some-cache:8080
    authentication:
      type: <token | basic>
      username: <username>
      password: <password>
    insecure: <true | false>
    tls_verify: <true | false>
    ...
  - host: another-cache:8080
    ...
registries:
  - host: artefact.skao.int
    ...
    images:
      - name: ska-python
        include_cache/exclude_cache:
          - some-cache:8080
```
From a single instance of the OCI Daemon we can prime many caches, provided we can reach them. Note that caches can be included in or excluded from particular image configurations for any registry.
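The include/exclude semantics described above can be sketched as follows (a minimal sketch; the function name and resolution order are assumptions, not the daemon's actual implementation):

```python
# Hypothetical per-image cache targeting, assuming an image entry may carry
# an `include_cache` or `exclude_cache` list as shown in the example above.

def caches_for_image(all_caches: list, image: dict) -> list:
    """Return the caches a given image should be primed into."""
    if "include_cache" in image:
        return [c for c in all_caches if c in image["include_cache"]]
    if "exclude_cache" in image:
        return [c for c in all_caches if c not in image["exclude_cache"]]
    return list(all_caches)  # default: prime every configured cache

caches = ["some-cache:8080", "another-cache:8080"]
only_some = caches_for_image(
    caches, {"name": "ska-python", "include_cache": ["some-cache:8080"]}
)
```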
Node Mode
The node mode is meant to prime the internal registries of OCI engines, so that images are already available whenever a deployed container needs them. Even when using a cache, pull-and-load operations take time, so this dramatically improves deployment speed.
```yaml
registries:
  - host: artefact.skao.int
    load_in_engines:
      containerd: true
      docker: true
      podman: false
    images:
      - name: ska-python
      - name: ska-ser-oci-daemon
        load_in_engines:
          containerd: false
          docker: true
          podman: false
```
Currently we can prime containerd, docker and podman internal registries, provided the respective sockets are available to the OCI Daemon. Note that for each image configuration we can override which engines the image should be loaded into.
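The per-image override described above can be sketched as a simple merge (hypothetical helper; `effective_engines` and its return shape are assumptions, not the daemon's code):

```python
# Hypothetical resolution of per-image `load_in_engines` overrides against
# the registry-level defaults, keeping only the engines that end up enabled.

def effective_engines(registry_engines: dict, image: dict) -> dict:
    merged = dict(registry_engines)
    merged.update(image.get("load_in_engines", {}))  # image entry wins
    return {engine: on for engine, on in merged.items() if on}

registry_default = {"containerd": True, "docker": True, "podman": False}
# `ska-ser-oci-daemon` turns containerd off, so only docker remains enabled:
engines = effective_engines(
    registry_default,
    {"name": "ska-ser-oci-daemon", "load_in_engines": {"containerd": False}},
)
```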
Using a container
To deploy using a container, simply do:
```shell
$ mkdir -p /opt/oci-daemon
$ mkdir -p /opt/oci-daemon/data # Optionally mount a volume here
$ vi /opt/oci-daemon/config.yaml # Create oci daemon configuration
$ vi /opt/oci-daemon/unleash_config.yaml # Create unleash configuration
$ docker run --name oci-daemon \
  -v /opt/oci-daemon/config.yaml:/opt/oci-daemon/config.yaml:ro \
  -v /opt/oci-daemon/unleash_config.yaml:/opt/oci-daemon/unleash_config.yaml:ro \
  -v /opt/oci-daemon/data:/opt/oci-daemon/data \
  -v /run/containerd/containerd.sock:/run/containerd/containerd.sock:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  artefact.skao.int/ska-ser-oci-daemon:<grab latest tag from harbor> \
  --mode <cache | node>
```
Note that we mount the config files, the data directory and one or more OCI engine sockets. These sockets are only needed when running in node mode, and you only need the sockets for the engines you will be using.
Using the Helm Chart
In the Helm Chart we can provide a simple values.yml file to define its configuration. Each mode - cache and node - has separate configurations:
```yaml
image:
  pullPolicy: IfNotPresent
  repository: artefact.skao.int/ska-ser-oci-daemon
  tag: null
cache:
  enabled: false
node:
  enabled: true
  config: <oci daemon node configuration>
  storage:
    data:
      size: 50Gi
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Equal
      value: "true"
```
By default, the cache mode in the cluster is disabled, as it is deployed as a container next to the registry it is priming. Nonetheless, we could configure it:
```yaml
cache:
  enabled: true
  config: <oci daemon cache configuration>
  storage:
    data:
      size: 50Gi
```
Also, this chart integrates with the Vault Secrets Operator (VSO) to automatically update the configuration and rotate the OCI Daemon instances. To use it, you need to set:
```yaml
cache/node:
  externalConfig:
    enabled: true
    mount: <engine in Vault>
    path: <path to secret in Vault>
    refreshAfter: 30s
    vaultAuthRef: <Vault auth ref in the cluster; leave null for default>
```