Server Compatibility Test

The ska-pst chart includes a self-test that validates whether a server's configuration is compatible with ska-pst. It is enabled by setting the chart values field selftest.enabled to true.
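As an illustration, the value can be overridden at install time with Helm's `--set` flag. The sketch below only assembles the command line; the release name, chart reference, and namespace are hypothetical placeholders, not values taken from the ska-pst documentation.

```python
import shlex


def helm_selftest_command(release, chart, namespace):
    """Build a `helm upgrade --install` command that enables the self-test.

    `release`, `chart`, and `namespace` are placeholders supplied by the caller;
    only the `selftest.enabled` field comes from the chart documentation.
    """
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--set", "selftest.enabled=true",
    ]


if __name__ == "__main__":
    # Hypothetical release, chart reference, and namespace, for illustration only.
    command = helm_selftest_command("pst", "example-repo/ska-pst", "pst-test")
    print(shlex.join(command))
```

The same override can equally be placed in a values file passed with `-f`; `--set` is used here only because it keeps the example to a single command.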

Test                             Description
NVMe detection                   Confirms the existence of the expected NVMe mount.
NVMe write test                  Tests the write speed of the detected NVMe mount.
NVIDIA GPU and driver detection  Confirms the existence of NVIDIA GPU device(s) and an installed driver.
Mellanox NIC detection           Confirms the existence of Mellanox NIC devices.
Shared memory mount              Confirms the existence of shared memory at the path /dev/shm.
shmmax configured                Confirms that the shmmax configuration is present in /proc/sys/kernel/shmmax.
shmmin configured                Confirms that the shmmin configuration is present in /proc/sys/kernel/shmmin.
Memlock                          Confirms that the allowed lockable memory is set to at least the advised minimum.
CUDA H2D and D2H                 Confirms that CUDA host-to-device and device-to-host data transfer rates meet the advised minimum.
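To make the nature of these checks concrete, the following is a minimal sketch of how a few of them could be expressed in Python on a Linux host. It is not the ska-pst self-test script: the function names, the `nvme_mount` argument, and the memlock threshold are all assumptions chosen for illustration, and the GPU, NIC, and CUDA checks are omitted because they depend on vendor tooling.

```python
import os
import resource
import tempfile
import time


def read_int(path):
    """Return the integer stored in a procfs file, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def check_shared_memory(path="/dev/shm"):
    """Pass if the shared memory path exists and is a directory."""
    return os.path.isdir(path)


def check_memlock(min_bytes):
    """Pass if the soft memlock limit is unlimited or at least min_bytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY or soft >= min_bytes


def write_speed_mb_per_s(directory, size_mb=64):
    """Write size_mb mebibytes into a temporary file and return the measured rate."""
    chunk = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        for _ in range(size_mb):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # include the time taken to reach the device
        elapsed = time.perf_counter() - start
    return size_mb / elapsed


def run_checks(nvme_mount, min_memlock_bytes):
    """Run a subset of the checks; both arguments are hypothetical inputs."""
    return {
        "nvme_mount": os.path.ismount(nvme_mount),
        "shared_memory": check_shared_memory(),
        "shmmax": read_int("/proc/sys/kernel/shmmax") is not None,
        "memlock": check_memlock(min_memlock_bytes),
    }
```

A write test of this kind measures buffered sequential writes followed by a sync, so the figure it reports is only a rough indication of device throughput.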

Executing the self-test through Kubernetes in the ska-pst chart

Executing the self-test as a Kubernetes Job confirms that the advised server configuration is inherited by the ska-pst-core containers through Kubernetes plugins and configuration.

After the ska-pst chart is installed, the ska-pst-selftest pod can be inspected to determine the self-test results.
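One way to inspect the pod is to read its logs with `kubectl logs`. The sketch below assumes that the self-test reports its results on the pod's standard output, which the text above does not state explicitly; the namespace is a hypothetical placeholder, and the exact pod name should be looked up first (for example with `kubectl get pods`).

```python
import subprocess


def selftest_logs_command(pod, namespace):
    """Build the kubectl command that prints the logs of the given pod."""
    return ["kubectl", "logs", pod, "--namespace", namespace]


def fetch_selftest_logs(pod, namespace):
    """Run kubectl and return the pod logs as text; raises if kubectl fails."""
    result = subprocess.run(
        selftest_logs_command(pod, namespace),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "pst-test" is a hypothetical namespace used for illustration only.
    print(fetch_selftest_logs("ska-pst-selftest", "pst-test"))
```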

Executing the self-test through Docker directly on a bare-metal server

The ska-pst self-test script can also be executed through a container engine such as Docker on a bare-metal server. The motivation for running the self-test at this layer is to confirm that the container engine (an abstraction layer beneath Kubernetes) is configured as expected.
