Functionality and Usage
Environment Configuration
To connect to a SLURM cluster, the following environment variables must be defined.
| Environment Variable | Description |
|---|---|
| | API URL of the Slurm cluster to connect to |
| | JWKS Username to access the Slurm cluster |
| | Address of the JWKS which issues JWTs |
| | Secret used to request a JWT from JWKS which is then used to access the Slurm Cluster |
| | User generated JWT |
Key Components
The following modules make up the core of the SDP Slurm Deployer:
| Module | Description |
|---|---|
| | Entrypoint. Loads configuration, initializes components, and starts the async event loop to handle deployment events. |
| | Watches the SDP Configuration Database for new or updated |
| | Processes deployment events, constructs job with |
| | Interfaces with the SLURM REST API for job submission, status queries, and cancellation. Filters jobs by |
| | Interfaces with Azure Entra to collect a JWT for authentication credentials for the Slurm REST API. |
Authentication
To submit a job to the Slurm REST API, each request must carry a recognised username and a JWT (JSON Web Token).
This can be achieved in two ways: directly with a user-supplied JWT, or indirectly via a JSON Web Key Set (JWKS). The Slurm deployer tries the JWT method first; if the environment variables it requires are not supplied, it falls back to the JWKS method.
The JWT approach requires providing a username and a JWT associated with that username (client ID) via the following environment variables:
SDP_SLURMDEPLOY_CLIENT_ID
SDP_SLURMDEPLOY_SLURM_JWT
For the JWKS approach, the username is the client ID of the Slurm deployer app. The JWT is issued from a JWKS in Azure Entra. The credentials used to request this JWT are the client ID and client secret, which are set via the following environment variables:
SDP_SLURMDEPLOY_AZURE_URL
SDP_SLURMDEPLOY_CLIENT_ID
SDP_SLURMDEPLOY_CLIENT_SECRET
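The selection between the two methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployer's actual code: the `AuthConfig` class and `select_auth` function are hypothetical names, and only the environment variable names come from this page. The sketch decides which method applies; it does not contact Azure Entra or the Slurm REST API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AuthConfig:
    """Hypothetical container for the selected authentication settings."""

    method: str  # "jwt" or "jwks"
    client_id: str
    jwt: Optional[str] = None
    azure_url: Optional[str] = None
    client_secret: Optional[str] = None


def select_auth(env: Mapping[str, str]) -> AuthConfig:
    """Prefer a directly supplied JWT; otherwise fall back to JWKS."""
    client_id = env.get("SDP_SLURMDEPLOY_CLIENT_ID")
    if not client_id:
        raise ValueError("SDP_SLURMDEPLOY_CLIENT_ID is needed by both methods")

    # JWT method: a user-generated token is supplied directly.
    jwt = env.get("SDP_SLURMDEPLOY_SLURM_JWT")
    if jwt:
        return AuthConfig(method="jwt", client_id=client_id, jwt=jwt)

    # JWKS method: the token would be requested later using these credentials.
    azure_url = env.get("SDP_SLURMDEPLOY_AZURE_URL")
    client_secret = env.get("SDP_SLURMDEPLOY_CLIENT_SECRET")
    if azure_url and client_secret:
        return AuthConfig(
            method="jwks",
            client_id=client_id,
            azure_url=azure_url,
            client_secret=client_secret,
        )

    raise ValueError("No complete set of JWT or JWKS variables was found")
```

Calling `select_auth(os.environ)` at startup would surface a missing variable before any job is submitted, which is one reason to keep the decision in a single function.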
Job Configuration
Slurm jobs are configured via environment variables and the SDP deployment submitted by the processing script. Deployments can override the defaults set in the Slurm deployer.
By default, the following environment variables are passed to the slurm jobs:
| Job environment variable | Default |
|---|---|
| | |
| | |
| | |
| | |
In addition, any project environment variable whose name starts with “SDP_SLURM_ENV” is also loaded into the job environment, with that prefix stripped from the variable name, e.g.:
KUBECONFIG = os.getenv("SDP_SLURM_ENV_KUBECONFIG")
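The same rule applied to every matching variable could look like the sketch below. The function name `collect_job_env` is hypothetical, and the sketch assumes the stripped prefix includes the trailing underscore (`SDP_SLURM_ENV_`), as the `KUBECONFIG` example suggests.

```python
from typing import Dict, Mapping

# Assumption: the prefix includes the trailing underscore, as in the
# SDP_SLURM_ENV_KUBECONFIG -> KUBECONFIG example.
PREFIX = "SDP_SLURM_ENV_"


def collect_job_env(env: Mapping[str, str], prefix: str = PREFIX) -> Dict[str, str]:
    """Return the variables that start with the prefix, with the prefix removed."""
    return {
        name[len(prefix):]: value
        for name, value in env.items()
        # Skip a variable whose name is exactly the prefix (empty job name).
        if name.startswith(prefix) and len(name) > len(prefix)
    }
```

For example, passing `os.environ` would pick up `SDP_SLURM_ENV_KUBECONFIG` and expose it to the job as `KUBECONFIG`, while unrelated variables are left out.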
Job Labelling
To ensure jobs can be tracked per deployment instance, each job submitted by the deployer
is tagged with a unique mcs_label. This label is automatically generated using the following
environment variables:
SDP_SLURMDEPLOY_RELEASE_NAME
SDP_SLURMDEPLOY_NAMESPACE
Label format:
sdp_slurm_deployer_{SDP_SLURMDEPLOY_RELEASE_NAME}_{SDP_SLURMDEPLOY_NAMESPACE}
This labelling ensures that only jobs associated with the current deployment are queried or cancelled, avoiding conflicts with other users or deployments.
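As an illustration of the label format and of filtering by it, consider the sketch below. The function names are hypothetical, and representing each job as a dictionary with an `mcs_label` key is an assumption made for the example, not the deployer's actual data model.

```python
from typing import Dict, List, Mapping


def build_mcs_label(env: Mapping[str, str]) -> str:
    """Build the label from the release name and namespace variables."""
    release = env["SDP_SLURMDEPLOY_RELEASE_NAME"]
    namespace = env["SDP_SLURMDEPLOY_NAMESPACE"]
    return f"sdp_slurm_deployer_{release}_{namespace}"


def jobs_for_label(jobs: List[Dict[str, str]], label: str) -> List[Dict[str, str]]:
    """Keep only the jobs tagged with this deployer instance's label.

    Assumption: each job record carries its label under the "mcs_label" key.
    """
    return [job for job in jobs if job.get("mcs_label") == label]
```

With a release name of `test` and a namespace of `sdp`, the label would be `sdp_slurm_deployer_test_sdp`, and jobs carrying any other label would be ignored by queries and cancellations.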
Job Monitoring
The SDP Slurm Deployer tracks the status of jobs it submits to SLURM by querying at regular intervals.
Job states such as PENDING, RUNNING, COMPLETED, and FAILED are fetched and matched to deployments defined in
the SDP Configuration Database. This mapping allows the deployer to:
Update internal deployment state
Detect failed or missing jobs
Cancel jobs if a deployment is removed
Below is the mapping of SLURM job states to their corresponding SDP deployment states:
| SLURM Job State | SDP Deployment State |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
Monitoring is handled in the DeploymentManager via the fetch_deployments_and_slurm_jobs() method,
with API interactions abstracted in SlurmService.
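The three outcomes listed in this section (failed jobs, missing jobs, and jobs to cancel) can be sketched as a comparison between the deployments in the Configuration Database and the jobs reported by SLURM. This is not the implementation of fetch_deployments_and_slurm_jobs(); the names `Reconciliation` and `reconcile` are hypothetical, and the sketch assumes jobs are keyed by deployment ID and that only PENDING or RUNNING jobs need cancelling.

```python
from typing import Dict, Iterable, List, NamedTuple

# Assumption: only jobs that are still queued or running are worth cancelling.
ACTIVE_STATES = {"PENDING", "RUNNING"}


class Reconciliation(NamedTuple):
    failed: List[str]     # deployments whose job reports FAILED
    missing: List[str]    # deployments with no job in SLURM
    to_cancel: List[str]  # active jobs whose deployment no longer exists


def reconcile(deployment_ids: Iterable[str], job_states: Dict[str, str]) -> Reconciliation:
    """Compare wanted deployments with SLURM job states keyed by deployment ID."""
    wanted = set(deployment_ids)
    known = set(job_states)

    failed = sorted(d for d in wanted & known if job_states[d] == "FAILED")
    missing = sorted(wanted - known)
    to_cancel = sorted(
        d for d in known - wanted if job_states[d] in ACTIVE_STATES
    )
    return Reconciliation(failed, missing, to_cancel)
```

Keeping the comparison as a pure function of two snapshots makes it straightforward to run on every polling interval and to test without a live cluster.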
SDP Deployment
The SDP Slurm Deployer can be deployed in the SDP system by enabling it during the installation.
This allows the deployer to submit slurm jobs to the SLURM cluster as needed.
To enable the Slurm Deployer during installation, append the following --set argument to your helm upgrade command:
--set slurmdeploy.enabled=true
For detailed instructions on installing the SDP, refer to the SDP Installation Guide.