Database Guide

Running the application database

The application expects a PostgreSQL database to be available. The database needs the following environment variables to be set correctly:

POSTGRES_USER: The database user, which should have access to the database
POSTGRES_PASSWORD: The database’s user password.
POSTGRES_DB_NAME: The database name.
POSTGRES_HOST: The host that the database is available on. (It is vital that the container has access to this host)
POSTGRES_SCHEMA_NAME: The schema that is available to the user for the user.

Database schema management

The SkyComponent model dynamically generates columns from the dataclasses defined in ska-sdp-datamodels, while database-specific concerns (indexes, methods) are hardcoded for maintainability.

Schema architecture

The models.py file defines two models:

SkyComponent Model

Dynamically generated columns: Field names and types are read from the SkyComponent dataclass at module import time, ensuring automatic synchronization with upstream data model changes
Hardcoded database fields: The gsm_id field is explicitly defined for cross-table indexing and is not part of the scientific data model

GlobalSkyModelMetadata Model

Dynamically generated columns: Field names and types are read from the GlobalSkyModelMetadata dataclass at module import time

Dynamic column generation

The dynamic column generation happens at module import time through the _add_dynamic_columns_to_model() helper function:

The function iterates over the dataclass’s __annotations__ dictionary
Each field’s type annotation is converted to an appropriate SQLAlchemy column type
Columns are added as attributes to the model class using setattr()
Special handling ensures the name field is unique and not nullable

This approach runs once when the module is first imported. Subsequent imports use the cached module with fully-formed classes. When the upstream dataclass changes (e.g., new field added), the database model automatically includes that field on the next import—no code generation needed.

Schema management with Alembic

The database schema is being managed by Alembic

The alembic migration tool is available through the alembic command.

The file alembic/env.py is generated when initialising the tool, and has been extended to add functionality adapting to translate schema names as well as avoiding manual indexes.

Updating schema

The schema is migrated to the latest version by running alembic migrate.

There are 2 helper commands to do this:

make migrate - will migrate the database forwards
make migrate-rollback - will rollback a single migration

For both commands there are some variables that can be used:

RUN_LOCATION which can be local, docker or kubernetes which will run on one of the 3 environments.

Note that for the kubernetes commands you also need:

SDP_NAMESPACE - the namespace in which the GSM is deployed.
GSM_POD - the name of the GSM pod to use (use the API not the database)

Migrate a local instance

 make migrate

Migrate an instance using docker

 make migrate RUN_LOCATION=docker

Migrate a kubernetes instance

 KUBECONFIG=kubeconfig make migrate RUN_LOCATION=kubernetes SDP_NAMESPACE=dp-phoenix GSM_POD=ska-sdp-gsm-7954755b4f-vjxxz

Schema changes

The database model may change. In this case these changes need to be consolidated into a migration file (versions) and added to git. It is important that the environment is using the public schema in the PostgreSQL schema.

This file may be created manually and added to the folder alembic/versions/. Alternatively this file may be generated. This is done by running a helper command:

Create migration on local setup

 make migrate-create MIGRATION_NOTE="upgrading schema"

Create migration on a docker instance

 make migrate-create RUN_LOCATION=docker MIGRATION_NOTE="upgrading schema"

Note

Migrations cannot be created in kubernetes, as this would cause them to not be persisted in the repo.

Running migration in Kubernetes

When the GSM is deployed within a kubernetes environment the database will need to get the migrations run manually, there are 2 ways of doing this.

Migrating directly from GSM container

Running a console/shell into a running instance you can run:

$ bash /db_migrate.sh

If you would like to also have a sample dataset then run:

$ bash /db_migrate.sh --import-sample-data

If for some reason you need to downgrade the database (which shouldn’t be needed), you can run:

$ bash /db_downgrade.sh