Database Guide

Running the application database

The application expects a PostgreSQL database to be available. The database needs the following environment variables to be set correctly:

  • POSTGRES_USER: The database user, which should have access to the database

  • POSTGRES_PASSWORD: The database’s user password.

  • POSTGRES_DB_NAME: The database name.

  • POSTGRES_HOST: The host that the database is available on. (It is vital that the container has access to this host)

  • POSTGRES_SCHEMA_NAME: The schema that is available to the user for the user.

Database schema management

The SkyComponent model dynamically generates columns from the dataclasses defined in ska-sdp-datamodels, while database-specific concerns (indexes, methods) are hardcoded for maintainability.

Schema architecture

The models.py file defines two models:

SkyComponent Model
  • Dynamically generated columns: Field names and types are read from the SkyComponent dataclass at module import time, ensuring automatic synchronization with upstream data model changes

  • Hardcoded database fields: The gsm_id field is explicitly defined for cross-table indexing and is not part of the scientific data model

GlobalSkyModelMetadata Model
  • Dynamically generated columns: Field names and types are read from the GlobalSkyModelMetadata dataclass at module import time

Dynamic column generation

The dynamic column generation happens at module import time through the _add_dynamic_columns_to_model() helper function:

  1. The function iterates over the dataclass’s __annotations__ dictionary

  2. Each field’s type annotation is converted to an appropriate SQLAlchemy column type

  3. Columns are added as attributes to the model class using setattr()

  4. Special handling ensures the name field is unique and not nullable

This approach runs once when the module is first imported. Subsequent imports use the cached module with fully-formed classes. When the upstream dataclass changes (e.g., new field added), the database model automatically includes that field on the next import—no code generation needed.

Schema management with Alembic

The database schema is being managed by Alembic

The alembic migration tool is available through the alembic command.

The file alembic/env.py is generated when initialising the tool, and has been extended to add functionality adapting to translate schema names as well as avoiding manual indexes.

Updating schema

The schema is migrated to the latest version by running alembic migrate.

There are 2 helper commands to do this:

  • make migrate - will migrate the database forwards

  • make migrate-rollback - will rollback a single migration

For both commands there are some variables that can be used:

  • RUN_LOCATION which can be local, docker or kubernetes which will run on one of the 3 environments.

Note that for the kubernetes commands you also need:

  • SDP_NAMESPACE - the namespace in which the GSM is deployed.

  • GSM_POD - the name of the GSM pod to use (use the API not the database)

Migrate a local instance
 make migrate
Migrate an instance using docker
 make migrate RUN_LOCATION=docker
Migrate a kubernetes instance
 KUBECONFIG=kubeconfig make migrate RUN_LOCATION=kubernetes SDP_NAMESPACE=dp-phoenix GSM_POD=ska-sdp-gsm-7954755b4f-vjxxz

Schema changes

The database model may change. In this case these changes need to be consolidated into a migration file (versions) and added to git. It is important that the environment is using the public schema in the PostgreSQL schema.

This file may be created manually and added to the folder alembic/versions/. Alternatively this file may be generated. This is done by running a helper command:

Create migration on local setup
 make migrate-create MIGRATION_NOTE="upgrading schema"
Create migration on a docker instance
 make migrate-create RUN_LOCATION=docker MIGRATION_NOTE="upgrading schema"

Note

Migrations cannot be created in kubernetes, as this would cause them to not be persisted in the repo.

Running migration in Kubernetes

When the GSM is deployed within a kubernetes environment the database will need to get the migrations run manually, there are 2 ways of doing this.

Migrating directly from GSM container

Running a console/shell into a running instance you can run:

$ bash /db_migrate.sh

If you would like to also have a sample dataset then run:

$ bash /db_migrate.sh --import-sample-data

If for some reason you need to downgrade the database (which shouldn’t be needed), you can run:

$ bash /db_downgrade.sh