Uploading a new version of GSM data
The GSM provides both a browser interface and API endpoints for uploading multiple sky survey catalogue files into the GSM database in a single atomic batch operation. The API is the recommended and primary method for uploading data. The browser-based interface is deprecated and will be removed in a future release.
The process allows the following:
Provide catalogue metadata via a JSON file (required; includes name, description, and epoch).
Upload multiple CSV files simultaneously via API or browser interface.
CSV files uploaded in a single upload session will be part of the same catalogue version.
Track upload progress with a unique identifier.
Query upload status and errors.
Review the last few entries of the uploaded data, then manually commit or reject the upload.
Ensure atomic ingestion (all files succeed or none are ingested).
Validate data automatically at the schema level.
Batch uploads run asynchronously as background tasks. This design keeps the API responsive during large uploads and allows multiple concurrent batch operations.
A detailed user guide of the browser interface and the API can be found at Uploading GSM data.

Staging and versioning of data
Two-stage upload process
All uploads stage the data into a staging table first, which follows the schema of the
main tables, except it also includes an upload_id to distinguish between different
upload sessions.
Once the user commits the upload, the data are moved from the staging table into the
main table. If the upload is rejected, the data are removed from the staging
table and are not moved to the main one.
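The two-stage flow can be pictured with a small in-memory model. This is only a sketch: the real staging and main tables live in the GSM database, and the function names `stage`, `commit`, and `reject` are hypothetical.

```python
staging = {}  # upload_id -> list of component rows awaiting review
main = []     # committed component rows


def stage(upload_id, rows):
    """Place rows in staging, keyed by the upload session they belong to."""
    staging.setdefault(upload_id, []).extend(rows)


def commit(upload_id):
    """Move all staged rows for this upload into the main table."""
    main.extend(staging.pop(upload_id))


def reject(upload_id):
    """Discard the staged rows; the main table is left untouched."""
    staging.pop(upload_id, None)
```

Keying the staged rows by `upload_id` is what lets several upload sessions share one staging area without interfering with each other.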
Catalogue Metadata File
Every upload must include a metadata.json file containing catalogue-level information that applies
to all components in the catalogue. The metadata follows the GlobalSkyModelMetadata dataclass format
from the ska_sdp_datamodels package, with a few additional fields annotating the catalogue.
Metadata File Format:
{
    "catalogue_name": "GLEAM",
    "description": "GaLactic and Extragalactic All-sky MWA Survey",
    "epoch": "J2000",
    "author": "GLEAM Team",
    "reference": "https://doi.org/10.1093/mnras/stw2337",
    "notes": "170 MHz continuum survey"
}
- Required Fields:
  - catalogue_name: Catalogue identifier (e.g., “GLEAM”, “RACS”, “RCAL”)
  - description: Human-readable description of the catalogue
  - epoch: Epoch of observation (e.g., “J2000”)
- Optional Fields:
  - author: Author or team name
  - reference: DOI, URL, or citation
  - notes: Additional information
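Before uploading, it can be useful to check a metadata file locally against the required fields listed above. The following is a minimal sketch: `check_metadata` is a hypothetical helper, not part of the GSM API, and it assumes each required field must be a non-empty string.

```python
import json

REQUIRED_FIELDS = ("catalogue_name", "description", "epoch")


def check_metadata(text):
    """Return a list of problems found in a metadata.json document."""
    try:
        metadata = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(metadata, dict):
        return ["metadata must be a JSON object"]
    problems = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        # Assumption: required fields are non-empty strings.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing required field: {name}")
    return problems
```

An empty list means the required fields are present; optional and additional fields are deliberately not checked here.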
Files uploaded in a new session (new upload_id) will create a new catalogue version with its minor version number incremented from the last version of that catalogue in the database.
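The minor-version increment can be sketched as follows. The version string format is not specified here, so this example assumes a hypothetical "major.minor" form; `next_minor_version` is an illustrative helper, not a GSM function.

```python
def next_minor_version(last_version):
    """Increment the minor part of a 'major.minor' version string.

    Assumption: versions look like '1.4'; the actual GSM format may differ.
    """
    major, minor = (int(part) for part in last_version.split("."))
    return f"{major}.{minor + 1}"
```

Parsing the parts as integers rather than comparing strings keeps the ordering correct once the minor number reaches two digits.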
CSV file format
The uploaded CSV files must be compatible with the data models defined in the ska_sdp_datamodels package. The columns and data types need to match the models; if a column is not provided, the default will be loaded into the database.
Required columns:
- component_id: Unique component identifier (string)
- ra_deg: Right ascension (J2000) in degrees (float)
- dec_deg: Declination (J2000) in degrees (float)
- i_pol_jy: I polarization flux at reference frequency in Janskys (float)
- ref_freq_hz: Reference frequency in Hz (float/integer)
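As an illustration, a CSV containing only the required columns could be produced with Python's standard `csv` module. The component identifiers and values below are made up for the example and do not come from any real catalogue; `to_csv` is a hypothetical helper.

```python
import csv
import io

REQUIRED_COLUMNS = ["component_id", "ra_deg", "dec_deg", "i_pol_jy", "ref_freq_hz"]

# Illustrative values only.
rows = [
    {"component_id": "EXAMPLE-0001", "ra_deg": 10.5, "dec_deg": -45.0,
     "i_pol_jy": 1.25, "ref_freq_hz": 170e6},
    {"component_id": "EXAMPLE-0002", "ra_deg": 201.3, "dec_deg": 12.75,
     "i_pol_jy": 0.4, "ref_freq_hz": 170e6},
]


def to_csv(rows):
    """Serialise rows to CSV text with a header of the required columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
```

The text in `csv_text` can then be written to a file and uploaded alongside the metadata file.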
Data Validation
Note
The API performs only basic technical validation (data types, required fields, coordinate ranges). No scientific validation is performed - users are responsible for ensuring their data are scientifically accurate.
After CSV files are loaded, each component undergoes validation. The following checks are performed:
| Field | Data Type | Required | Validation Checks |
|---|---|---|---|
| component_id | string | Yes | Must be present, non-empty, and unique (across all files in batch) |
| ra_deg | float | Yes | Must be numeric, range: 0 to 360 degrees |
| dec_deg | float | Yes | Must be numeric, range: -90 to 90 degrees |
| i_pol_jy | float | Yes | Must be numeric |
| ref_freq_hz | float | Yes | Must be numeric |
Each component is validated individually. If any component fails validation, no data are ingested; ingestion proceeds only if every component passes.
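The checks and the all-or-nothing rule described above can be approximated locally before uploading. The sketch below is not the GSM validator: `validate_batch` and `ingest` are hypothetical names, rows are assumed to be dictionaries of strings as read from CSV, and the coordinate ranges are assumed to be inclusive.

```python
NUMERIC_FIELDS = ("ra_deg", "dec_deg", "i_pol_jy", "ref_freq_hz")
RANGES = {"ra_deg": (0.0, 360.0), "dec_deg": (-90.0, 90.0)}  # assumed inclusive


def validate_batch(files):
    """Check every row of every file; return a list of error messages."""
    errors = []
    seen_ids = set()
    for filename, rows in files.items():
        for index, row in enumerate(rows, start=1):
            where = f"{filename} row {index}"
            component_id = str(row.get("component_id") or "").strip()
            if not component_id:
                errors.append(f"{where}: component_id is missing or empty")
            elif component_id in seen_ids:
                errors.append(f"{where}: duplicate component_id {component_id!r}")
            seen_ids.add(component_id)
            for field in NUMERIC_FIELDS:
                try:
                    value = float(row[field])
                except (KeyError, TypeError, ValueError):
                    errors.append(f"{where}: {field} is missing or not numeric")
                    continue
                if field in RANGES:
                    low, high = RANGES[field]
                    if not low <= value <= high:
                        errors.append(f"{where}: {field} outside {low} to {high}")
    return errors


def ingest(files, table):
    """Append all rows to table only if the whole batch validates."""
    errors = validate_batch(files)
    if not errors:
        for rows in files.values():
            table.extend(rows)
    return errors
```

Collecting every error before deciding whether to ingest means a single bad row in any file blocks the whole batch, while the user still gets a complete list of problems to fix in one pass.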