.. _upload_design:

Uploading a new version of GSM data
-----------------------------------

The GSM provides both a browser interface and API endpoints for uploading
multiple sky survey catalogue files into the GSM database in a single atomic batch operation.
The API is the recommended and primary method for uploading data.
The browser-based interface is **deprecated** and will be removed in a future release.

The process allows the following:

- Provide catalogue metadata via a JSON file (required; includes name, description, and epoch).
- Upload multiple CSV files simultaneously via the API or the browser interface.
- CSV files uploaded in a single upload session will be part of the same catalogue version.
- Track upload progress with a unique identifier.
- Query upload status and errors.
- Review the last few entries of the uploaded data, then manually commit or reject the upload.
- Ensure atomic ingestion (all files succeed or none are ingested).
- Validate data automatically at the schema level.

Batch uploads run asynchronously as background tasks. This design keeps the API
responsive during large uploads and allows multiple concurrent batch operations.

A detailed user guide to the browser interface and the API can be found at :ref:`batch_upload`.

Staging and versioning of data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Two-stage upload process
........................

All uploads first stage the data in a ``staging table``, which follows the schema
of the main tables but also includes an ``upload_id`` to distinguish between
different upload sessions. Once the user ``commits`` the data, they are moved from the
staging table into the main table. When the upload is ``rejected``, the data are removed
from the staging table and are not moved to the main one.

Catalogue Metadata File
.......................

Every upload must include a ``metadata.json`` file containing catalogue-level information
that applies to all components in the catalogue.
The metadata follows the ``GlobalSkyModelMetadata`` dataclass format from the
``ska_sdp_datamodels`` package, with a few additional fields annotating the catalogue.

**Metadata File Format**:

.. code-block:: json

    {
        "catalogue_name": "GLEAM",
        "description": "GaLactic and Extragalactic All-sky MWA Survey",
        "epoch": "J2000",
        "author": "GLEAM Team",
        "reference": "https://doi.org/10.1093/mnras/stw2337",
        "notes": "170 MHz continuum survey"
    }

**Required Fields**:

- ``catalogue_name``: Catalogue identifier (e.g., "GLEAM", "RACS", "RCAL")
- ``description``: Human-readable description of the catalogue
- ``epoch``: Epoch of observation (e.g., "J2000")

**Optional Fields**:

- ``author``: Author or team name
- ``reference``: DOI, URL, or citation
- ``notes``: Additional information

Files uploaded in a new session (new ``upload_id``) create a new catalogue version
whose minor version number is incremented from the last version of that catalogue in the database.

.. _upload_csv_format:

CSV file format
^^^^^^^^^^^^^^^

The uploaded CSV files must be compatible with the data models defined in the
``ska_sdp_datamodels`` package. The columns and data types must match the models;
if a column is not provided, its default value is loaded into the database.

Required columns:

- ``component_id``: Unique component identifier (string)
- ``ra_deg``: Right ascension (J2000) in degrees (float)
- ``dec_deg``: Declination (J2000) in degrees (float)
- ``i_pol_jy``: I polarization flux at reference frequency in Janskys (float)
- ``ref_freq_hz``: Reference frequency in Hz (float/integer)

Data Validation
^^^^^^^^^^^^^^^

.. note::
    The API performs only basic technical validation (data types, required fields,
    coordinate ranges). No scientific validation is performed; users are responsible
    for ensuring their data are scientifically accurate.

After the CSV files are loaded, each component undergoes validation. The following checks are performed:

.. list-table::
    :widths: 20, 15, 15, 50
    :header-rows: 1

    * - Field
      - Data Type
      - Required
      - Validation Checks
    * - ``component_id``
      - string
      - Yes
      - Must be present, non-empty, and unique (across all files in batch)
    * - ``ra_deg``
      - float
      - Yes
      - Must be numeric, range: 0 to 360 degrees
    * - ``dec_deg``
      - float
      - Yes
      - Must be numeric, range: -90 to 90 degrees
    * - ``i_pol_jy``
      - float
      - Yes
      - Must be numeric
    * - ``ref_freq_hz``
      - float
      - Yes
      - Must be numeric

Each component is validated individually. If any component fails validation,
no data are ingested; ingestion proceeds only when all components pass.
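
The checks listed above can be illustrated with a minimal Python sketch. This is
not the GSM implementation: the function name ``validate_components``, the error
message wording, and the treatment of the range bounds as inclusive are
assumptions made for this example only. It collects every error across the batch
so that the caller can apply the all-or-nothing rule.

.. code-block:: python

    import csv
    import io

    # Required columns, as listed in the CSV file format section.
    REQUIRED_COLUMNS = ("component_id", "ra_deg", "dec_deg", "i_pol_jy", "ref_freq_hz")


    def _as_float(value):
        """Return value converted to float, or None if it is not numeric."""
        try:
            return float(value)
        except (TypeError, ValueError):
            return None


    def validate_components(rows):
        """Return a list of error strings; an empty list means the batch passes."""
        errors = []
        seen_ids = set()
        for index, row in enumerate(rows, start=1):
            # component_id must be present, non-empty, and unique in the batch.
            component_id = (row.get("component_id") or "").strip()
            if not component_id:
                errors.append(f"row {index}: component_id is missing or empty")
            elif component_id in seen_ids:
                errors.append(f"row {index}: duplicate component_id {component_id!r}")
            else:
                seen_ids.add(component_id)

            # The remaining required columns must all be numeric.
            numeric = {name: _as_float(row.get(name)) for name in REQUIRED_COLUMNS[1:]}
            for name, number in numeric.items():
                if number is None:
                    errors.append(f"row {index}: {name} is missing or not numeric")

            # Coordinate ranges (bounds assumed inclusive for this sketch).
            ra, dec = numeric["ra_deg"], numeric["dec_deg"]
            if ra is not None and not 0.0 <= ra <= 360.0:
                errors.append(f"row {index}: ra_deg {ra} is outside 0 to 360")
            if dec is not None and not -90.0 <= dec <= 90.0:
                errors.append(f"row {index}: dec_deg {dec} is outside -90 to 90")
        return errors


    # Hypothetical two-row CSV; the second row has an out-of-range ra_deg.
    sample = io.StringIO(
        "component_id,ra_deg,dec_deg,i_pol_jy,ref_freq_hz\n"
        "J0000-0001,10.5,-45.0,1.2,170e6\n"
        "J0000-0002,400.0,12.0,0.8,170e6\n"
    )
    errors = validate_components(csv.DictReader(sample))
    if errors:
        print("batch rejected:", errors)
    else:
        print("batch can be staged")

Because a single failing row makes ``errors`` non-empty, the whole batch is
rejected in this sketch, mirroring the atomic ingestion rule described above.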