Uploading GSM data

The GSM provides both an API and browser interface for uploading multiple sky survey catalogue files in a single atomic batch operation into the GSM database. The API is the recommended and primary method for uploading data. The browser-based interface is deprecated and will be removed in a future release.

Base API URL

Access to the GSM API requires the user to know the base URL for where the GSM is installed. With the SDP installed on the DP cluster, the following applies:

Use the internal service DNS name:

http://ska-sdp-gsm.<KUBE_NAMESPACE>

Where <KUBE_NAMESPACE> is the SDP control namespace.

API Endpoints

Endpoint

Parameters

Description

POST /upload-sky-survey-batch

  • files: One or more CSV files containing standardized sky survey data - data type: list[File] - required: True

Upload and ingest CSV files

GET /upload-sky-survey-status/{upload_id}

  • upload_id: Unique identifier returned when the upload was initiated - data type: string (UUID) - required: True

Get upload status

GET /review-upload/{upload_id}

  • upload_id: Unique identifier returned when the upload was initiated - data type: string (UUID) - required: True

Review staged upload

POST /commit-upload/{upload_id}

  • upload_id: Unique identifier of the staged upload to commit - data type: string (UUID) - required: True

Commit Staged Upload

DELETE /reject-upload/{upload_id}

  • upload_id: Unique identifier of the staged upload to reject - data type: string (UUID) - required: True

Reject Staged Upload

Upload and ingest CSV files

Endpoint: POST /upload-sky-survey-batch

Upload and ingest one or more sky survey CSV files to the staging table.

Important

Data uploaded via this endpoint is NOT automatically added to the main database.

After a successful upload, you MUST:

  1. Review the staged data using GET /review-upload/{upload_id}

  2. Commit the data using POST /commit-upload/{upload_id}

If you do not commit the upload, the data will remain in staging and will not be visible in the GSM.

All files in the batch are combined into a single sky model. If any file fails validation or ingestion, the entire batch is rolled back.

Note

This has been tested inside the cluster, with file sizes up to 200MB (~1,000,000 rows), with no known issue. Using port forwarding and curl, a catalogue of 2GB or 10,000,000 rows was successfully uploaded.

The catalogue version is not supplied by the user — it is automatically assigned when the upload is committed (see Commit Staged Upload).

Parameter

Description

Data Type

Required

metadata_file

JSON file with catalogue metadata (catalogue_name, description, epoch, author, reference, notes).

File (JSON)

Yes

csv_files

One or more CSV files containing standardized sky survey data

list[File]

Yes

Response:

{
    "upload_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "uploading",
    "catalogue_name": "GLEAM"
}

The endpoint returns immediately with status “uploading”. Ingestion to staging table proceeds asynchronously in the background. Use the status endpoint to monitor completion, then review and commit.

Example Usage:

# Upload metadata and one or more CSV files
    curl -X POST "<GSM_API_URL>/upload-sky-survey-batch" \\
        -F "metadata_file=@metadata.json;type=application/json" \\
        -F "csv_files=@test_catalogue_1.csv;type=text/csv" \\
        -F "csv_files=@test_catalogue_2.csv;type=text/csv"

Python Example:

import requests
import time

url = "<GSM_API_URL>/upload-sky-survey-batch"

# Upload metadata and multiple CSV files
files = [
    ("metadata_file", ("metadata.json", open("metadata.json", "rb"), "application/json")),
    ("csv_files", ("test_catalogue_1.csv", open("test_catalogue_1.csv", "rb"), "text/csv")),
    ("csv_files", ("test_catalogue_2.csv", open("test_catalogue_2.csv", "rb"), "text/csv")),
]
response = requests.post(url, files=files)

result = response.json()
print(f"Upload ID: {result['upload_id']}")
print(f"Catalogue: {result['catalogue_name']}")
print(f"Status: {result['status']}")  # Will be "uploading"

# Poll for completion
status_url = f"{url.replace('/upload-sky-survey-batch', '')}/upload-sky-survey-status/{result['upload_id']}"
while True:
    status_response = requests.get(status_url)
    status_data = status_response.json()
    if status_data['state'] in ['completed', 'failed']:
        break
    time.sleep(2)

print(f"Final status: {status_data['state']}")

Get upload status

Endpoint: GET /upload-sky-survey-status/{upload_id}

Retrieve the current status of a sky survey batch upload.

Parameter

Description

Data Type

Required

upload_id

Unique identifier returned when the upload was initiated

string (UUID)

Yes

Response:

{
    "upload_id": "550e8400-e29b-41d4-a716-446655440000",
    "state": "completed",
    "total_files": 3,
    "uploaded_files": 3,
    "remaining_files": 0,
    "errors": []
}

Upload States:

  • pending: Upload created but not started

  • uploading: Files are being uploaded and validated

  • completed: All files uploaded and ingested successfully

  • failed: Upload failed (see errors field for details)

Example Usage:

curl "<GSM_API_URL>/upload-sky-survey-status/550e8400-e29b-41d4-a716-446655440000"

Python Example:

import requests
import time

upload_id = "550e8400-e29b-41d4-a716-446655440000"
url = f"<GSM_API_URL>/upload-sky-survey-status/{upload_id}"

while True:
    response = requests.get(url)
    status = response.json()

    print(f"State: {status['state']}")
    print(f"Progress: {status['uploaded_csv_files']}/{status['total_csv_files']}")

    if status['state'] in ['completed', 'failed']:
        break

    time.sleep(2)

if status['state'] == 'failed':
    print(f"Errors: {status['errors']}")

Review staged upload

Endpoint: GET /review-upload/{upload_id}

Review the status of the upload before committing to the main database. Returns total record count and the last 10 staged records to confirm all data loaded successfully.

Parameter

Description

Data Type

Required

upload_id

Unique identifier returned when the upload was initiated

string (UUID)

Yes

Response:

{
    "upload_id": "550e8400-e29b-41d4-a716-446655440000",
    "total_records": 200,
    "sample_range": "91-100",
    "sample": [
        {
            "component_id": "J025837+035057",
            "ra": 0.7793,
            "dec": 0.0672,
            "i_pol": 0.8354,
            "version": null
        }
    ],
    "metadata": {
        "version": null,
        "catalogue_name": "TEST_CATALOGUE_1",
        "description": "Test catalogue 1 for development and testing purposes",
        "upload_id": "550e8400-e29b-41d4-a716-446655440000",
        "epoch": "J2000",
        "author": "SKA SDP Team",
        "reference": null,
        "notes": "Sample test data for ska-sdp-global-sky-model",
        "staging": true
    }
}

Example Usage:

curl "<GSM_API_URL>/review-upload/550e8400-e29b-41d4-a716-446655440000"

Python Example:

import requests

upload_id = "550e8400-e29b-41d4-a716-446655440000"
url = f"<GSM_API_URL>/review-upload/{upload_id}"

response = requests.get(url)
review = response.json()

print(f"Total records: {review['total_records']}")
print(f"Sample data: {review['sample'][:3]}")  # First 3 records

Commit Staged Upload

Endpoint: POST /commit-upload/{upload_id}

Commit staged data to the main database. The catalogue version is automatically assigned at commit time by incrementing the minor version of the previous latest version for that catalogue (e.g. 0.1.00.2.0). If no prior version exists for the catalogue, 0.1.0 is used. Versioning is independent per catalogue name. All components in the upload receive the same new version. A record is created in the global_sky_model_metadata table with the version and upload information.

Parameter

Description

Data Type

Required

upload_id

Unique identifier of the staged upload to commit

string (UUID)

Yes

Response:

{
    "message": "success",
    "records_committed": 200,
    "upload_id": "550e8400-e29b-41d4-a716-446655440000"
    "version": "0.2.0",
    "catalogue_name": "Test catalogue"
}

Example Usage:

curl -X POST "<GSM_API_URL>/commit-upload/550e8400-e29b-41d4-a716-446655440000"

Python Example:

import requests

upload_id = "550e8400-e29b-41d4-a716-446655440000"
url = f"<GSM_API_URL>/commit-upload/{upload_id}"

response = requests.post(url)
result = response.json()

print(f"Committed {result['records_committed']} records")
print(f"Message: {result['message']}")

Reject Staged Upload

Endpoint: DELETE /reject-upload/{upload_id}

Reject and discard staged data. All records associated with this upload_id are permanently deleted from the staging table. The catalogue metadata associated with the upload_id is also removed from the metadata table.

Parameter

Description

Data Type

Required

upload_id

Unique identifier of the staged upload to reject

string (UUID)

Yes

Response:

{
    "message": "Upload rejected successfully",
    "records_deleted": 200,
    "upload_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example Usage:

curl -X DELETE "<GSM_API_URL>/reject-upload/550e8400-e29b-41d4-a716-446655440000"

Python Example:

import requests

upload_id = "550e8400-e29b-41d4-a716-446655440000"
url = f"<GSM_API_URL>/reject-upload/{upload_id}"

response = requests.delete(url)
result = response.json()

print(f"Rejected and deleted {result['records_deleted']} records")
print(f"Message: {result['message']}")

End-to-End Upload Workflow

A complete example of the intended workflow is provided here. See above for more information on individual steps.

  1. Upload files to staging:

    POST /upload-sky-survey-batch

  2. Poll for completion:

    GET /upload-sky-survey-status/{upload_id}

  3. Review staged data:

    GET /review-upload/{upload_id}

  4. Commit to main database:

    POST /commit-upload/{upload_id}

import requests
import time

# ------------------------------------------------------------------
# Configuration
# ------------------------------------------------------------------
base_url = "<GSM_API_URL>"  # e.g. http://ska-sdp-gsm.<KUBE_NAMESPACE>

metadata_path = "metadata.json"
csv_paths = ["test_catalogue_1.csv", "test_catalogue_2.csv"]

# ------------------------------------------------------------------
# 1. Upload files to staging
# ------------------------------------------------------------------
upload_url = f"{base_url}/upload-sky-survey-batch"

files = [
    ("metadata_file", ("metadata.json", open(metadata_path, "rb"), "application/json")),
]

for path in csv_paths:
    files.append(("csv_files", (path, open(path, "rb"), "text/csv")))

response = requests.post(upload_url, files=files)
result = response.json()

upload_id = result["upload_id"]

print(f"Upload ID: {upload_id}")
print(f"Catalogue: {result['catalogue_name']}")
print(f"Initial status: {result['status']}")
print("Data uploaded to staging. Waiting for ingestion to complete...")

# ------------------------------------------------------------------
# 2. Poll for upload completion
# ------------------------------------------------------------------
status_url = f"{base_url}/upload-sky-survey-status/{upload_id}"

while True:
    status_response = requests.get(status_url)
    status = status_response.json()

    print(
        f"State: {status['state']} | "
        f"Progress: {status['uploaded_files']}/{status['total_files']}"
    )

    if status["state"] in ["completed", "failed"]:
        break

    time.sleep(2)

if status["state"] == "failed":
    print("Upload failed!")
    print(f"Errors: {status['errors']}")
    exit(1)

print("Upload completed successfully.")

# ------------------------------------------------------------------
# 3. Review staged data
# ------------------------------------------------------------------
review_url = f"{base_url}/review-upload/{upload_id}"
review = requests.get(review_url).json()

print(f"\nRecords staged: {review['total_records']}")
print(f"Sample records: {review['sample'][:3]}")

# ------------------------------------------------------------------
# 4. Commit upload to main database
# ------------------------------------------------------------------
commit_url = f"{base_url}/commit-upload/{upload_id}"
commit_response = requests.post(commit_url).json()

print("\nCommit successful.")
print(f"Committed {commit_response['records_committed']} records")
print(f"Catalogue: {commit_response['catalogue_name']}")
print(f"Version: {commit_response['version']}")

Browser upload interface

Warning

The browser upload interface is deprecated and will be removed in a future release. Users should migrate to the API-based workflow described above.

A browser interface is available at the /upload endpoint (e.g., <GSM_API_URL>/upload).

  1. Navigate to <GSM_API_URL>/upload in your web browser (replace <GSM_API_URL> with your deployment URL)

Upload Interface

Initial upload interface screen

  1. Drag and drop CSV files, containing data for a single catalogue version, onto the upload zone (or click to browse)

Warning

A size limit of 10MB (total) for the selected files exists.

Files added to upload

Interface showing files selected for upload

  1. Click “Upload Files” to begin the upload

  2. Monitor the upload progress - status updates automatically

  3. Confirm the upload completed successfully and review the count of staged records

Upload completed

Interface showing files have been uploaded and staged

  1. Click “Commit to Database” to approve or “Reject and Discard” to cancel

Upload committed
Upload rejected

Confirm or reject uploaded data

The browser interface also provides:

  • Real-time status monitoring

  • Displays the auto-assigned version of the committed data

  • Displays errors if upload fails

The expected CSV format is described at CSV file format and examples are shown at CSV Format Examples.

CSV Format Examples

Standardized Format:

The test_catalogue_1.csv and test_catalogue_2.csv files in the test data directory demonstrate the required standardized format:

component_id,ra_deg,dec_deg,i_pol_jy,a_arcsec,b_arcsec,pa_deg,spec_idx,log_spec_idx
J025837+035057,44.656883,3.849425,0.835419,142.417,132.7302,3.451346,"[-0.419238,,,,]",False
J030420+022029,46.084633,2.341634,0.29086,137.107,134.2583,-0.666618,"[-1.074094,,,,]",False

These test catalogues contain 100 components each and are used throughout the test suite as reference examples.

Minimal Format:

At minimum, you need the four required columns:

component_id,ra_deg,dec_deg,i_pol_jy
J000001-350001,0.004,-35.0,0.25
J000002-350002,0.008,-35.1,0.23