Common Utils
Common utilities for SKA-Low science-ops data analysis.
Provides helpers to locate station lists and shared data directories, sort observation metadata tables, discover test execution directories for frequency sweeps and solar drift scans, and load HDF5 file metadata with per-file summaries suitable for reporting.
- ska_sci_ops_data_analysis.common_utils.get_test_execution_dirs_in_dates(dates: list[str], freq_sweep: bool = False, solar_drift: bool = False, **args: Any) → dict[str, dict[str, dict[str, object]]]
Discover test-execution run directories within a date window.
Supported data types:
Frequency Sweep (LCO-12), under FrequencySweepMultiple/
Solar Drift Scan (LCO-66), under AcquireBeamformed/
Each run directory is expected to be named multiple_<YYYY-MM-DD>_<HHMMSS> and to contain per-station subdirectories. Station names are loaded and used to filter these subdirectories.
- Parameters:
dates – Inclusive date window as [start, end] in YYYY-MM-DD format.
freq_sweep – If True, include frequency sweep runs.
solar_drift – If True, include solar drift scan runs.
args – Reserved for future options; ignored.
- Returns:
Mapping with optional keys "freq_sweep_dirs" and/or "solar_drift_dirs". Each maps a run directory name to a record with keys: main_path (absolute path), dir_list (list[Path] of station subdirectories), Date (YYYY-MM-DD), and time_start (HH:MM).
- ska_sci_ops_data_analysis.common_utils.is_date_in_range(date: datetime.date, date_range: tuple[str, str]) → bool
Check whether the given date falls within date_range.
- Parameters:
date – Date to evaluate.
date_range – 2-tuple of ISO-format date strings (start, end).
- Returns:
True if date lies between the two dates in date_range, False otherwise.
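A minimal sketch of such a range check follows. The name `date_in_range_sketch` is hypothetical, and treating both endpoints as inclusive is an assumption; the description above does not say how boundary dates are handled.

```python
from __future__ import annotations

from datetime import date


def date_in_range_sketch(day: date, date_range: tuple[str, str]) -> bool:
    """Return True if day lies between the two ISO dates (endpoints assumed inclusive)."""
    start = date.fromisoformat(date_range[0])
    end = date.fromisoformat(date_range[1])
    return start <= day <= end


inside = date_in_range_sketch(date(2024, 5, 17), ("2024-05-01", "2024-05-31"))
outside = date_in_range_sketch(date(2024, 6, 1), ("2024-05-01", "2024-05-31"))
```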
- ska_sci_ops_data_analysis.common_utils.load_stations(allow_comments: bool = True) → list[str]
Load station IDs from <repo_root>/data/station_list.txt.
Reads one station name per line from the project’s data directory. When allow_comments is True, lines beginning with # are ignored.
Expected layout:
<repo_root>/
├── src/ska_sci_ops_data_analysis/...
└── data/station_list.txt
- Parameters:
allow_comments – If True, ignore lines beginning with #.
- Returns:
Station identifiers in file order.
- Raises:
FileNotFoundError – If the file is missing or exists but contains no usable station names.
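The documented behaviour can be approximated with the sketch below. Both function names are hypothetical, and skipping blank lines and stripping surrounding whitespace are assumptions that the description above does not confirm.

```python
from __future__ import annotations

from pathlib import Path


def parse_station_lines(lines: list[str], allow_comments: bool = True) -> list[str]:
    """Return station names in input order, optionally dropping '#' comment lines."""
    stations = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no station name
        if allow_comments and line.startswith("#"):
            continue
        stations.append(line)
    return stations


def read_station_list(path: Path, allow_comments: bool = True) -> list[str]:
    """Read a station list file, raising FileNotFoundError as documented above."""
    if not path.is_file():
        raise FileNotFoundError(f"Station list not found: {path}")
    lines = path.read_text(encoding="utf-8").splitlines()
    stations = parse_station_lines(lines, allow_comments)
    if not stations:
        raise FileNotFoundError(f"No usable station names in: {path}")
    return stations
```

Separating the line parsing from the file access keeps the comment handling easy to test without touching the repository layout.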
- ska_sci_ops_data_analysis.common_utils.obs_hdf5_info_loader(input_directory: str | Path, compute_missing_channels: bool = False, compute_gaussian: bool = False, channels: int | Sequence[int] | None = None, plot_channels: list[int] | None = None, median_windows: Any | None = None) → DataFrame
Load observation HDF5 files and build a per-file + summary table.
For each .hdf5 file found in input_directory, the function reads start and end timestamps, derives a per-file “useful time” from the sample timestamp array, and can optionally compute:
Missing channel ranges using power arrays (polarization_0**2 + polarization_1**2).
Mean Gaussian fit quality (R²) per file via an external checker (gaussian_check_station), when enabled.
The final row named "TOTAL" aggregates: earliest start, latest end, total useful time, optionally the union of missing channels, the mean Gaussian R², the number of files, and a boolean Data flag.
- Parameters:
input_directory – Directory containing HDF5 files to scan.
compute_missing_channels – If True, compute missing channels per file and on the summary row (requires channels).
compute_gaussian – If True, compute the mean Gaussian R² per file using an external function available on the import path.
channels – Number of channels used when computing missing channels (e.g., 384). Required when compute_missing_channels is True.
plot_channels – Optional channel indices forwarded to the Gaussian checker.
median_windows – Optional median filter configuration.
- Returns:
One row per file plus a final "TOTAL" row with the aggregate metrics. Columns include: file_name, ts_start_unix, ts_end_unix, ts_start_AWST, ts_end_AWST, ts_array, useful_time_s, useful_time_str, Missing channels, Gaussian R2, Number of files (TOTAL row), and Data (TOTAL row).
- Raises:
ValueError – If compute_missing_channels is True but channels is None.
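To make the "TOTAL" aggregation concrete, here is a rough sketch over plain dictionaries rather than a DataFrame. `summarise_files` and the `missing_channels` input key are hypothetical, the sample values are invented, and the Data flag and Gaussian R² are left out because the description above does not define how they are derived.

```python
from __future__ import annotations


def summarise_files(rows: list[dict]) -> dict:
    """Build a TOTAL-style record from per-file records (illustrative only)."""
    if not rows:
        return {"file_name": "TOTAL", "Number of files": 0}
    # Union of missing channels across all files.
    missing: set[int] = set()
    for row in rows:
        missing.update(row.get("missing_channels", ()))
    return {
        "file_name": "TOTAL",
        "ts_start_unix": min(row["ts_start_unix"] for row in rows),  # earliest start
        "ts_end_unix": max(row["ts_end_unix"] for row in rows),      # latest end
        "useful_time_s": sum(row["useful_time_s"] for row in rows),  # total useful time
        "Missing channels": sorted(missing),
        "Number of files": len(rows),
    }


# Invented per-file records, shaped loosely after the documented columns.
per_file = [
    {"ts_start_unix": 100.0, "ts_end_unix": 160.0, "useful_time_s": 55.0, "missing_channels": [3, 4]},
    {"ts_start_unix": 90.0, "ts_end_unix": 150.0, "useful_time_s": 60.0, "missing_channels": [4, 7]},
]
total = summarise_files(per_file)
```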
- ska_sci_ops_data_analysis.common_utils.sort_df_by_date_time_station(df: DataFrame, date_col: str = 'Date', time_col: str = 'time_start', station_col: str = 'Station') → DataFrame
Sort observations by date, time, and alphanumeric station ID.
Station IDs like s10-3 are split into components (spiral arm letter, cluster integer, and in-cluster integer) to achieve a natural alphanumeric order. Original string formats of date_col and time_col are preserved.
- Parameters:
df – Input table containing at least the date, time, and station columns.
date_col – Name of the date column (string-formatted dates).
time_col – Name of the time column (string-formatted time/slot).
station_col – Name of the station identifier column (e.g. "s10-3").
- Returns:
A new DataFrame sorted by date, time, and station order. Temporary sort-key columns are removed.
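The natural station ordering can be sketched with a sort key in plain Python. `station_sort_key` is a hypothetical helper, the regular expression is an assumed reading of IDs shaped like s10-3, and placing unparseable IDs last is a choice made for this sketch rather than documented behaviour. The records in the usage lines are invented.

```python
import re

# Assumed ID shape: <arm letters><cluster integer>-<in-cluster integer>, e.g. s10-3
STATION_RE = re.compile(r"^([A-Za-z]+)(\d+)-(\d+)$")


def station_sort_key(station: str) -> tuple:
    """Split a station ID into (arm, cluster, index) so that s9-1 sorts before s10-1."""
    match = STATION_RE.match(station)
    if match is None:
        # Sketch-only choice: IDs that do not parse sort after those that do.
        return (1, station, 0, 0)
    return (0, match.group(1).lower(), int(match.group(2)), int(match.group(3)))


stations = sorted(["s10-3", "s9-1", "s10-1", "s8-6"], key=station_sort_key)

# ISO-formatted dates and zero-padded HH:MM strings already sort correctly as text,
# so a full record sort can combine them with the station key.
records = [
    {"Date": "2024-05-17", "time_start": "14:30", "Station": "s10-3"},
    {"Date": "2024-05-17", "time_start": "14:30", "Station": "s9-1"},
    {"Date": "2024-05-16", "time_start": "23:00", "Station": "s10-1"},
]
ordered = sorted(
    records,
    key=lambda r: (r["Date"], r["time_start"], station_sort_key(r["Station"])),
)
```

A plain string sort would place s10-1 before s8-6, which is why the integer components are compared numerically.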