ska_pst.testutils.verification

Module for verification of data.

class ska_pst.testutils.verification.Metadata(scan_configuration: dict | str | pathlib.Path, scan_id: int, file_mount: pathlib.Path | str = PosixPath('/mnt/sdp/product'), scan_path: Optional[Path] = None, logger: Optional[Logger] = None)

A class that provides a Pandas data frame for viewing all of a scan's metadata from its different sources.

Instances of this class are created by passing a scan configuration (or the location of a JSON file containing the configuration), the scan id, and optionally the location where the files can be found. The data frame is accessed via the dataframe property.

from ska_pst.testutils.verification import Metadata

viewer = Metadata(
    scan_configuration="/mnt/sdp/product/eb-j354-20240212-11115/pst-low/998/scan_configuration.json",
    scan_id=998,
    file_mount="/mnt/sdp/product"
)

df = viewer.dataframe

config_key_for_header_key(header_key: str) → str | None: Get the scan configuration key that maps to the given header key, or None if there is no mapping.

config_value(key: str) → Optional[Any]: Get the scan configuration value for the given key.

property data_files: List[DadaFileReader]: Get all the data files for the scan.

property dataframe: pandas.DataFrame: Get all the scan’s metadata as a Pandas dataframe.

property metadata: dict: Get the DPD metadata file as a dict.

property metadata_file_path: Path: Get the file path of the DPD metadata file for the scan.

property subsystem_id: str: Get the DPD subsystem_id that the scan configuration is for.

property weights_files: List[DadaFileReader]: Get all the weights files for the scan.

class ska_pst.testutils.verification.MetadataVerifier(scan_configuration: dict | str | pathlib.Path, scan_id: int, file_mount: pathlib.Path | str = PosixPath('/mnt/sdp/product'), logger: Optional[Logger] = None, **kwargs: Any)

Class that can be used to verify that the metadata of a scan is correct.

The following is an example of how to use this class.

from ska_pst.testutils.verification import MetadataVerifier

# create an instance of a verifier
metadata_verifier = MetadataVerifier(
    scan_configuration="/mnt/sdp/product/eb-j354-20240212-11115/pst-low/998/scan_configuration.json",
    scan_id=998,
)

# perform a verification
try:
    metadata_verifier.verify()
except AssertionError as e:
    # handle error
    print(e)

# get a Pandas data frame of the metadata.
# If in a notebook, the dataframe can be displayed as an HTML table
df = metadata_verifier.dataframe

property dataframe: pandas.DataFrame: Get all the scan’s metadata as a Pandas dataframe.

verify() → None

Verify the consistency of the metadata across data products.

This method finds all the files for the scan, including the DPD metadata file, and then compares them. If there are any inconsistencies, an AssertionError is raised.

Raises: AssertionError – if the metadata is inconsistent across the data products.

class ska_pst.testutils.verification.ObservationModeVerifier(*args, **kwargs)

A Python protocol that abstracts over the verification of observation mode files.

Classes don’t have to extend this protocol directly, but they must implement the verify() method. The implementation should assert against the given configuration and the files associated with the observation mode.

verify(metadata: Metadata, errors: List[str], **kwargs: Any) → None

Perform verification for the given observation mode.

Parameters

metadata (Metadata) – the metadata loaded for the given scan
errors (List[str]) – the list to which any validation errors are added.
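As an illustration of how a structural protocol like this can be used, the sketch below defines an equivalent typing.Protocol and a verifier that conforms without subclassing it. Everything here is hypothetical: ObservationModeVerifierSketch, ExampleVerifier and run_verifiers are illustrative names, a plain dict stands in for a Metadata instance, and raising one AssertionError from the collected errors is only one plausible way a caller might report them.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class ObservationModeVerifierSketch(Protocol):
    """Structural type: any object with a matching verify() method conforms."""

    def verify(self, metadata: Any, errors: List[str], **kwargs: Any) -> None:
        ...


class ExampleVerifier:
    """Hypothetical verifier; note that it does not subclass the protocol."""

    def verify(self, metadata: Any, errors: List[str], **kwargs: Any) -> None:
        # Record the problem instead of raising, so every check can run.
        if metadata.get("NPOL") != 2:
            errors.append(f"expected NPOL=2, got {metadata.get('NPOL')}")


def run_verifiers(metadata: Dict[str, Any], verifiers: List[ObservationModeVerifierSketch]) -> None:
    """Hypothetical driver: collect errors from all verifiers, then raise once."""
    errors: List[str] = []
    for verifier in verifiers:
        verifier.verify(metadata, errors)
    if errors:
        raise AssertionError("\n".join(errors))
```

Collecting into a shared list lets a single run report every inconsistency at once rather than stopping at the first failed check.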

class ska_pst.testutils.verification.ValueMapping(*, config_key: Optional[str] = None, metadata_key: Optional[str] = None, file_key: Optional[str] = None)

A data class that defines the mapping between the different metadata sources.

config_key: str | None = None

Key to use to get value from scan configuration.

A value of None means there is no mapping.

config_value(config: dict) → Optional[Any]

Get the value from the scan configuration.

Parameters: config (dict) – a dictionary of the scan configuration.
Returns: the value in the dictionary or None if the value does not exist.
Return type: Any | None

file_key: str | None = None

Key to use to get value from a DADA file header.

A value of None means there is no mapping.

file_value(header: dict) → Optional[Any]

Get the value from a DADA file header.

Parameters: header (dict) – a dictionary of the DADA file header values.
Returns: the value in the dictionary or None if the value does not exist.
Return type: Any | None

metadata_key: str | None = None

Key to use to get value from the DPD metadata file.

A value of None means there is no mapping.

metadata_value(metadata: dict) → Optional[Any]

Get the value from the DPD metadata dictionary.

Parameters: metadata (dict) – a dictionary of the DPD metadata.
Returns: the value in the dictionary or None if the value does not exist.
Return type: Any | None
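A minimal sketch of the idea behind ValueMapping, assuming each key is a flat, top-level lookup in its source dictionary (the real class may resolve nested keys differently). ValueMappingSketch and the key names "scan_id" and "SCAN_ID" are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ValueMappingSketch:
    """Simplified stand-in for ValueMapping using flat dictionary lookups."""

    config_key: Optional[str] = None
    metadata_key: Optional[str] = None
    file_key: Optional[str] = None

    @staticmethod
    def _lookup(source: dict, key: Optional[str]) -> Optional[Any]:
        # A key of None means there is no mapping for this source.
        if key is None:
            return None
        # Missing keys also give None, matching the documented behaviour.
        return source.get(key)

    def config_value(self, config: dict) -> Optional[Any]:
        return self._lookup(config, self.config_key)

    def metadata_value(self, metadata: dict) -> Optional[Any]:
        return self._lookup(metadata, self.metadata_key)

    def file_value(self, header: dict) -> Optional[Any]:
        return self._lookup(header, self.file_key)
```

For example, ValueMappingSketch(config_key="scan_id", file_key="SCAN_ID") yields one value per source, which is the shape needed to lay the sources side by side in a data frame row.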

class ska_pst.testutils.verification.VoltageRecorderVerifier(logger: Optional[Logger] = None)

A verifier for the voltage recorder observation mode.

verify(metadata: Metadata, errors: List[str], **kwargs: Any) → None

Perform verification for voltage recording mode.

Parameters

metadata (Metadata) – the metadata loaded for the given scan
errors (List[str]) – the list to which any validation errors are added.

ska_pst.testutils.verification.assert_bytes_per_second(**kwargs: Any) → None

Assert that the BYTES_PER_SECOND header value is correct for the given file and configuration.

See calculate_bytes_per_second() for details of how the expected BYTES_PER_SECOND value is calculated.

Raises: AssertionError – if BYTES_PER_SECOND is incorrect for file type and scan configuration.

ska_pst.testutils.verification.assert_const_value(expected_value: str | int | float) → Callable[[...], None]

Assert the header has a fixed/constant value.

This function turns assert_header_value() into a partial function by binding its expected_value argument to the given value.

Parameters: expected_value (str | int | float) – the constant value the header is expected to have.
Returns: a callable that performs the assertion.
Return type: Callable[…, None]
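The partial-function technique can be sketched with functools.partial. This is a simplified, hypothetical version: assert_header_value_sketch reads from a plain header dict rather than a DadaFileReader, and converting the header string to the expected value's type before comparing is an assumption.

```python
import functools
from typing import Any, Callable, Optional, Union

Value = Union[str, int, float]


def assert_header_value_sketch(
    *, header: dict, header_key: str, expected_value: Value, header_value: Optional[str] = None, **kwargs: Any
) -> None:
    """Simplified check against a plain header dict (not a DadaFileReader)."""
    if header_value is None:
        header_value = header[header_key]
    # Convert the header string to the expected type before comparing.
    actual = type(expected_value)(header_value)
    assert actual == expected_value, f"{header_key}: expected {expected_value!r}, got {actual!r}"


def assert_const_value_sketch(expected_value: Value) -> Callable[..., None]:
    # Bind expected_value now; the remaining keyword arguments are supplied later.
    return functools.partial(assert_header_value_sketch, expected_value=expected_value)
```

The returned callable only needs the per-file arguments, e.g. assert_const_value_sketch(2)(header=..., header_key="NDIM").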

ska_pst.testutils.verification.assert_equal_to(other_key: str) → Callable[[...], None]

Assert that header is equal to another header value.

This function returns an assertion function that, when called, checks that the header value is equal to the value of another header key.

Parameters: other_key (str) – the header key of the other value to assert equality against.
Returns: a callable that asserts that the current header value and the other header value are the same.
Return type: Callable[…, None]
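A closure is one way to build such an assertion function. The sketch below is hypothetical: it compares raw header strings from a plain dict instead of reading a DadaFileReader, and the key names used with it are placeholders.

```python
from typing import Any, Callable


def assert_equal_to_sketch(other_key: str) -> Callable[..., None]:
    """Return a checker that compares one header value with another header value."""

    def _assert(*, header: dict, header_key: str, **kwargs: Any) -> None:
        # other_key is captured from the enclosing call.
        value, other = header[header_key], header[other_key]
        assert value == other, f"{header_key}={value!r} != {other_key}={other!r}"

    return _assert
```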

ska_pst.testutils.verification.assert_file_number(*, file: DadaFileReader, **kwargs: Any) → None

Assert that the FILE_NUMBER header matches the file name.

The FILE_NUMBER header should match the last part of the file name.

Parameters: file (DadaFileReader) – the current file whose FILE_NUMBER is being asserted.
Raises: AssertionError – if FILE_NUMBER doesn’t match the file name.
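A sketch of the check, assuming the file name's stem has underscore-separated parts with the file number last. That naming layout, the helper names, and the sample file name used with them are hypothetical.

```python
from pathlib import Path


def file_number_from_name(file_path: str) -> int:
    # Assumed naming: the last "_"-separated part of the stem is the file number.
    return int(Path(file_path).stem.split("_")[-1])


def assert_file_number_sketch(file_path: str, header: dict) -> None:
    expected = file_number_from_name(file_path)
    actual = int(header["FILE_NUMBER"])
    assert actual == expected, f"FILE_NUMBER {actual} does not match file name {file_path}"
```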

ska_pst.testutils.verification.assert_header_value(*, file: DadaFileReader, header_key: str, expected_value: str | int | float, header_value: Optional[str] = None, logger: Optional[Logger] = None, **kwargs: Any) → None

Assert that header value is equal to the expected value.

If header_value is None, this method gets the value from the header of the file.

This asserts that the values are equal, but if the expected value is a float it uses the NumPy function assert_allclose() to allow for rounding.

Parameters

file (DadaFileReader) – the data/weights file that is being verified
header_key (str) – the key to the header value to be asserted
expected_value (str | int | float) – the expected value
header_value (str | None, optional) – the value of the header to assert, defaults to None. If this value is None then the value will be retrieved from the file first.

Raises

AssertionError – if header value is not the same as expected value.
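The float-versus-exact comparison can be sketched as follows. This is a simplified stand-in that takes the header string directly. It relies on numpy.testing.assert_allclose's default relative tolerance (1e-7), which may differ from what the library actually uses, and converting non-float header strings to the expected type is an assumption.

```python
from typing import Union

import numpy as np


def check_header_value(header_key: str, header_value: str, expected_value: Union[str, int, float]) -> None:
    """Compare a DADA header string with an expected value (simplified sketch)."""
    if isinstance(expected_value, float):
        # Floats: tolerate rounding in the text representation of the header.
        np.testing.assert_allclose(float(header_value), expected_value, err_msg=header_key)
    else:
        # Other types: convert the header string and require exact equality.
        actual = type(expected_value)(header_value)
        assert actual == expected_value, f"{header_key}: expected {expected_value!r}, got {actual!r}"
```

Both branches raise AssertionError on a mismatch, so callers only need to handle one exception type.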

ska_pst.testutils.verification.assert_nant(*, scan_config: dict, **kwargs: Any) → None

Assert that the NANT header value is correct for the given configuration.

This value should be the length of the receptors value in the scan configuration.

Parameters: scan_config (dict) – the scan configuration as a dictionary.
Raises: AssertionError – if NANT is incorrect.

ska_pst.testutils.verification.assert_nbit(*, scan_config: dict, is_weights: bool, **kwargs: Any) → None

Assert that the NBIT value is correct for the given file type and scan configuration.

For weights files this value should be 16. For data files it should equal bits_per_sample // 2 from the scan configuration. The scan configuration specifies the total number of bits per sample, covering both the real and imaginary parts, whereas in PST the NBIT value is per dimension.

Parameters

scan_config (dict) – the scan configuration as a dictionary.
is_weights (bool) – whether asserting against a data or weights file.

Raises

AssertionError – if NBIT is incorrect for file type and scan configuration.

ska_pst.testutils.verification.assert_ndim(*, is_weights: bool, **kwargs: Any) → None

Assert that the NDIM value is correct for the given file type.

For data files this is set to 2 (i.e. complex valued data) and for weights it is set to 1 (i.e. real valued data).

Parameters: is_weights (bool) – whether asserting against a data or weights file.
Raises: AssertionError – if NDIM is incorrect for file type.

ska_pst.testutils.verification.assert_npol(*, scan_config: dict, is_weights: bool, **kwargs: Any) → None

Assert that the NPOL value is correct for the given file type and scan configuration.

For weights files this value should be 1. For data files this should be equal to the value of num_of_polarizations from the scan configuration.

Parameters

scan_config (dict) – the scan configuration as a dictionary.
is_weights (bool) – whether asserting against a data or weights file.

Raises

AssertionError – if NPOL is incorrect for file type and scan configuration.
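The NBIT, NDIM and NPOL rules described for these three functions can be collected into one small sketch. The helper name is hypothetical, and it assumes bits_per_sample and num_of_polarizations are top-level keys of the scan configuration dictionary.

```python
from typing import Tuple


def expected_nbit_ndim_npol(scan_config: dict, is_weights: bool) -> Tuple[int, int, int]:
    """Return the expected (NBIT, NDIM, NPOL) for a data or weights file."""
    if is_weights:
        # Weights: 16-bit, real valued, single polarisation.
        return 16, 1, 1
    # Data: bits_per_sample covers real + imaginary, but NBIT is per dimension.
    nbit = scan_config["bits_per_sample"] // 2
    return nbit, 2, scan_config["num_of_polarizations"]
```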

ska_pst.testutils.verification.assert_obs_offset(*, file: DadaFileReader, scan_config: dict, is_weights: bool, **kwargs: Any) → None

Assert that the OBS_OFFSET header value is correct for the given file and configuration.

OBS_OFFSET is a multiple of the RESOLUTION value. It is determined by first using BYTES_PER_SECOND to find the number of bytes per 10 seconds; this value is then rounded up to a multiple of the buffer size, which is itself a multiple of RESOLUTION.

For weights files, OBS_OFFSET is derived from the data OBS_OFFSET, scaled by WEIGHTS_RESOLUTION / DATA_RESOLUTION.

The file name also includes the OBS_OFFSET, and this method asserts that this value is correct too.

Parameters

file (DadaFileReader) – the current file whose OBS_OFFSET is being asserted
scan_config (dict) – the scan configuration as a dictionary.
is_weights (bool) – whether the current file is a weights or data file.

Raises

AssertionError – if OBS_OFFSET is incorrect for the current file.
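The rounding and scaling steps can be sketched as below. This covers only the arithmetic described here: the buffer size and resolutions are plain inputs, the helper names are hypothetical, and how the library combines the result with a file's position in the scan is not modelled.

```python
import math


def round_up_to_multiple(value: float, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return int(math.ceil(value / multiple)) * multiple


def bytes_per_interval(bytes_per_second: float, buffer_size: int, seconds: int = 10) -> int:
    # Bytes for the interval, rounded up to a whole number of buffers.
    return round_up_to_multiple(bytes_per_second * seconds, buffer_size)


def weights_offset_from_data(data_offset: int, weights_resolution: int, data_resolution: int) -> int:
    # Scale the data offset by WEIGHTS_RESOLUTION / DATA_RESOLUTION.
    return data_offset * weights_resolution // data_resolution
```

For example, at 1000 bytes/s with a 3072-byte buffer, 10 seconds is 10000 bytes, which rounds up to 4 buffers (12288 bytes).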

ska_pst.testutils.verification.assert_resolution(**kwargs: Any) → None

Assert that the RESOLUTION header value is correct for the given file and configuration.

See calculate_resolution() for details of how the expected RESOLUTION value is calculated.

Raises: AssertionError – if RESOLUTION is incorrect for file type and scan configuration.

ska_pst.testutils.verification.assert_tsamp(**kwargs: Any) → None

Assert that the TSAMP header value is correct for the given file and configuration.

See calculate_tsamp() for details of how the expected TSAMP value is calculated.

Raises: AssertionError – if TSAMP is incorrect for file type and scan configuration.

ska_pst.testutils.verification.assert_udp_format(*, scan_config: dict, **kwargs: Any) → None

Assert that UDP_FORMAT value is correct for the frequency band in the scan.

This method reads the frequency_band value from the scan configuration, uses it to look up the band configuration, and compares the UDP format from that band configuration against the header value.

Parameters: scan_config (dict) – the scan configuration to get frequency band from.
Raises: AssertionError – if UDP_FORMAT is incorrect.

ska_pst.testutils.verification.calculate_bytes_per_second(*, is_weights: bool, scan_config: dict, **kwargs: Any) → float

Calculate the expected bytes per second for the given file type and scan configuration.

This calculates the expected number of bytes per second that each file should be generating. The number of bytes per sample is calculated based on the file type and then divided by the TSAMP value for that file type. As TSAMP is in microseconds, the result is multiplied by 1e6 so that the value is per second rather than per microsecond.

Parameters

is_weights (bool) – whether the current file is a weights or data file.
scan_config (dict) – the scan configuration as a dictionary.

Returns

the bytes per second for the given file type.

Return type

float
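The unit handling can be sketched for a data file. The per-sample byte count (nchan × npol × ndim × nbit / 8) is an assumption about how a data sample is laid out, and both helper names are hypothetical. The weights case, whose bytes per sample differ, is not modelled.

```python
def data_bytes_per_sample(nchan: int, npol: int, ndim: int, nbit: int) -> float:
    # Assumed: one time sample spans every channel, polarisation and dimension.
    return nchan * npol * ndim * nbit / 8


def bytes_per_second(bytes_per_sample: float, tsamp_us: float) -> float:
    # TSAMP is in microseconds, so scale by 1e6 to get a per-second rate.
    return bytes_per_sample / tsamp_us * 1e6
```

With 24 channels, 2 polarisations, complex 8-bit samples and a 0.75 µs TSAMP, this gives 96 bytes per sample and 128e6 bytes per second.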

ska_pst.testutils.verification.calculate_resolution(*, is_weights: bool, scan_config: dict, **kwargs: Any) → int

Calculate the RESOLUTION for a given file and scan configuration.

The RESOLUTION value is the number of bytes needed to hold all the data for the NCHAN channels when each UDP packet carries udp_nsamp samples per channel. For weights files, the value also includes a floating-point scale factor per packet.

Parameters

is_weights (bool) – whether the current file is a weights or data file.
scan_config (dict) – the scan configuration as a dictionary.

Returns

the expected RESOLUTION for the given file and scan configuration.

Return type

int
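For data files, the definition can be sketched as udp_nsamp samples for every one of the NCHAN channels. The sample layout (npol × ndim × nbit bits per channel sample) is an assumption, and the weights case with its per-packet scale factor is deliberately not modelled because the packet layout is not described here.

```python
def data_resolution(nchan: int, npol: int, ndim: int, nbit: int, udp_nsamp: int) -> int:
    """Bytes needed for udp_nsamp samples of every one of the NCHAN channels."""
    bits = nchan * npol * ndim * nbit * udp_nsamp
    # RESOLUTION is a byte count, so the bit total must divide evenly.
    assert bits % 8 == 0, "RESOLUTION must be a whole number of bytes"
    return bits // 8
```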

ska_pst.testutils.verification.calculate_tsamp(*, is_weights: bool, scan_config: dict, **kwargs: Any) → float

Calculate the expected TSAMP for given file type and scan configuration.

This calculates the time for each sample, in microseconds.

For weights, the data TSAMP is scaled by the number of samples per packet, as the weights are valid for every sample in a packet.

Parameters

is_weights (bool) – whether the current file is a weights or data file.
scan_config (dict) – the scan configuration as a dictionary.

Returns

the time per sample in microseconds.

Return type

float
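The sketch below assumes the data TSAMP follows from the channel width (bandwidth / NCHAN, in MHz) and an oversampling factor given as numerator/denominator (as in OS_FACTOR). The exact inputs the library reads from the scan configuration are not stated here, so treat both helpers as hypothetical. The weights scaling follows the description above.

```python
def data_tsamp_us(nchan: int, bandwidth_mhz: float, os_numerator: int, os_denominator: int) -> float:
    # Channel width is bandwidth/nchan MHz; oversampling raises the sample rate,
    # so TSAMP (us) = 1 / (channel width * os_factor) = nchan * den / (bw * num).
    return nchan * os_denominator / (bandwidth_mhz * os_numerator)


def weights_tsamp_us(data_tsamp: float, udp_nsamp: int) -> float:
    # One weight covers all udp_nsamp samples in a packet.
    return data_tsamp * udp_nsamp
```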

ska_pst.testutils.verification.split_values(dtype: Type[T], delimiter: str = ',') → Callable[[str], List[T]]

Split a delimited string into a list of values of type T.

The default delimiter is a comma, but it can be overridden.

Example usages include ANTENNAE (strings), ANT_WEIGHTS (floats) and OS_FACTOR (ints separated by “/”).

Parameters

dtype (Type[T]) – the data type of the individual values.
delimiter (str) – the delimiter to split values.

Returns

a callable that will convert a string into a list of values of Type[T]

Return type

Callable[[str], List[T]]
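A sketch of such a converter factory. split_values_sketch is a hypothetical stand-in, and stripping whitespace around each item is an assumption that the documented function may not share.

```python
from typing import Callable, List, Type, TypeVar

T = TypeVar("T")


def split_values_sketch(dtype: Type[T], delimiter: str = ",") -> Callable[[str], List[T]]:
    def _split(value: str) -> List[T]:
        # Split on the delimiter and convert each stripped item to dtype.
        return [dtype(item.strip()) for item in value.split(delimiter)]

    return _split
```

For example, split_values_sketch(int, "/") turns an OS_FACTOR-style string such as "4/3" into [4, 3].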

ska_pst.testutils.verification.to_si(unit: astropy.units.UnitBase) → Callable[[str], float]

Convert a value to SI units based on its input unit.

For example, FREQ and BW in the DADA files are in MHz, whereas the SI unit is Hz.

Parameters: unit (u.UnitBase) – the unit the value is in (e.g. u.MHz)
Returns: a callable that converts a string value into a float in the SI unit.
Return type: Callable[[str], float]