ska_pst.testutils.stats

Submodule for STAT-related code.

class ska_pst.testutils.stats.SampleStatistics(mean: float, variance: float, num_samples: int)

Data class that models the statistics of a sample.

Variables

mean (float) – the mean of the sample
variance (float) – the variance of the sample
num_samples (int) – the number of samples used to calculate the statistics

mean: float

num_samples: int

variance: float
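As a rough illustration of how a SampleStatistics instance might be populated from raw samples, the sketch below uses NumPy with a local stand-in dataclass that mirrors the documented fields. The helper `sample_statistics_from_array` is hypothetical, and the choice of the unbiased (ddof=1) variance estimator is an assumption, not necessarily what ska_pst uses.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SampleStatistics:
    """Local stand-in mirroring ska_pst.testutils.stats.SampleStatistics."""

    mean: float
    variance: float
    num_samples: int


def sample_statistics_from_array(samples: np.ndarray) -> SampleStatistics:
    """Hypothetical helper: summarise an array of real samples."""
    values = np.asarray(samples, dtype=np.float64).ravel()
    return SampleStatistics(
        mean=float(values.mean()),
        # Assumption: unbiased estimator (divides by N - 1).
        variance=float(values.var(ddof=1)),
        num_samples=int(values.size),
    )


stats = sample_statistics_from_array(np.array([1.0, 2.0, 3.0, 4.0]))
print(stats)
```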

class ska_pst.testutils.stats.ScanStatFileWatcher(*args: Any, **kwargs: Any)

Class that watches for the creation of STAT files.

Instances of this class watch a scan directory for the creation of real-time monitoring STAT HDF5 files and store the creation events for later retrieval.

event_time_diffs() → List[StatFileEventDifference][source]: Get a list of differences between file creation events.

property events: List[StatFileCreatedEvent]: Get the list of file created events.

on_created(event: watchdog.events.FileSystemEvent) → None

Handle a file-created system event.

The event comes from watchdog; this method converts it into a StatFileCreatedEvent instance and stores it so that it can later be retrieved via the events property.

stop() → None[source]: Stop watching for STAT files.

watch() → None[source]: Start watching for STAT files.

class ska_pst.testutils.stats.StatFileCreatedEvent(*, file_path: Path, create_datetime: float)

Data class capturing a file creation event.

Variables

file_path (pathlib.Path) – the full path to the file that was created.
create_datetime (float) – the time, in seconds from epoch, when the file was created.

create_datetime: float

file_path: Path

class ska_pst.testutils.stats.StatFileEventDifference(*, first_file_event: InitVar[StatFileCreatedEvent], second_file_event: InitVar[StatFileCreatedEvent])

A data class used to calculate differences in file creation events.

Variables

first_file_path (pathlib.Path) – the path to the file that was created first
second_file_path (pathlib.Path) – the path to the file that was created second
creation_time_difference (float) – the difference in creation time of the two files, in seconds

creation_time_difference: float

first_file_event: InitVar[StatFileCreatedEvent]

first_file_path: pathlib.Path

second_file_event: InitVar[StatFileCreatedEvent]

second_file_path: pathlib.Path

ska_pst.testutils.stats.assert_statistics(population_mean: float, population_var: float, sample_stats: SampleStatistics, channel: int, pol: int, tolerance: float = 6.0) → None

Assert that the sample mean and variance are within a given tolerance of the population statistics.

Parameters

population_mean (float) – the mean of the population
population_var (float) – the variance of the population
sample_stats (SampleStatistics) – the sample statistics, including the sample size (num_samples), to assert against the population statistics.
channel (int) – the channel that is being tested
pol (int) – the polarisation that is being tested
tolerance (float, optional) – the number of standard deviations (sigma) by which a sample statistic may deviate from the population value, defaults to 6.0
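The tolerance is expressed in sigma, i.e. in units of the standard error of each sample statistic. A minimal sketch of such a check, assuming approximately Gaussian samples so that the standard error of the mean is sqrt(var / N) and that of the sample variance is var * sqrt(2 / (N - 1)); these formulas and the name `check_statistics` are assumptions, not the ska_pst implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class SampleStatistics:
    """Local stand-in mirroring ska_pst.testutils.stats.SampleStatistics."""

    mean: float
    variance: float
    num_samples: int


def check_statistics(
    population_mean: float,
    population_var: float,
    sample_stats: SampleStatistics,
    channel: int,
    pol: int,
    tolerance: float = 6.0,
) -> None:
    """Hypothetical sketch: raise AssertionError if a statistic is > tolerance sigma off."""
    n = sample_stats.num_samples
    # Standard error of the sample mean.
    mean_sigma = math.sqrt(population_var / n)
    # Standard error of the sample variance, assuming Gaussian samples.
    var_sigma = population_var * math.sqrt(2.0 / (n - 1))

    mean_offset = abs(sample_stats.mean - population_mean) / mean_sigma
    if mean_offset > tolerance:
        raise AssertionError(
            f"channel={channel} pol={pol}: mean is {mean_offset:.2f} sigma from population"
        )
    var_offset = abs(sample_stats.variance - population_var) / var_sigma
    if var_offset > tolerance:
        raise AssertionError(
            f"channel={channel} pol={pol}: variance is {var_offset:.2f} sigma from population"
        )


# 2 sigma off in mean and about 3.5 sigma off in variance: within the default 6 sigma.
check_statistics(0.0, 1.0, SampleStatistics(mean=0.02, variance=1.05, num_samples=10_000), 0, 0)

failed = False
try:
    # 10 sigma off in mean: rejected.
    check_statistics(0.0, 1.0, SampleStatistics(mean=0.1, variance=1.0, num_samples=10_000), 3, 1)
except AssertionError as exc:
    failed = True
    print(exc)
```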

ska_pst.testutils.stats.assert_statistics_for_channels(channel_data: pandas.DataFrame, population_mean: float, population_var: float, pol: str, tolerance: float = 6.0) → None

Assert that, for each channel, the sample mean and variance are within a given tolerance of the population statistics.

Parameters

channel_data (pd.DataFrame) – a data frame with statistics split by channel. This must include the following columns: “Mean”, “Var.”, “Num Samples”. It should also be specific to a given polarisation and complex data dimension (e.g. Pol A real data).
population_mean (float) – the mean of the population
population_var (float) – the variance of the population
pol (str) – the polarisation to be tested, A or B
tolerance (float, optional) – the number of standard deviations (sigma) by which a sample statistic may deviate from the population value, defaults to 6.0
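A per-channel version can be sketched by looping over the rows of the data frame and applying a sigma test to each channel. This sketch assumes the data frame index holds the channel number and that samples are approximately Gaussian (standard error of the mean sqrt(var / N), of the variance var * sqrt(2 / (N - 1))); the name `check_statistics_for_channels` is hypothetical. It collects every failing channel before raising, which may differ from the ska_pst behaviour but gives a more complete report.

```python
import math

import pandas as pd


def check_statistics_for_channels(
    channel_data: pd.DataFrame,
    population_mean: float,
    population_var: float,
    pol: str,
    tolerance: float = 6.0,
) -> None:
    """Hypothetical sketch: apply a sigma test to every row (channel) of ``channel_data``."""
    failures = []
    # Assumption: the index holds the channel number.
    for channel, row in channel_data.iterrows():
        n = row["Num Samples"]
        mean_sigma = math.sqrt(population_var / n)
        var_sigma = population_var * math.sqrt(2.0 / (n - 1))  # assumes Gaussian samples
        if abs(row["Mean"] - population_mean) > tolerance * mean_sigma:
            failures.append(f"pol {pol} channel {channel}: mean {row['Mean']}")
        if abs(row["Var."] - population_var) > tolerance * var_sigma:
            failures.append(f"pol {pol} channel {channel}: variance {row['Var.']}")
    if failures:
        raise AssertionError("; ".join(failures))


good = pd.DataFrame(
    {"Mean": [0.01, -0.02], "Var.": [1.01, 0.98], "Num Samples": [10_000, 10_000]}
)
check_statistics_for_channels(good, 0.0, 1.0, pol="A")

bad = good.copy()
bad.loc[1, "Mean"] = 0.5  # 50 sigma away in channel 1
bad_rejected = False
try:
    check_statistics_for_channels(bad, 0.0, 1.0, pol="A")
except AssertionError as exc:
    bad_rejected = True
    print(exc)
```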

ska_pst.testutils.stats.assert_statistics_for_digitised_data(data: numpy.ndarray, nbit: int, tolerance: float = 9.0) → None

Assert that the sample mean and variance of TFP data are within a given tolerance of the population statistics.

This function asserts that the mean and variance of the given NumPy array are within the given tolerance of the population mean and variance, which are determined by the number of bits used to digitise the data.

Parameters

data (np.ndarray) – an array of either real- or complex-valued floating-point data.
nbit (int) – the number of bits used in the digitisation of the data.
tolerance (float, optional) – the number of standard deviations (sigma) by which a sample statistic may deviate from the population value, defaults to 9.0
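The documented function derives the population mean and variance from nbit, but that mapping is not described here, so the sketch below takes the population statistics as explicit arguments instead. It assumes that complex data is checked by testing the real and imaginary parts as independent real streams and that samples are approximately Gaussian for the standard-error formulas; the name `check_digitised_data` is hypothetical.

```python
import math

import numpy as np


def check_digitised_data(
    data: np.ndarray,
    population_mean: float,
    population_var: float,
    tolerance: float = 9.0,
) -> None:
    """Hypothetical sketch: sigma test on real data, or on each component of complex data."""
    arr = np.asarray(data)
    if np.iscomplexobj(arr):
        # Assumption: real and imaginary parts are tested as independent real streams.
        components = {"real": arr.real.ravel(), "imag": arr.imag.ravel()}
    else:
        components = {"real": arr.ravel()}

    for name, values in components.items():
        n = values.size
        mean_sigma = math.sqrt(population_var / n)
        var_sigma = population_var * math.sqrt(2.0 / (n - 1))  # assumes Gaussian samples
        mean_offset = abs(values.mean() - population_mean) / mean_sigma
        var_offset = abs(values.var(ddof=1) - population_var) / var_sigma
        if mean_offset > tolerance or var_offset > tolerance:
            raise AssertionError(
                f"{name}: mean off by {mean_offset:.2f} sigma, "
                f"variance off by {var_offset:.2f} sigma"
            )


rng = np.random.default_rng(seed=42)
samples = rng.normal(0.0, 1.0, size=50_000) + 1j * rng.normal(0.0, 1.0, size=50_000)
check_digitised_data(samples, population_mean=0.0, population_var=1.0)

shifted_rejected = False
try:
    # Shifting the real part by 0.5 puts its mean roughly 100 sigma away.
    check_digitised_data(samples + 0.5, population_mean=0.0, population_var=1.0)
except AssertionError as exc:
    shifted_rejected = True
    print(exc)
```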