ska_pst.testutils.stats

Submodule for STAT related code.

class ska_pst.testutils.stats.SampleStatistics(mean: float, variance: float, num_samples: int)[source]

Data class that models the statistics of a sample.

Variables

mean (float) – the mean of the sample
variance (float) – the variance of the sample
num_samples (int) – the number of samples used to calculate the statistics

mean: float

num_samples: int

variance: float

class ska_pst.testutils.stats.ScanStatFileWatcher(*args: Any, **kwargs: Any)[source]

Class to watch for the creation of STAT files.

Instances of this class watch a scan directory for real-time monitoring STAT HDF5 files to be created and store the creation events for later retrieval.

event_time_diffs() → List[StatFileEventDifference][source]: Get a list of differences between file creation events.

property events: List[StatFileCreatedEvent]: Get the list of file created events.

on_created(event: watchdog.events.FileSystemEvent) → None[source]

Handle an on created system event.

The event comes from watchdog. This method converts it to a StatFileCreatedEvent instance and saves it so that it can later be retrieved from events.

stop() → None[source]: Stop watching for STAT files.

watch() → None[source]: Start watching for STAT files.

class ska_pst.testutils.stats.StatFileCreatedEvent(*, file_path: Path, create_datetime: float)[source]

Data class capturing a file creation event.

Variables

file_path (pathlib.Path) – the full path to the file that was created.
create_datetime (float) – the time, in seconds from epoch, when the file was created.

create_datetime: float

file_path: Path

class ska_pst.testutils.stats.StatFileEventDifference(*, first_file_event: InitVar[StatFileCreatedEvent], second_file_event: InitVar[StatFileCreatedEvent])[source]

A data class used to calculate differences in file creation events.

Variables

first_file_path (pathlib.Path) – the path to the file that was created first
second_file_path (pathlib.Path) – the path to the file that was created second
creation_time_difference (float) – the difference in creation time of the files

creation_time_difference: float

first_file_event: InitVar[StatFileCreatedEvent]

first_file_path: pathlib.Path

second_file_event: InitVar[StatFileCreatedEvent]

second_file_path: pathlib.Path
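The InitVar pattern in the signature above can be illustrated with a minimal, self-contained sketch. The classes FileCreatedEvent and FileEventDifference below are hypothetical stand-ins, not the library's classes, and the sign convention (second minus first) is an assumption rather than something the documentation states.

```python
from dataclasses import InitVar, dataclass, field
from pathlib import Path


@dataclass
class FileCreatedEvent:
    """Hypothetical stand-in for StatFileCreatedEvent."""

    file_path: Path
    create_datetime: float  # seconds since the epoch


@dataclass
class FileEventDifference:
    """Hypothetical stand-in for StatFileEventDifference."""

    # InitVar fields are accepted by __init__ but are not stored as attributes.
    first_file_event: InitVar[FileCreatedEvent]
    second_file_event: InitVar[FileCreatedEvent]
    # Derived fields are filled in by __post_init__ rather than by the caller.
    first_file_path: Path = field(init=False)
    second_file_path: Path = field(init=False)
    creation_time_difference: float = field(init=False)

    def __post_init__(
        self,
        first_file_event: FileCreatedEvent,
        second_file_event: FileCreatedEvent,
    ) -> None:
        self.first_file_path = first_file_event.file_path
        self.second_file_path = second_file_event.file_path
        # Assumed convention: time of the second event minus time of the first.
        self.creation_time_difference = (
            second_file_event.create_datetime - first_file_event.create_datetime
        )


first = FileCreatedEvent(file_path=Path("/tmp/scan/a.h5"), create_datetime=100.0)
second = FileCreatedEvent(file_path=Path("/tmp/scan/b.h5"), create_datetime=105.5)
diff = FileEventDifference(first_file_event=first, second_file_event=second)
```

In this sketch the two events are consumed at construction time and only the paths and the time difference remain on the instance, which matches the variables listed for the class.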

ska_pst.testutils.stats.assert_statistics(population_mean: float, population_var: float, sample_stats: SampleStatistics, channel: int, pol: int, tolerance: float = 6.0) → None[source]

Assert that sample mean and var are within a given tolerance of population stats.

Parameters

population_mean (float) – the mean of the population
population_var (float) – the variance of the population
sample_stats (SampleStatistics) – the sample statistics to assert against the population stats. The sample size is taken from its num_samples field.
channel (int) – the channel that is being tested
pol (int) – the polarisation that is being tested
tolerance (float, optional) – the number of sigma to allow being away from population value, defaults to 6.0
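A tolerance expressed in sigma can be sketched as follows. This is not the library's implementation: the helper within_sigma is hypothetical, and it assumes normally distributed data, for which the standard error of the sample mean is sqrt(var / n) and the standard error of the sample variance is var * sqrt(2 / (n - 1)).

```python
import math


def within_sigma(
    population_mean: float,
    population_var: float,
    sample_mean: float,
    sample_var: float,
    num_samples: int,
    tolerance: float = 6.0,
) -> bool:
    """Return True if the sample mean and variance are both within tolerance sigma."""
    # Standard error of the sample mean.
    mean_sigma = math.sqrt(population_var / num_samples)
    # Standard error of the sample variance, assuming a normal population.
    var_sigma = population_var * math.sqrt(2.0 / (num_samples - 1))

    mean_ok = abs(sample_mean - population_mean) <= tolerance * mean_sigma
    var_ok = abs(sample_var - population_var) <= tolerance * var_sigma
    return mean_ok and var_ok


# With 10000 samples the mean may deviate by at most 6 * 0.01 = 0.06.
close_enough = within_sigma(0.0, 1.0, 0.01, 1.02, 10000)
too_far = within_sigma(0.0, 1.0, 0.1, 1.0, 10000)
```

A wide default such as 6 sigma keeps false failures rare when the check is repeated over many channels and polarisations.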

ska_pst.testutils.stats.assert_statistics_for_channels(channel_data: pandas.DataFrame, population_mean: float, population_var: float, pol: str, tolerance: float = 6.0) → None[source]

Assert that sample mean and var are within a given tolerance of population stats for each channel.

Parameters

channel_data (pd.DataFrame) – a data frame with statistics split by channel. This must include the following columns: “Mean”, “Var.”, “Num Samples”. This should also be specific for a given polarisation and complex data dimension (e.g. for Pol A real data).
population_mean (float) – the mean of the population
population_var (float) – the variance of the population
pol (str) – the polarisation to be tested, A or B
tolerance (float, optional) – the number of sigma to allow being away from population value, defaults to 6.0

ska_pst.testutils.stats.assert_statistics_for_digitised_data(data: numpy.ndarray, nbit: int, tolerance: float = 9.0) → None[source]

Assert that sample mean and var are within a given tolerance of population stats for TFP data.

This function asserts that the given Numpy array of data has a mean and variance within a given tolerance of the population mean and variance based on the number of bits used in the digitisation of the data.

Parameters

data (np.ndarray) – an array of either real or complex value floating point data.
nbit (int) – the number of bits used in the digitisation of the data.
tolerance (float, optional) – the number of sigma to allow being away from population value, defaults to 9.0

ska_pst.testutils.stats.generate_impulsive_rfi_stat_errors(*, nchan_out: int, polarisations: str, signal_config: SquareWaveConfig, use_robust_statistics: bool, num_samples: int, ndim: int = 2, **kwargs: Any) → pandas.DataFrame[source]

Generate the expected scale, offsets and error estimates for an impulsive RFI signal.

An impulsive RFI signal is created with the Square Wave generator by setting the on-pulse intensity much higher than the normal off-pulse intensity and providing a short on-duty cycle. Our testing uses an on intensity 10 times that of the off intensity and a duty cycle of 10%.

Intensity in the square wave generator is the power, given by I = Re^2 + Im^2 assuming 2 dimensions, and the expected standard deviation is given by stddev = sqrt(I / 2).

When non-robust statistics are used, it is assumed the output variance is a weighted average of the low and high variances based on the duty cycle: expected_var = (duty_cycle * high_variance) + (1 - duty_cycle) * low_variance.

For robust statistics, it is assumed that, given a low duty cycle, the median/MAD calculation will estimate only the low/off pulse intensity before the scales and offsets are calculated. The scales are calculated as 1 / (1.4826 * MAD).
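The formulas above can be worked through numerically. The sketch below is illustrative only: the function names and the example intensities (off = 2.0, on = 20.0) are assumptions, and the constant 0.67449 is the MAD of a unit-variance normal distribution, used here to show that 1.4826 * MAD approximates the standard deviation.

```python
import math


def expected_variance(
    off_intensity: float,
    on_intensity: float,
    duty_cycle: float,
    use_robust_statistics: bool,
) -> float:
    """Expected variance per dimension for a 2-dimensional square wave signal."""
    # stddev = sqrt(I / 2), so the variance per dimension is I / 2.
    low_variance = off_intensity / 2.0
    high_variance = on_intensity / 2.0
    if use_robust_statistics:
        # Median/MAD is assumed to see only the off-pulse samples.
        return low_variance
    # Non-robust: average of both variances, weighted by the duty cycle.
    return duty_cycle * high_variance + (1.0 - duty_cycle) * low_variance


def robust_scale(mad: float) -> float:
    """Scale computed from the median absolute deviation (MAD)."""
    return 1.0 / (1.4826 * mad)


# On intensity 10 times the off intensity, with a 10% duty cycle.
non_robust_var = expected_variance(2.0, 20.0, 0.1, use_robust_statistics=False)
robust_var = expected_variance(2.0, 20.0, 0.1, use_robust_statistics=True)

# For a normal distribution, MAD is about 0.67449 * stddev.
scale = robust_scale(0.67449 * math.sqrt(robust_var))
```

With these example values the non-robust variance is roughly 1.9 while the robust variance stays at 1.0, so the robust scale is close to 1 / stddev of the off-pulse signal.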

Parameters

nchan_out (int) – number of channels in the output data.
polarisations (str) – the output polarisations to select.
signal_config (SquareWaveConfig) – the configuration used to generate the square wave signal.
use_robust_statistics (bool) – whether to calculate statistics based on median/MAD or mean/stddev.
num_samples (int) – number of samples out of the overall population.
ndim (int, optional) – the number of dimensions of the input signal, defaults to 2

Returns

a Pandas dataframe, indexed by channel and polarisation, of the calculated scales, offsets and the associated error estimates for both.

Return type

pd.DataFrame

ska_pst.testutils.stats.generate_ramped_signal_stat_errors(*, nchan_out: int, polarisations: str, use_robust_statistics: bool, num_samples: int, signal_config: SquareWaveConfig, **kwargs: Any) → pandas.DataFrame[source]

Generate the estimate and error for the scales and offsets of each chan/pol.

This will calculate the expected value of both the scale and offset for each channel and polarisation. It will also calculate the expected error in the values, given that they were calculated from a sample of a population of chan/pol statistics. Each chan/pol is assumed to follow a normal distribution with a population mean of zero and a variance based on the intensity of that chan/pol as defined in the square wave signal.

Parameters

nchan_out (int) – the number of output channels to select.
polarisations (str) – the output polarisations to select.
use_robust_statistics (bool) – indicator of whether robust statistics were used to generate the scales and offsets. If True this assumes median/MAD statistics were used.
num_samples (int) – the number of samples from the population used to create the scales and offsets values.
signal_config (SquareWaveConfig) – the configuration used to create the input square wave signal.

Returns

the expected scale and offset for each channel and polarisation along with the estimate of the error of the values.

Return type

pd.DataFrame

ska_pst.testutils.stats.generate_square_wave_stat_errors(*, nchan: int, polarisations: str, use_robust_statistics: bool, signal_config: SquareWaveConfig, num_samples: int, channels_out: Optional[Tuple[int, int]] = None, ndim: int = 2, **kwargs: Any) → pandas.DataFrame[source]

Generate a dataframe with expected values and errors of the scale and offsets from square wave signal.

This function takes the configuration of a square wave used in generating data and calculates the expected scales and offsets for each channel/polarisation of the output. It also calculates the expected error of the sample scale and offset, assuming that the input signal follows a normal distribution with a known mean and standard deviation, as determined from the signal_config.

This function can be used for impulsive RFI signals where the on-pulse has a short duty cycle. In this case use_robust_statistics affects the expected output scale: median/MAD is not affected by outliers, so the expected population stats are based on the off-pulse intensity. When non-robust statistics are used, the expected variance of the population is a weighted average of the on-pulse and off-pulse intensities.

This function can also be used to calculate statistics for a known ramped signal, where the off-pulse intensity is assumed to be zero and the on-pulse intensity for each chan/pol is determined by a linear fit of intensity from chan 0 to chan N-1 (a total of nchan=N channels) per polarisation.
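A linear ramp of intensity across channels can be sketched as below. The function ramped_intensities and its parameters are hypothetical; how the library derives the end-point intensities from SquareWaveConfig is not shown in this documentation, so they are passed in directly here.

```python
from typing import List


def ramped_intensities(
    nchan: int, first_intensity: float, last_intensity: float
) -> List[float]:
    """Linearly interpolate the on-pulse intensity from chan 0 to chan nchan - 1."""
    if nchan < 1:
        raise ValueError("nchan must be at least 1")
    if nchan == 1:
        return [first_intensity]
    # Constant step between adjacent channels.
    step = (last_intensity - first_intensity) / (nchan - 1)
    return [first_intensity + step * chan for chan in range(nchan)]


# Five channels ramping from intensity 1.0 at chan 0 to 9.0 at chan 4.
intensities = ramped_intensities(5, 1.0, 9.0)
```

Each per-channel intensity obtained this way would then give a per-channel variance, from which the expected scale and its error can be estimated.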

Parameters

nchan (int) – the number of input channels from the CBF.
polarisations (str) – the output polarisations to select.
use_robust_statistics (bool) – whether to calculate statistics based on median/MAD or mean/stddev.
signal_config (SquareWaveConfig) – the configuration used to create the input square wave signal.
num_samples (int) – the number of samples used to calculate the sample scales and offset. This value will be used to calculate the expected variance/error of the computed scale and offset values.
channels_out (Tuple[int, int] | None, optional) – the expected output channels, defaults to None
ndim (int, optional) – the number of dimensions of the input data, defaults to 2

Raises

NotImplementedError – if unknown signal type is used.

Returns

a dataframe with expected values and errors of the scale and offsets from square wave signal.

Return type

pd.DataFrame