Python Data Access Library

The data files produced by SKA PST STAT are HDF5 files, but they are structured in a way that is not easy to use for someone unfamiliar with the file format. The SKA PST STAT project provides a Python library, ska_pst_stat, that can be used in a Jupyter notebook and exposes the data through easy-to-use properties.

ska_pst_stat

This module contains the Python code for STAT.

The core class in this module is the Statistics class, which can be used to load an SKA PST STAT HDF5 file and provides easy access to the underlying data.

This package depends on Pandas, NumPy and h5py. These dependencies should be installed automatically when the package is installed.

The following code snippet demonstrates how to load a file, get its header metadata and plot a spectrogram.

from ska_pst_stat import Statistics
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

file_path = "/tmp/some-path-to-file/stat.h5"
stats = Statistics.load_from_file(file_path)

# this returns a Pandas DataFrame
header = stats.header

# to plot the spectrogram for polarisation A
plt.imshow(stats.pol_a_spectrogram)
plt.show()

class ska_pst_stat.Statistics(*, metadata: ska_pst_stat.hdf5.model.StatisticsMetadata, data: ska_pst_stat.hdf5.model.StatisticsData)[source]

Data class used to abstract over a STAT HDF5 file.

Instances of this should be created by passing the location of a STAT file to the load_from_file() method.

property channel_numbers: nptyping.NDArray.(typing.Literal['NChan'], nptyping.Int): Get an array of channel numbers.

property frequency_averaged_stats: pandas.DataFrame

Get the frequency averaged statistics for all frequencies.

This returns the mean and variance of all the data across all frequencies, including frequencies marked as having RFI, separated for each polarisation and complex value dimension. The statistics also include the number of samples clipped (i.e. the digital value was at the min or max value given the number of bits).

The data frame has the following columns:

Polarisation - which polarisation that the statistic value is for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Polarisation, and Dimension columns.
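As a minimal sketch of how a frame with this layout can be queried, the example below builds a small stand-in frame by hand and selects rows through the MultiIndex. No STAT file is read and the numbers are made up. The index labels "A"/"B" and "Real"/"Imag" are assumed to match the text values described for the Polarisation and Dimension enums.

```python
import pandas as pd

# Stand-in for stats.frequency_averaged_stats; values are illustrative only.
index = pd.MultiIndex.from_product(
    [["A", "B"], ["Real", "Imag"]], names=["Polarisation", "Dimension"]
)
freq_avg = pd.DataFrame(
    {
        "Mean": [0.01, -0.02, 0.00, 0.03],
        "Variance": [10.2, 10.1, 9.8, 9.9],
        "Clipped": [3, 1, 0, 2],
    },
    index=index,
)

# A single value is addressed by a (Polarisation, Dimension) tuple plus a column.
pol_a_real_variance = freq_avg.loc[("A", "Real"), "Variance"]

# All rows for one polarisation: select on the first index level.
pol_b = freq_avg.loc["B"]

# Total clipped samples per polarisation, summed over both dimensions.
clipped_per_pol = freq_avg.groupby(level="Polarisation")["Clipped"].sum()
```

With a real file, freq_avg would instead come from the frequency_averaged_stats property of a loaded Statistics instance.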

property frequency_averaged_stats_rfi_excised: pandas.DataFrame

Get the frequency averaged statistics from all channels not flagged for RFI.

This returns the mean and variance of all the data across all channels, except those flagged for RFI, separated for each polarisation and complex value dimension. The statistics also include the number of samples clipped (i.e. the digital value was at the min or max value given the number of bits).

The data frame has the following columns:

Polarisation - which polarisation that the statistic value is for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Polarisation, and Dimension columns.

property frequency_bins: nptyping.NDArray.(typing.Literal['NFreqBin'], nptyping.Float64): Get the frequency bins used in the spectrogram data.

get_channel_stats() → pandas.DataFrame[source]

Get the channel statistics.

While this method is a public method, it is recommended to use the following properties as they provide more specific access to the data based on polarisation and specific dimension of the complex voltage data.

pol_a_channel_stats

pol_b_channel_stats

pol_a_real_channel_stats

pol_a_imag_channel_stats

pol_b_real_channel_stats

pol_b_imag_channel_stats

The data frame has the following columns:

Channel - the channel number the statistics are for.

Polarisation - which polarisation that the statistic value is for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Channel, Polarisation, and Dimension columns.

Returns: a data frame with statistics for each channel split by polarisation and complex voltage dimension.
Return type: pd.DataFrame
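The three-level index described above can be sliced with a cross-section. The following is a sketch on a synthetic frame with four channels, not on real STAT data. The channel frequencies, the random values and the "A"/"Real" labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for stats.get_channel_stats() with 4 channels; values are made up.
index = pd.MultiIndex.from_product(
    [range(4), ["A", "B"], ["Real", "Imag"]],
    names=["Channel", "Polarisation", "Dimension"],
)
rng = np.random.default_rng(seed=0)
channel_stats = pd.DataFrame(
    {
        # Each channel has 4 rows: 2 polarisations x 2 dimensions.
        "Channel Freq. (MHz)": np.repeat([50.0, 50.5, 51.0, 51.5], 4),
        "Mean": rng.normal(0.0, 0.1, size=len(index)),
        "Variance": rng.uniform(9.0, 11.0, size=len(index)),
        "Clipped": rng.integers(0, 5, size=len(index)),
    },
    index=index,
)

# Cross-section: fix Polarisation and Dimension, keep Channel as the index.
pol_a_real = channel_stats.xs(("A", "Real"), level=["Polarisation", "Dimension"])

# Channel number with the largest variance for polarisation A, real component.
noisiest_channel = pol_a_real["Variance"].idxmax()
```

The per-polarisation and per-dimension properties listed above return frames that are already narrowed in this way, so a manual cross-section is only needed when working with the combined frame.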

get_frequency_averaged_stats() → pandas.DataFrame[source]

Get the frequency averaged statistics.

This will return a data frame that includes statistics across all frequencies/channels as well as only the frequencies/channels that weren’t marked as having RFI.

While this method is a public method, it is recommended to use the following properties directly:

frequency_averaged_stats

frequency_averaged_stats_rfi_excised

The data frame has the following columns:

Polarisation - which polarisation that the statistic value is for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

RFI Excised - a boolean value indicating whether the statistic was calculated after RFI had been excised.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Polarisation, Dimension, and RFI Excised columns.
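One way to work with the combined frame is to split it on the RFI Excised index level and compare the two halves. The sketch below does this on a hand-built stand-in frame. The values are invented, and the assumption is that the RFI Excised level holds plain booleans.

```python
import pandas as pd

# Stand-in for stats.get_frequency_averaged_stats(); values are made up.
index = pd.MultiIndex.from_product(
    [["A", "B"], ["Real", "Imag"], [False, True]],
    names=["Polarisation", "Dimension", "RFI Excised"],
)
combined = pd.DataFrame(
    {
        "Mean": [0.0] * 8,
        "Variance": [12.0, 10.0, 12.5, 10.5, 11.0, 10.0, 11.5, 10.5],
        "Clipped": [8, 2, 6, 1, 4, 0, 5, 3],
    },
    index=index,
)

# Boolean mask taken from the RFI Excised index level.
mask = combined.index.get_level_values("RFI Excised").to_numpy(dtype=bool)

# Split into the two halves and drop the level that is now constant.
rfi_excised = combined[mask].droplevel("RFI Excised")
all_data = combined[~mask].droplevel("RFI Excised")

# Both halves share the (Polarisation, Dimension) index, so they subtract directly.
variance_difference = all_data["Variance"] - rfi_excised["Variance"]
```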

get_histogram_data(rfi_excised: bool) → pandas.DataFrame[source]

Get the histogram of the input data integer states for each polarisation and dimension.

While this method is a public method, it is recommended to use one of the following 8 properties as they provide the data in a more usable format:

pol_a_real_histogram

pol_a_imag_histogram

pol_b_real_histogram

pol_b_imag_histogram

pol_a_real_histogram_rfi_excised

pol_a_imag_histogram_rfi_excised

pol_b_real_histogram_rfi_excised

pol_b_imag_histogram_rfi_excised

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Polarisation - which polarisation that the statistic value is for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Count - the number/count for the bin.

The Pandas frame has a MultiIndex key using the Bin, Polarisation, and Dimension columns.

Parameters: rfi_excised (bool) – a bool value to report on all (False) or RFI excised (True) data
Returns: a data frame for histogram data split by polarisation and complex voltage dimension.
Return type: pd.DataFrame
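The bin-count rule stated above is simple arithmetic, and a per-dimension histogram with Bin and Count columns can be summarised in a few lines. The sketch below uses a synthetic 8 bit histogram. It assumes that Bin is an ordinary column and that rows are ordered by bin, so that the first and last rows hold the extreme digital values. Neither assumption is confirmed by the library documentation.

```python
import numpy as np
import pandas as pd


def expected_num_bins(nbit: int) -> int:
    """Number of histogram bins for nbit-wide integer samples: 2^nbit."""
    return 2**nbit


nbit = 8
nbins = expected_num_bins(nbit)

# Stand-in for a single polarisation/dimension histogram; counts are made up.
counts = np.full(nbins, 100, dtype=np.uint32)
counts[0] = 7    # samples at the lowest state
counts[-1] = 5   # samples at the highest state
histogram = pd.DataFrame({"Bin": np.arange(nbins), "Count": counts})

# Samples sitting in the two extreme bins, and their share of the total.
extreme = histogram["Count"].iloc[[0, -1]].sum()
total = histogram["Count"].sum()
extreme_fraction = extreme / total
```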

get_rebinned_histogram2d_data(rfi_excised: bool, polarisation: ska_pst_stat.hdf5.consts.Polarisation) → nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)[source]

Get the 2D histogram data.

This returns a Numpy array rather than a Pandas Dataframe.

While this is a public method, the following properties should be used as they provide a more user-friendly API:

pol_a_rebinned_histogram2d

pol_b_rebinned_histogram2d

pol_a_rebinned_histogram2d_rfi_excised

pol_b_rebinned_histogram2d_rfi_excised

Parameters

rfi_excised (bool) – use the RFI excised data (True) or all data (False)
polarisation (Polarisation) – which polarisation of the data to use.

get_rebinned_histogram_data(rfi_excised: bool) → pandas.DataFrame[source]

Get rebinned histogram data.

While this method is a public method, it is recommended to use one of the following 8 properties as they provide the data in a more usable format:

pol_a_real_rebinned_histogram

pol_a_imag_rebinned_histogram

pol_b_real_rebinned_histogram

pol_b_imag_rebinned_histogram

pol_a_real_rebinned_histogram_rfi_excised

pol_a_imag_rebinned_histogram_rfi_excised

pol_b_real_rebinned_histogram_rfi_excised

pol_b_imag_rebinned_histogram_rfi_excised

The number of bins that the data has been rebinned to is the Num. Histogram Bins (Rebinned) value found in the header.

The data frame has the following columns:

Bin - the bin for the histogram count.

Polarisation - which polarisation that the statistic value is for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Count - the number/count for the bin

The Pandas frame has a MultiIndex key using the Bin, Polarisation, and Dimension columns.

Parameters: rfi_excised (bool) – a bool value to report on all (False) or RFI excised (True) data
Returns: a data frame for the rebinned histogram data split by polarisation and complex voltage dimension.
Return type: pd.DataFrame

get_spectral_power() → pandas.DataFrame[source]

Get the mean and max spectral power values for each channel.

This data frame includes both the mean and max of the spectral power for each channel for all polarisations.

The following properties are provided for each polarisation:

pol_a_spectral_power

pol_b_spectral_power

The data frame has the following columns:

Polarisation - which polarisation that the statistic value is for.

Channel - the channel number the statistics are for.

Mean - the mean of the spectral power for the current channel.

Max - the maximum of the spectral power for the current channel over the time sample of the statistics file.

The Pandas frame has a MultiIndex key using the Polarisation, and Channel columns.

Returns: the mean and max spectral power values for each channel.
Return type: pd.DataFrame
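A frame with this layout lends itself to simple per-channel diagnostics. The sketch below computes a peak-to-mean ratio on a synthetic frame. The ratio is an illustrative heuristic chosen for this example and is not something the library provides, and all values are made up.

```python
import pandas as pd

# Stand-in for stats.get_spectral_power() with 4 channels; values are made up.
index = pd.MultiIndex.from_product(
    [["A", "B"], range(4)], names=["Polarisation", "Channel"]
)
spectral_power = pd.DataFrame(
    {
        "Mean": [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 1.1, 0.8],
        "Max": [2.0, 9.9, 1.8, 2.1, 2.4, 2.0, 8.8, 1.6],
    },
    index=index,
)

# Ratio of the peak power to the mean power for every (Polarisation, Channel).
peak_to_mean = spectral_power["Max"] / spectral_power["Mean"]

# Selecting one polarisation leaves a Series indexed by Channel, so idxmax
# returns the channel number with the largest ratio.
spikiest_channel_a = peak_to_mean.loc["A"].idxmax()
spikiest_channel_b = peak_to_mean.loc["B"].idxmax()
```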

get_timeseries_data(rfi_excised: bool) → pandas.DataFrame[source]

Get the timeseries data.

While this is a public method, the following properties should be used as they provide more user-friendly access to the data:

pol_a_timeseries

pol_b_timeseries

pol_a_timeseries_rfi_excised

pol_b_timeseries_rfi_excised

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies. If rfi_excised is True then the sum is taken only over the frequencies that are not flagged for RFI.

The data frame has the following columns:

Polarisation - which polarisation that the statistic value is for.

Temporal Bin - the time bin.

Time Offset - the offset, in seconds, for the current temporal bin.

Max - the maximum power recorded in the temporal bin.

Min - the minimum power recorded in the temporal bin.

Mean - the mean power recorded in the temporal bin.

The Pandas frame has a MultiIndex key using the Polarisation, and Temporal Bin columns.

Parameters: rfi_excised (bool) – whether to use all frequencies (False) or only those that are not marked as having RFI (True).
Returns: a data frame with the timeseries statistics.
Return type: pd.DataFrame
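The timeseries frame can be reduced to one polarisation and then scanned for unusual temporal bins. The following sketch works on a synthetic frame with four temporal bins. The bin width, the power values and the threshold rule are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

n_bins = 4
bin_width_s = 0.025  # made-up width of a temporal bin, in seconds
temporal_bins = np.arange(n_bins)

# Stand-in for stats.get_timeseries_data(rfi_excised=False); values are made up.
index = pd.MultiIndex.from_product(
    [["A", "B"], temporal_bins], names=["Polarisation", "Temporal Bin"]
)
timeseries = pd.DataFrame(
    {
        "Time Offset": np.tile(temporal_bins * bin_width_s, 2),
        "Max": [2.0, 2.5, 6.0, 2.2, 2.1, 2.0, 2.3, 2.2],
        "Min": [0.5, 0.4, 0.5, 0.6, 0.4, 0.5, 0.5, 0.4],
        "Mean": [1.0, 1.0, 1.2, 1.0, 1.0, 1.1, 1.0, 1.0],
    },
    index=index,
)

# Keep polarisation A only; the result is indexed by Temporal Bin.
pol_a = timeseries.xs("A", level="Polarisation")

# Flag temporal bins whose peak power exceeds 3x the median of the mean power.
threshold = 3 * pol_a["Mean"].median()
flagged_bins = pol_a[pol_a["Max"] > threshold].index.tolist()
```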

property header: pandas.DataFrame

Get the header metadata for the data file.

This returns a Pandas data frame of the header data from the HDF5 file. This allows the user of the API to see what is in the HEADER dataset without needing an HDF5 viewer tool.

The header has the following fields:

Key	Example	Description
File Format Version	1.0.0	the version of the SKA PST STAT file format that the file is from.
Execution Block ID	eb-m001-20230921-245	the execution block ID of the generated data file
Telescope	SKALow	the telescope used for the generated data file (i.e. SKALow or SKAMid)
Scan ID	42	the ID of the scan that the file was generated from.
Beam ID	1	the PST BEAM ID that was used for the scan
UTC Start Time	2023-10-23-11:00:00	an ISO formatted string of the UTC time at the start of the scan
Start Scan Offset	0.0	the time offset, in seconds, from the UTC start time to represent the time at the start of the data in the file.
End Scan Offset	0.106168	the time offset, in seconds, from the UTC start time to represent the time at the end of data in the file.
Frequency (MHz)	87.5	the centre frequency for the data as a whole
Bandwidth (MHz)	75.0	the bandwidth of data
Start Channel Number	0	the starting channel number
End Channel Number	431	the last channel that the data is for
Num. Polarisations	2	number of polarisations, this should be 2
Num. Dimensions	2	number of dimensions in the data (should be 2 for complex data)
Num. Channels	432	number of channels in the data
Num. Frequency Bins	36	the number of frequency bins in the spectrogram data
Num. Temporal Bins	32	the number of temporal bins in the spectrogram and timeseries data
Num. Histogram Bins	65536	the number of bins in the histogram data
Num. Histogram Bins (Rebinned)	256	the number of bins used in the rebinned histograms
Num. Samples	21012480	total number of samples used to calculate statistics
Num. Samples (RFI Excised)	19456000	total number of samples used to calculate statistics, excluding RFI excised data
Num. Invalid Packets	0	total number of invalid/dropped packets in the data used to calculate statistics.

Returns: a human readable version of the header scalar fields.
Return type: pd.DataFrame
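Several header fields are related by simple arithmetic, which makes a quick sanity check possible. The sketch below uses the example values from the table above, held in a plain dictionary rather than the real header frame, whose exact layout is not shown here. The channel width and channels-per-bin figures assume that channels are evenly spaced and evenly grouped, which is an assumption rather than documented behaviour.

```python
# Example values copied from the header table; not read from a real file.
header_example = {
    "Start Scan Offset": 0.0,
    "End Scan Offset": 0.106168,
    "Bandwidth (MHz)": 75.0,
    "Start Channel Number": 0,
    "End Channel Number": 431,
    "Num. Channels": 432,
    "Num. Frequency Bins": 36,
}

# The end channel is the last channel in the data, so the range is inclusive.
num_channels = (
    header_example["End Channel Number"]
    - header_example["Start Channel Number"]
    + 1
)

# Time span covered by the data in the file, in seconds.
duration_s = header_example["End Scan Offset"] - header_example["Start Scan Offset"]

# Assuming even grouping: channels combined into each spectrogram frequency bin.
channels_per_freq_bin = (
    header_example["Num. Channels"] // header_example["Num. Frequency Bins"]
)

# Assuming evenly spaced channels: width of a single channel in MHz.
channel_width_mhz = header_example["Bandwidth (MHz)"] / header_example["Num. Channels"]
```

For the example values, the inclusive channel range gives 432 channels, matching Num. Channels, and each of the 36 frequency bins would cover 12 channels.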

static load_from_file(file_path: pathlib.Path | str) → ska_pst_stat.stats.Statistics[source]

Load a HDF5 STAT file and return an instance of the Statistics class.

Parameters: file_path (pathlib.Path | str) – the path to the file to load the statistics from
Returns: the statistics from the HDF5 file as a Python class
Return type: Statistics

property nchan: int: Get the number of channels for the voltage data.

property ndim: int

Get the number of dimensions of voltage data.

This value should be 2 as SKAO uses complex voltage data and the statistics have real and imaginary dimensions.

property npol: int: Get the number of polarisations.

property pol_a_channel_stats: pandas.DataFrame

Get the polarisation A channel statistics.

This property includes both the real and imaginary dimensions of the data. The following utility properties are provided to get the statistics of each dimension directly:

pol_a_real_channel_stats

pol_a_imag_channel_stats

The data frame has the following columns:

Channel - the channel number the statistics are for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Channel, and Dimension columns.

Returns: a data frame of polarisation A with statistics for each channel split by complex voltage dimension.
Return type: pd.DataFrame

property pol_a_imag_channel_stats: pandas.DataFrame

Get the imaginary valued, polarisation A channel statistics.

The data frame has the following columns:

Channel - the channel number the statistics are for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

Returns: a data frame of the imaginary component of polarisation A with statistics for each channel.
Return type: pd.DataFrame

property pol_a_imag_histogram: pandas.DataFrame

Get the histogram of the imaginary valued, polarisation A, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for imaginary valued, polarisation A, voltage data.
Return type: pd.DataFrame

property pol_a_imag_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the imag valued, pol A, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for imaginary valued, polarisation A, voltage data from all channels not flagged for RFI.
Return type: pd.DataFrame

property pol_a_imag_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the imaginary valued, pol A.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for imaginary valued, polarisation A.
Return type: pd.DataFrame

property pol_a_imag_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the imag valued, pol A except those flagged with RFI.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for imaginary valued, polarisation A except those flagged with RFI.
Return type: pd.DataFrame

property pol_a_real_channel_stats: pandas.DataFrame

Get the real valued, polarisation A channel statistics.

The data frame has the following columns:

Channel - the channel number the statistics are for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

Returns: a data frame of the real component of polarisation A with statistics for each channel.
Return type: pd.DataFrame

property pol_a_real_histogram: pandas.DataFrame

Get the histogram of the real valued, polarisation A, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for real valued, polarisation A, voltage data.
Return type: pd.DataFrame

property pol_a_real_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the real valued, pol A, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for real valued, polarisation A, voltage data from all channels not flagged for RFI.
Return type: pd.DataFrame

property pol_a_real_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the real valued, pol A.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for real valued, polarisation A.
Return type: pd.DataFrame

property pol_a_real_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the real valued, pol A except those flagged with RFI.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for real valued, polarisation A except those flagged with RFI.
Return type: pd.DataFrame

property pol_a_rebinned_histogram2d: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation A.

This returns a Numpy array with data for all frequencies.

the first array dimension is the real valued data.

the second array dimension is the imaginary valued data.

Returns: the rebinned 2D histogram data for polarisation A.
Return type: np.ndarray
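Given the axis convention above (first axis real, second axis imaginary), summing the 2D histogram along one axis yields the one-dimensional distribution of the other component. The sketch below shows this with NumPy on a small random array standing in for the real data. Whether these marginals match the library's own rebinned 1D histograms exactly is not confirmed here.

```python
import numpy as np

nrebin = 4  # small stand-in for the Num. Histogram Bins (Rebinned) header value
rng = np.random.default_rng(seed=1)

# Stand-in for stats.pol_a_rebinned_histogram2d; counts are random.
hist2d = rng.integers(0, 50, size=(nrebin, nrebin), dtype=np.uint32)

# Axis 0 is the real component and axis 1 the imaginary component, so summing
# over axis 1 leaves the distribution of the real values, and vice versa.
real_marginal = hist2d.sum(axis=1)
imag_marginal = hist2d.sum(axis=0)

# Indices of the most populated (real, imaginary) cell.
real_bin, imag_bin = np.unravel_index(np.argmax(hist2d), hist2d.shape)
```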

property pol_a_rebinned_histogram2d_rfi_excised: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation A except frequencies flagged with RFI.

This returns a Numpy array with data for frequencies that have not been RFI excised.

the first array dimension is the real valued data.

the second array dimension is the imaginary valued data.

Returns: the rebinned 2D histogram data for polarisation A, excluding frequencies flagged with RFI.
Return type: np.ndarray

property pol_a_spectral_power: pandas.DataFrame

Get the mean and max spectral power values for each channel for polarisation A.

The data frame has the following columns:

Channel - the channel number the statistics are for.

Mean - the mean of the spectral power for the current channel.

Max - the maximum of the spectral power for the current channel over the time sample of the statistics file.

Returns: the mean and max spectral power values for each channel for polarisation A.
Return type: pd.DataFrame

property pol_a_spectrogram: nptyping.NDArray.(typing.Literal['NFreqBin, NTimeBin'], nptyping.Float32)

Get the spectrogram data for polarisation A.

This returns a Numpy array that can be used with Matplotlib to plot a spectrogram. The data in the spectrogram is binned in frequency and in time (see Num. Frequency Bins and Num. Temporal Bins in the header for more details).

Returns: the spectrogram data for polarisation A.
Return type: np.ndarray
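Because the array has shape (NFreqBin, NTimeBin), averaging along one axis collapses the spectrogram into a mean spectrum or a mean time profile. The sketch below demonstrates this on synthetic data using the example bin counts from the header table. It assumes the values are positive and on a linear power scale, which is why 10 log10 is used for the decibel conversion.

```python
import numpy as np

n_freq_bins, n_time_bins = 36, 32  # example values from the header table
rng = np.random.default_rng(seed=2)

# Stand-in for stats.pol_a_spectrogram; values are random.
spectrogram = rng.uniform(1.0, 2.0, size=(n_freq_bins, n_time_bins)).astype(
    np.float32
)
spectrogram[10, :] *= 50.0  # inject a persistently bright frequency bin

# Axis 0 is the frequency bin and axis 1 is the temporal bin.
mean_spectrum = spectrogram.mean(axis=1)      # average over time
mean_time_profile = spectrogram.mean(axis=0)  # average over frequency

# Decibels relative to the median, assuming linear power values.
spectrogram_db = 10.0 * np.log10(spectrogram / np.median(spectrogram))

# Frequency bin with the highest time-averaged power.
brightest_freq_bin = int(np.argmax(mean_spectrum))
```

The decibel array can be passed to Matplotlib in place of the raw array when a few bright bins would otherwise dominate the colour scale.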

property pol_a_timeseries: pandas.DataFrame

Get the timeseries data for polarisation A for all frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies.

The data frame has the following columns:

Temporal Bin - the time bin.

Time Offset - the offset, in seconds, for the current temporal bin.

Max - the maximum power recorded in the temporal bin.

Min - the minimum power recorded in the temporal bin.

Mean - the mean power recorded in the temporal bin.

Returns: a data frame with the timeseries statistics for polarisation A.
Return type: pd.DataFrame

property pol_a_timeseries_rfi_excised: pandas.DataFrame

Get the timeseries data for polarisation A for all frequencies except for RFI excised frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over the frequencies that are not flagged for RFI.

The data frame has the following columns:

Temporal Bin - the time bin.

Time Offset - the offset, in seconds, for the current temporal bin.

Max - the maximum power recorded in the temporal bin.

Min - the minimum power recorded in the temporal bin.

Mean - the mean power recorded in the temporal bin.

Returns: a data frame with the timeseries statistics for polarisation A except for frequencies that have been RFI excised.
Return type: pd.DataFrame

property pol_b_channel_stats: pandas.DataFrame

Get the polarisation B channel statistics.

This property includes both the real and imaginary dimensions of the data. The following utility properties are provided to get the statistics of each dimension directly:

pol_b_real_channel_stats

pol_b_imag_channel_stats

The data frame has the following columns:

Channel - the channel number the statistics are for.

Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Channel, and Dimension columns.

Returns: a data frame of polarisation B with statistics for each channel split by complex voltage dimension.
Return type: pd.DataFrame

property pol_b_imag_channel_stats: pandas.DataFrame

Get the imaginary valued, polarisation B channel statistics.

The data frame has the following columns:

Channel - the channel number the statistics are for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

Returns: a data frame of the imaginary component of polarisation B with statistics for each channel.
Return type: pd.DataFrame

property pol_b_imag_histogram: pandas.DataFrame

Get the histogram of the imaginary valued, polarisation B, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for imaginary valued, polarisation B, voltage data.
Return type: pd.DataFrame

property pol_b_imag_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the imag valued, pol B, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for imaginary valued, polarisation B, voltage data from all channels not flagged for RFI.
Return type: pd.DataFrame

property pol_b_imag_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the imaginary valued, pol B.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for imaginary valued, polarisation B.
Return type: pd.DataFrame

property pol_b_imag_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the imag valued, pol B except those flagged with RFI.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for imaginary valued, polarisation B except those flagged with RFI.
Return type: pd.DataFrame

property pol_b_real_channel_stats: pandas.DataFrame

Get the real valued, polarisation B channel statistics.

The data frame has the following columns:

Channel - the channel number the statistics are for.

Channel Freq. (MHz) - the centre frequency for the channel.

Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

Clipped - number of clipped input samples (maximum level) for each polarisation, dimension, averaged over all channels.

Returns: a data frame of the real component of polarisation B with statistics for each channel.
Return type: pd.DataFrame

property pol_b_real_histogram: pandas.DataFrame

Get the histogram of the real valued, polarisation B, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for real valued, polarisation B, voltage data.
Return type: pd.DataFrame

property pol_b_real_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the real valued, pol B, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for histogram data for real valued, polarisation B, voltage data from all channels not flagged for RFI.
Return type: pd.DataFrame

property pol_b_real_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the real valued, pol B.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for real valued, polarisation B.
Return type: pd.DataFrame

property pol_b_real_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the real valued, pol B except those flagged with RFI.

The data frame has the following columns:

Bin - the bin for the histogram count.

Count - the number/count for the bin

Returns: a data frame for rebinned histogram data for real valued, polarisation B except those flagged with RFI.
Return type: pd.DataFrame

property pol_b_rebinned_histogram2d: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation B.

This returns a Numpy array with data for all frequencies.

the first array dimension is the real valued data.

the second array dimension is the imaginary valued data.

Returns: the rebinned 2D histogram data for polarisation B.
Return type: np.ndarray

property pol_b_rebinned_histogram2d_rfi_excised: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation B except frequencies flagged with RFI.

This returns a Numpy array with data for frequencies that have not been RFI excised.

the first array dimension is the real valued data.

the second array dimension is the imaginary valued data.

Returns: the rebinned 2D histogram data for polarisation B, excluding frequencies flagged with RFI.
Return type: np.ndarray

property pol_b_spectral_power: pandas.DataFrame

Get the mean and max spectral power values for each channel for polarisation B.

The data frame has the following columns:

Channel - the channel number the statistics are for.

Mean - the mean of the spectral power for the current channel.

Max - the maximum of the spectral power for the current channel over the time sample of the statistics file.

Returns: the mean and max spectral power values for each channel for polarisation B.
Return type: pd.DataFrame

property pol_b_spectrogram: nptyping.NDArray.(typing.Literal['NFreqBin, NTimeBin'], nptyping.Float32)

Get the spectrogram data for polarisation B.

This returns a Numpy array that can be used with Matplotlib to plot a spectrogram. The data in the spectrogram is binned in frequency and in time (see Num. Frequency Bins and Num. Temporal Bins in the header for more details).

Returns: the spectrogram data for polarisation B.
Return type: np.ndarray

property pol_b_timeseries: pandas.DataFrame

Get the timeseries data for polarisation B for all frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies.

The data frame has the following columns:

Temporal Bin - the time bin.

Time Offset - the offset, in seconds, for the current temporal bin.

Max - the maximum power recorded in the temporal bin.

Min - the minimum power recorded in the temporal bin.

Mean - the mean power recorded in the temporal bin.

Returns: a data frame with the timeseries statistics for polarisation B.
Return type: pd.DataFrame

property pol_b_timeseries_rfi_excised: pandas.DataFrame

Get the timeseries data for polarisation B for all frequencies except for RFI excised frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies except those that have been RFI excised.

The data frame has the following columns:

Temporal Bin - the time bin.

Time Offset - the offset, in seconds, for the current temporal bin.

Max - the maximum power recorded in the temporal bin.

Min - the minimum power recorded in the temporal bin.

Mean - the mean power recorded in the temporal bin.

Returns: a data frame with the timeseries statistics for polarisation B except for frequencies that have been RFI excised.
Return type: pd.DataFrame

property timeseries_bins: nptyping.NDArray.(typing.Literal['NTimeBin'], nptyping.Float64): Get the timeseries bins used in the spectrogram and timeseries data.

ska_pst_stat.hdf5

This module is used for handling a HDF5 STAT file.

class ska_pst_stat.hdf5.Dimension(value)[source]

An enum used to represent the complex dimension/component within the data.

property text: str

Map dimension enum value to text used in data frames.

Returns: ‘Real’ if value is REAL else ‘Imag’
Return type: str

class ska_pst_stat.hdf5.Polarisation(value)[source]

An enum used to represent polarisation indexes within the data.

property text: str

Map polarisation enum value to text used in data frames.

Returns: ‘A’ if value is POL_A else ‘B’
Return type: str
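
The text mappings of the Dimension and Polarisation enums can be illustrated with a minimal stand-in. This is a sketch rather than the library's source: the integer values and the member names POL_B and IMAG are assumptions, since only POL_A and REAL are named above.

```python
from enum import IntEnum


class Polarisation(IntEnum):
    """Stand-in for ska_pst_stat.hdf5.Polarisation (values are assumed)."""

    POL_A = 0
    POL_B = 1  # assumed member name

    @property
    def text(self) -> str:
        # 'A' if the value is POL_A, else 'B', as documented above.
        return "A" if self is Polarisation.POL_A else "B"


class Dimension(IntEnum):
    """Stand-in for ska_pst_stat.hdf5.Dimension (values are assumed)."""

    REAL = 0
    IMAG = 1  # assumed member name

    @property
    def text(self) -> str:
        # 'Real' if the value is REAL, else 'Imag', as documented above.
        return "Real" if self is Dimension.REAL else "Imag"


# Every (polarisation, dimension) pair as the text used in data frames.
pairs = [(pol.text, dim.text) for pol in Polarisation for dim in Dimension]
```

With the real library, the same property would be read from the imported enum members instead of these stand-ins.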

class ska_pst_stat.hdf5.StatisticsData(*, mean_frequency_avg: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), mean_frequency_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), variance_frequency_avg: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), variance_frequency_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), mean_spectrum: nptyping.NDArray.(typing.Literal['NPol, NDim, NChan'], nptyping.Float32), variance_spectrum: nptyping.NDArray.(typing.Literal['NPol, NDim, NChan'], nptyping.Float32), mean_spectral_power: nptyping.NDArray.(typing.Literal['NPol, NChan'], nptyping.Float32), max_spectral_power: nptyping.NDArray.(typing.Literal['NPol, NChan'], nptyping.Float32), histogram_1d_freq_avg: nptyping.NDArray.(typing.Literal['NPol, NDim, NBin'], nptyping.UInt32), histogram_1d_freq_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim, NBin'], nptyping.UInt32), rebinned_histogram_2d_freq_avg: nptyping.NDArray.(typing.Literal['NPol, NRebin, NRebin'], nptyping.UInt32), rebinned_histogram_2d_freq_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NRebin, NRebin'], nptyping.UInt32), rebinned_histogram_1d_freq_avg: nptyping.NDArray.(typing.Literal['NPol, NDim, NRebin'], nptyping.UInt32), rebinned_histogram_1d_freq_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim, NRebin'], nptyping.UInt32), num_clipped_samples_spectrum: nptyping.NDArray.(typing.Literal['NPol, NDim, NChan'], nptyping.UInt32), num_clipped_samples: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.UInt32), num_clipped_samples_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.UInt32), spectrogram: nptyping.NDArray.(typing.Literal['NPol, NFreqBin, NTimeBin'], nptyping.Float32), timeseries: nptyping.NDArray.(typing.Literal['NPol, NTimeBin, 3'], nptyping.Float32), timeseries_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NTimeBin, 3'], nptyping.Float32))[source]

A data class used to hold the calculated statistics from random data.

Variables

mean_frequency_avg (numpy.ndarray) – the mean of the data for each polarisation and dimension, averaged over all channels.
mean_frequency_avg_rfi_excised (numpy.ndarray) – the mean of the data for each polarisation and dimension, averaged over all channels, except those flagged for RFI.
variance_frequency_avg (numpy.ndarray) – the variance of the data for each polarisation and dimension, averaged over all channels.
variance_frequency_avg_rfi_excised (numpy.ndarray) – the variance of the data for each polarisation and dimension, averaged over all channels, except those flagged for RFI.
mean_spectrum (numpy.ndarray) – the mean of the data for each polarisation, dimension and channel.
variance_spectrum (numpy.ndarray) – the variance of the data for each polarisation, dimension and channel.
mean_spectral_power (numpy.ndarray) – mean power spectra of the data for each polarisation and channel.
max_spectral_power (numpy.ndarray) – maximum power spectra of the data for each polarisation and channel.
histogram_1d_freq_avg (numpy.ndarray) – histogram of the input data integer states for each polarisation and dimension, averaged over all channels.
histogram_1d_freq_avg_rfi_excised (numpy.ndarray) – histogram of the input data integer states for each polarisation and dimension, averaged over all channels, except those flagged for RFI.
rebinned_histogram_2d_freq_avg (numpy.ndarray) – Rebinned 2D histogram of the input data integer states for each polarisation, averaged over all channels.
rebinned_histogram_2d_freq_avg_rfi_excised (numpy.ndarray) – Rebinned 2D histogram of the input data integer states for each polarisation, averaged over all channels, except those flagged for RFI.
rebinned_histogram_1d_freq_avg (numpy.ndarray) – rebinned histogram of the input data integer states for each polarisation and dimension, averaged over all channels.
rebinned_histogram_1d_freq_avg_rfi_excised (numpy.ndarray) – rebinned histogram of the input data integer states for each polarisation and dimension, averaged over all channels, except those flagged for RFI.
num_clipped_samples_spectrum (numpy.ndarray) – number of clipped input samples (maximum level) for each polarisation, dimension and channel.
num_clipped_samples (numpy.ndarray) – number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels.
num_clipped_samples_rfi_excised (numpy.ndarray) – number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels, except those flagged for RFI.
spectrogram (numpy.ndarray) – spectrogram of the data for each polarisation, averaged into a configurable number of temporal and spectral bins (default ~1000).
timeseries (numpy.ndarray) – time series of the data for each polarisation, rebinned in time to ntime_bins, averaged over all frequency channels.
timeseries_rfi_excised (numpy.ndarray) – time series of the data for each polarisation, rebinned in time to ntime_bins, averaged over all frequency channels, except those flagged for RFI.

class ska_pst_stat.hdf5.StatisticsMetadata(*, file_format_version: str = '1.0.0', eb_id: str, telescope: str, scan_id: int, beam_id: str, utc_start: str, t_min: float, t_max: float, frequency_mhz: float, bandwidth_mhz: float, start_chan: int, npol: int, ndim: int, nchan: int, nchan_ds: int, ndat_ds: int, histogram_nbin: int, nrebin: int, channel_freq_mhz: nptyping.NDArray.(typing.Literal['NChan'], nptyping.Float64), timeseries_bins: nptyping.NDArray.(typing.Literal['NTimeBin'], nptyping.Float64), frequency_bins: nptyping.NDArray.(typing.Literal['NFreqBin'], nptyping.Float64), num_samples: int, num_samples_rfi_excised: int, num_samples_spectrum: nptyping.NDArray.(typing.Literal['NChan'], nptyping.UInt32), num_invalid_packets: int)[source]

Data class modeling the metadata from a HDF5 STAT data file.

Variables

file_format_version (str) – the format of the HDF5 STAT file. Default is “1.0.0”
eb_id (str) – the execution block id the file relates to.
telescope (str) – the telescope the data were collected for. Should be SKALow or SKAMid
scan_id (int) – the scan id for the generated data file
beam_id (str) – the beam id for the generated data file
utc_start (str) – the UTC ISO formatted start time of the scan, to the nearest second.
t_min (float) – the time offset, in seconds, from the UTC start time to represent the time at the start of the data in the file.
t_max (float) – the time offset, in seconds, from the UTC start time to represent the time at the end of data in the file.
frequency_mhz (float) – the centre frequency for the data as a whole
bandwidth_mhz (float) – the bandwidth of data
start_chan (int) – the starting channel number.
npol (int) – number of polarisations.
ndim (int) – number of dimensions in the data (should be 2 for complex data).
nchan (int) – number of channels in the data.
nchan_ds (int) – the number of frequency bins in the spectrogram data.
ndat_ds (int) – the number of temporal bins in the spectrogram and timeseries data.
histogram_nbin (int) – the number of bins in the histogram data.
nrebin (int) – number of bins to use for rebinned histograms
channel_freq_mhz (numpy.ndarray) – the centre frequencies of each channel (MHz).
timeseries_bins (numpy.ndarray) – the timestamp offsets for each temporal bin.
frequency_bins (numpy.ndarray) – the frequency bins used for the spectrogram attribute (MHz).
num_samples (int) – the total number of samples used to calculate the sample statistics.
num_samples_rfi_excised (int) – the total number of samples used to calculate the sample statistics, except those flagged for RFI.
num_samples_spectrum (numpy.ndarray) – the number of samples, per channel, used to calculate the sample statistics.
num_invalid_packets (int) – the number of invalid packets received while calculating the statistics.

property end_chan: int: Get the last channel that the header is for.
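
As an illustration of how end_chan relates to start_chan and nchan, the sketch below assumes the last channel is inclusive, i.e. start_chan + nchan - 1. That formula is an assumption, not taken from the library source, and ChannelRange is a hypothetical stand-in holding only the two fields involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelRange:
    """Hypothetical stand-in with the two channel fields of the metadata."""

    start_chan: int
    nchan: int

    @property
    def end_chan(self) -> int:
        # Assumption: the last channel is inclusive.
        return self.start_chan + self.nchan - 1


channels = ChannelRange(start_chan=0, nchan=432)
last_channel = channels.end_chan
```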

class ska_pst_stat.hdf5.TimeseriesDimension(value)[source]: An enum used to represent which index to use for max/min/mean in timeseries data.
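
The timeseries arrays have shape (NPol, NTimeBin, 3), with the last axis holding the max, min and mean values. The sketch below shows how an integer enum can index that axis. The member names and the 0/1/2 ordering are assumptions based on the order "max/min/mean" given above, and plain nested lists stand in for the NumPy array.

```python
from enum import IntEnum


class TimeseriesIndex(IntEnum):
    """Hypothetical stand-in for TimeseriesDimension (ordering is assumed)."""

    MAX = 0
    MIN = 1
    MEAN = 2


# Toy data for one polarisation: two temporal bins of [max, min, mean].
pol_a_timeseries = [
    [9.0, 1.0, 4.5],
    [8.0, 2.0, 5.0],
]

# Select one statistic per temporal bin by indexing the last axis.
mean_per_bin = [row[TimeseriesIndex.MEAN] for row in pol_a_timeseries]
max_per_bin = [row[TimeseriesIndex.MAX] for row in pol_a_timeseries]
```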

ska_pst_stat.hdf5.map_hdf5_key(hdf5_key: str) → str[source]: Map a key from a HDF5 attribute/dataset to a model dataclass property.

ska_pst_stat.utility

This module provides utility classes to generate Gaussian random data.

class ska_pst_stat.utility.Hdf5FileGenerator(file_path: pathlib.Path | str, eb_id: str, telescope: str, scan_id: int, beam_id: str, config: ska_pst_stat.utility.hdf5_file_generator.StatConfig, utc_start: Optional[str] = None)[source]

Class used to generate a random HDF5 statistics file.

generate() → None[source]: Generate a HDF5 file to use in a test.

property stats: ska_pst_stat.stats.Statistics

Get generated statistics.

This will raise an AssertionError if generate() has not been called.

class ska_pst_stat.utility.StatConfig(*, npol: int = 2, ndim: int = 2, nchan: int = 432, nsamp: int = 32, nheap: int = 1, nbit: int = 16, nfreq_bins: int = 36, ntime_bins: int = 4, nrebin: int = 256, sigma: float = 6.0, freq_mask: str = '', frequency_mhz: float = 87.5, bandwidth_mhz: float = 75.0, start_chan: int = 0, tsamp: float = 207.36, os_factor: float = 1.3333333333333333)[source]

A data class used as configuration for generating random data.

Variables

npol (int) – number of polarisations, default 2.
ndim (int) – number of dimensions in the data, default 2.
nchan (int) – number of channels in the data, default 432.
nsamp (int) – number of samples of each channel per heap, default 32.
nheap (int) – number of heaps of data to produce, default is 1.
nbit (int) – the number of bits per data value; this can only be 8 or 16, default 16.
nfreq_bins (int) – requested number of frequency bins for spectrogram. This gets updated to be a factor of the number of channels.
ntime_bins (int) – requested number of temporal bins for spectrogram and timeseries. This gets updated to be a factor of the total number of samples per channel.
nrebin (int) – number of bins to use for rebinned histograms
sigma (float) – number of standard deviations to use to clip data. This is only used in the generator.
freq_mask (str) – the frequency ranges to mask. (Currently not used)
frequency_mhz (float) – the centre frequency for the data as a whole
bandwidth_mhz (float) – the bandwidth of data
start_chan (int) – the starting channel number.
tsamp (float) – the time, in microseconds, per sample
os_factor (float) – the oversampling factor
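
The nfreq_bins and ntime_bins values are described above as being updated to a factor of the number of channels and of the total number of samples per channel. One way such an adjustment could work is sketched below. The function adjust_bins is hypothetical, and rounding down to the largest factor not exceeding the request is an assumption; the library may choose a different factor.

```python
def adjust_bins(requested: int, total: int) -> int:
    """Return the largest factor of total that does not exceed requested."""
    if requested < 1 or total < 1:
        raise ValueError("requested and total must be positive")
    candidate = min(requested, total)
    # Step down until the candidate divides total exactly; 1 always does.
    while total % candidate != 0:
        candidate -= 1
    return candidate


# The default nfreq_bins=36 already divides the default nchan=432.
freq_bins = adjust_bins(36, 432)
# A request of 5 temporal bins over 32 samples is reduced to 4.
time_bins = adjust_bins(5, 32)
```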

property clipped_high: int: Get the maximum value for the current nbit.

property clipped_low: int: Get the minimum value for the current nbit.

property nbin: int: Get the number of bins for histogram.

property nbit_limit: int: Get the limit for current nbit.

property non_rfi_channel_indexes: List[int]: Get the index of channels that are not RFI excised.

property rebin_max: int: Get the maximum value after rebinning.

property rebin_offset: int: Get the offset to apply when doing rebinning.

property rfi_excised_channel_indexes: List[int]: Get the indexes of the RFI excised channels.

property scale: float: Get scale of the Gaussian distribution.

property total_sample_time: float: Get the total sample time in seconds.

property total_samples_per_channel: int: Get the total number of samples per channel.

property tsamp_secs: float: Get the TSAMP value in seconds.
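
The derived properties above can be related to the configuration fields with a small sketch. Only the conversion of tsamp from microseconds to seconds follows directly from the documented units. The other formulas are assumptions: total samples taken as nsamp times nheap, and clipping limits taken as the signed two's complement range for nbit. TimingSketch is a hypothetical stand-in, not the StatConfig class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingSketch:
    """Hypothetical stand-in using the documented StatConfig defaults."""

    nsamp: int = 32
    nheap: int = 1
    nbit: int = 16
    tsamp: float = 207.36  # microseconds per sample

    @property
    def tsamp_secs(self) -> float:
        # tsamp is documented in microseconds.
        return self.tsamp * 1e-6

    @property
    def total_samples_per_channel(self) -> int:
        # Assumption: samples per heap multiplied by the number of heaps.
        return self.nsamp * self.nheap

    @property
    def total_sample_time(self) -> float:
        # Assumption: total samples multiplied by the sample time in seconds.
        return self.total_samples_per_channel * self.tsamp_secs

    @property
    def clipped_low(self) -> int:
        # Assumption: signed two's complement minimum for nbit.
        return -(2 ** (self.nbit - 1))

    @property
    def clipped_high(self) -> int:
        # Assumption: signed two's complement maximum for nbit.
        return 2 ** (self.nbit - 1) - 1


config = TimingSketch()
```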