Python Data Access Library

The data files produced by SKA PST STAT are HDF5 files, but they are structured in a way that is not easy to use for someone unfamiliar with the file format. The SKA PST STAT project provides a Python library, ska_pst_stat, that can be used in a Jupyter notebook and exposes the data through easy-to-use properties.

ska_pst_stat

This module contains the Python code for STAT.

The core class in this module is the Statistics class, which can be used to load a SKA PST STAT HDF5 file and provides easy access to the underlying properties.

This package depends on Pandas, Numpy and H5Py; these dependencies should be installed automatically when installing this package.

The following code snippet demonstrates how to load a file, get the header metadata of the file and plot a spectrogram.

from ska_pst_stat import Statistics
import matplotlib.pyplot as plt  # not a listed dependency; install separately if needed

file_path = "/tmp/some-path-to-file/stat.h5"
stats = Statistics.load_from_file(file_path)

# this returns a Pandas DataFrame
header = stats.header
print(header)

# plot the spectrogram for polarisation A (an NFreqBin x NTimeBin array);
# aspect="auto" stretches the image to fill the axes
plt.imshow(stats.pol_a_spectrogram, aspect="auto")
plt.show()

class ska_pst_stat.Statistics(*, metadata: ska_pst_stat.hdf5.model.StatisticsMetadata, data: ska_pst_stat.hdf5.model.StatisticsData)

Data class used to abstract over an HDF5 STAT file.

Instances of this should be created by passing the location of a STAT file to the load_from_file() method.

property channel_numbers: nptyping.NDArray.(typing.Literal['NChan'], nptyping.Int)

Get an array of channel numbers.

property frequency_averaged_stats: pandas.DataFrame

Get the frequency averaged statistics for all frequencies.

This returns the mean and variance of all the data across all frequencies, including frequencies marked as having RFI, separated by polarisation and complex value dimension. The statistics also include the number of samples clipped (i.e. where the digital value was at the minimum or maximum value allowed by the number of bits).

The data frame has the following columns:

  • Polarisation - which polarisation that the statistic value is for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

  • Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

  • Clipped - the number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Polarisation and Dimension columns.
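As a rough illustration of what "averaged over all channels" means, the sketch below combines hypothetical per-channel means and variances into one mean and variance, assuming every channel contributes the same number of samples. This is not necessarily how ska_pst_stat computes these values; the helper name and inputs are made up for illustration.

```python
from statistics import fmean


def pooled_mean_variance(channel_means, channel_variances):
    """Combine per-channel (population) statistics into overall values.

    Hypothetical helper: assumes each channel has the same number of samples.
    """
    mean = fmean(channel_means)
    # Law of total variance: the mean of the within-channel variances plus
    # the variance of the channel means about the overall mean.
    within = fmean(channel_variances)
    between = fmean((m - mean) ** 2 for m in channel_means)
    return mean, within + between


mean, variance = pooled_mean_variance([0.0, 2.0], [1.0, 1.0])
print(mean, variance)  # 1.0 2.0
```

Note that the pooled variance is larger than the average per-channel variance whenever the channel means differ.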

property frequency_averaged_stats_rfi_excised: pandas.DataFrame

Get the frequency averaged statistics from all channels not flagged for RFI.

This returns the mean and variance of all the data across all channels, except those flagged for RFI, separated by polarisation and complex value dimension. The statistics also include the number of samples clipped (i.e. where the digital value was at the minimum or maximum value allowed by the number of bits).

The data frame has the following columns:

  • Polarisation - which polarisation that the statistic value is for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

  • Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

  • Clipped - the number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Polarisation and Dimension columns.
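The difference between the two frequency-averaged properties comes down to which channels are included. The sketch below uses a hypothetical per-channel RFI mask (True meaning flagged) to average only unflagged channels; the mask, values and helper name are illustrative assumptions, and the library's internal approach may differ.

```python
from statistics import fmean


def average_excluding_rfi(channel_values, rfi_flagged):
    """Average per-channel values, skipping channels flagged for RFI.

    rfi_flagged[i] is True when channel i is marked as having RFI.
    """
    if len(channel_values) != len(rfi_flagged):
        raise ValueError("channel_values and rfi_flagged must be the same length")
    kept = [v for v, flagged in zip(channel_values, rfi_flagged) if not flagged]
    if not kept:
        raise ValueError("every channel is flagged for RFI")
    return fmean(kept)


values = [1.0, 1.0, 50.0, 1.0]  # made-up data: channel 2 has a strong RFI spike
flags = [False, False, True, False]
print(fmean(values))                         # all channels: 13.25
print(average_excluding_rfi(values, flags))  # RFI excised: 1.0
```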

property frequency_bins: nptyping.NDArray.(typing.Literal['NFreqBin'], nptyping.Float64)

Get the frequency bins used in the spectrogram data.

get_channel_stats() -> pandas.DataFrame

Get the channel statistics.

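While this method is public, it is recommended to use the pol_a_channel_stats, pol_a_real_channel_stats, pol_a_imag_channel_stats, pol_b_channel_stats, pol_b_real_channel_stats and pol_b_imag_channel_stats properties instead, as they provide more specific access to the data by polarisation and dimension of the complex voltage data.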

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Polarisation - which polarisation that the statistic value is for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel for each polarisation and dimension.

  • Variance - the variance of the data in the channel for each polarisation and dimension.

  • Clipped - the number of clipped input samples (maximum level) in the channel for each polarisation and dimension.

The Pandas frame has a MultiIndex key using the Channel, Polarisation, and Dimension columns.

Returns

a data frame with statistics for each channel split by polarisation and complex voltage dimension.

Return type

pd.DataFrame

get_frequency_averaged_stats() -> pandas.DataFrame

Get the frequency averaged statistics.

This will return a data frame that includes statistics across all frequencies/channels as well as only the frequencies/channels that weren’t marked as having RFI.

While this method is public, it is recommended to use the frequency_averaged_stats and frequency_averaged_stats_rfi_excised properties directly.

The data frame has the following columns:

  • Polarisation - which polarisation that the statistic value is for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • RFI Excised - a boolean value indicating whether the statistic was calculated after RFI had been excised.

  • Mean - the mean of the data for each polarisation and dimension, averaged over all channels.

  • Variance - the variance of the data for each polarisation and dimension, averaged over all channels.

  • Clipped - the number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels.

The Pandas frame has a MultiIndex key using the Polarisation, Dimension, and RFI Excised columns.
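The Clipped column is a count, so it can be easier to judge relative to the number of samples. The sketch below divides a clipped count by the Num. Samples header value; treating Num. Samples as the right denominator for a single polarisation and dimension is an assumption, the helper name is hypothetical, and the clipped count is made up.

```python
def clipped_fraction(num_clipped, num_samples):
    """Return the fraction of samples that were clipped (hypothetical helper)."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return num_clipped / num_samples


# Num. Samples from the example header; the clipped count is invented.
fraction = clipped_fraction(2101, 21012480)
print(f"{fraction:.2e}")
```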

get_histogram_data(rfi_excised: bool) -> pandas.DataFrame

Get the histogram of the input data integer states for each polarisation and dimension.

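While this method is public, it is recommended to use one of the following 8 properties, as they provide the data in a more usable format: pol_a_real_histogram, pol_a_imag_histogram, pol_b_real_histogram and pol_b_imag_histogram, together with the corresponding _rfi_excised variant of each.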

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.
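The bin count follows directly from the bit depth. The sketch below computes it and, assuming the samples are signed two's-complement integers (an assumption; the file format may define the encoding differently), the minimum and maximum digital values that would count as clipped.

```python
def histogram_bins(nbits):
    """Number of integer states for nbits-bit data: 2**nbits."""
    return 2 ** nbits


def clip_levels(nbits):
    """Min and max of a signed two's-complement nbits-bit integer (assumed encoding)."""
    return -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1


for nbits in (8, 16):
    print(nbits, histogram_bins(nbits), clip_levels(nbits))
# 8 256 (-128, 127)
# 16 65536 (-32768, 32767)
```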

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Polarisation - which polarisation that the statistic value is for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Count - the number/count for the bin.

The Pandas frame has a MultiIndex key using the Bin, Polarisation, and Dimension columns.
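Because each bin corresponds to one integer state, the mean and variance of the input data can be recovered from the counts. The sketch below assumes bin i holds the value i - 2**(nbits - 1) (signed data, with bin 0 as the most negative state); that mapping is an assumption, not something this documentation specifies, and the helper name is hypothetical.

```python
def histogram_moments(counts, nbits):
    """Mean and population variance of integer samples from histogram counts.

    Assumes bin i represents the value i - 2**(nbits - 1).
    """
    if len(counts) != 2 ** nbits:
        raise ValueError("expected 2**nbits bins")
    offset = 2 ** (nbits - 1)
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    mean = sum((i - offset) * c for i, c in enumerate(counts)) / total
    variance = sum(((i - offset) - mean) ** 2 * c for i, c in enumerate(counts)) / total
    return mean, variance


# Tiny 2-bit example: states -2, -1, 0, 1 with made-up counts.
print(histogram_moments([0, 5, 5, 0], nbits=2))  # (-0.5, 0.25)
```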

Parameters

rfi_excised (bool) – report on all data (False) or RFI excised data (True)

Returns

a data frame of histogram data split by polarisation and complex voltage dimension.

Return type

pd.DataFrame

get_rebinned_histogram2d_data(rfi_excised: bool, polarisation: ska_pst_stat.hdf5.consts.Polarisation) -> nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the 2D histogram data.

This returns a Numpy array rather than a Pandas Dataframe.

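While this is a public method, the pol_a_rebinned_histogram2d, pol_a_rebinned_histogram2d_rfi_excised, pol_b_rebinned_histogram2d and pol_b_rebinned_histogram2d_rfi_excised properties should be used instead, as they provide a more user-friendly API.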

Parameters
  • rfi_excised (bool) – use the RFI excised data (True) or all data (False)

  • polarisation (Polarisation) – which polarisation of the data to use.

get_rebinned_histogram_data(rfi_excised: bool) -> pandas.DataFrame

Get rebinned histogram data.

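While this method is public, it is recommended to use one of the following 8 properties, as they provide the data in a more usable format: pol_a_real_rebinned_histogram, pol_a_imag_rebinned_histogram, pol_b_real_rebinned_histogram and pol_b_imag_rebinned_histogram, together with the corresponding _rfi_excised variant of each.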

The number of bins that the data has been rebinned to is given by the Num. Histogram Bins (Rebinned) value in the header.
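With the example header values (65536 histogram bins rebinned to 256), each rebinned bin would cover 256 of the original integer states. The sketch below assumes rebinning means summing contiguous groups of equal width; the rebinning actually used to produce the file is not described here, and the helper name is hypothetical.

```python
def rebin(counts, nrebin):
    """Sum contiguous groups of bins so that len(result) == nrebin.

    Assumes len(counts) is an exact multiple of nrebin.
    """
    if nrebin <= 0 or len(counts) % nrebin != 0:
        raise ValueError("len(counts) must be a positive multiple of nrebin")
    width = len(counts) // nrebin
    return [sum(counts[i:i + width]) for i in range(0, len(counts), width)]


counts = [1] * 65536               # a flat, made-up 16-bit histogram
rebinned = rebin(counts, 256)
print(len(rebinned), rebinned[0])  # 256 256
```

Summing (rather than averaging) keeps the total count unchanged, so the rebinned histogram still describes the same number of samples.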

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Polarisation - which polarisation that the statistic value is for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Count - the number/count for the bin

The Pandas frame has a MultiIndex key using the Bin, Polarisation, and Dimension columns.

Parameters

rfi_excised (bool) – report on all data (False) or RFI excised data (True)

Returns

a data frame for the rebinned histogram data split by polarisation and complex voltage dimension.

Return type

pd.DataFrame

get_spectral_power() -> pandas.DataFrame

Get the mean and max spectral power values for each channel.

This data frame includes both the mean and max of the spectral power for each channel for all polarisations.

The pol_a_spectral_power and pol_b_spectral_power properties are provided to access the data for each polarisation.

The data frame has the following columns:

  • Polarisation - which polarisation that the statistic value is for.

  • Channel - the channel number the statistics are for.

  • Mean - the mean of the spectral power for the current channel.

  • Max - the maximum of the spectral power for the current channel over all time samples in the statistics file.

The Pandas frame has a MultiIndex key using the Polarisation and Channel columns.

Returns

the mean and max spectral power values for each channel.

Return type

pd.DataFrame

get_timeseries_data(rfi_excised: bool) -> pandas.DataFrame

Get the timeseries data.

While this is a public method, the pol_a_timeseries, pol_a_timeseries_rfi_excised, pol_b_timeseries and pol_b_timeseries_rfi_excised properties should be used instead, as they provide more user-friendly access to the data.

The timeseries is binned in time (see Num. Temporal Bins in the header) and is summed over all frequencies. If rfi_excised is True, the sum is taken only over frequencies that are not flagged for RFI.
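To make the RFI excised variant concrete, the sketch below takes a made-up power array indexed as [channel][temporal_bin] and sums over channels for each temporal bin, optionally skipping flagged channels. The array layout, the mask and the helper name are illustrative assumptions; the library's actual processing is not shown here.

```python
def sum_over_channels(power, rfi_flagged=None):
    """Sum power over channels for each temporal bin.

    power[c][t] is the power for channel c in temporal bin t (hypothetical layout).
    If rfi_flagged is given, channels with rfi_flagged[c] == True are skipped.
    """
    nbins = len(power[0])
    totals = [0.0] * nbins
    for c, row in enumerate(power):
        if rfi_flagged is not None and rfi_flagged[c]:
            continue
        for t, value in enumerate(row):
            totals[t] += value
    return totals


power = [[1.0, 2.0], [1.0, 2.0], [100.0, 100.0]]  # channel 2 has RFI
flags = [False, False, True]
print(sum_over_channels(power))         # [102.0, 104.0]
print(sum_over_channels(power, flags))  # [2.0, 4.0]
```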

The data frame has the following columns:

  • Polarisation - which polarisation that the statistic value is for.

  • Temporal Bin - the time bin.

  • Time Offset - the offset, in seconds, for the current temporal bin.

  • Max - the maximum power recorded in the temporal bin.

  • Min - the minimum power recorded in the temporal bin.

  • Mean - the mean power recorded in the temporal bin.

The Pandas frame has a MultiIndex key using the Polarisation and Temporal Bin columns.

Parameters

rfi_excised (bool) – whether to use all frequencies (False) or only those that are not marked as having RFI (True).

Returns

a data frame with the timeseries statistics.

Return type

pd.DataFrame

property header: pandas.DataFrame

Get the header metadata for the data file.

This returns a Pandas data frame of the header data from the HDF5 file. This allows the user of the API to see what is in the HEADER dataset without needing an HDF5 viewer tool.

The header has the following fields (with example values):

  • File Format Version (e.g. 1.0.0) - the version of the SKA PST STAT file format that the file is from.

  • Execution Block ID (e.g. eb-m001-20230921-245) - the execution block ID of the generated data file.

  • Telescope (e.g. SKALow) - the telescope used for the generated data file (i.e. SKALow or SKAMid).

  • Scan ID (e.g. 42) - the ID of the scan that the file was generated from.

  • Beam ID (e.g. 1) - the PST beam ID that was used for the scan.

  • UTC Start Time (e.g. 2023-10-23-11:00:00) - an ISO-formatted string of the UTC time at the start of the scan.

  • Start Scan Offset (e.g. 0.0) - the time offset, in seconds, from the UTC start time to the start of the data in the file.

  • End Scan Offset (e.g. 0.106168) - the time offset, in seconds, from the UTC start time to the end of the data in the file.

  • Frequency (MHz) (e.g. 87.5) - the centre frequency of the data as a whole.

  • Bandwidth (MHz) (e.g. 75.0) - the bandwidth of the data.

  • Start Channel Number (e.g. 0) - the first channel number in the data.

  • End Channel Number (e.g. 431) - the last channel number in the data.

  • Num. Polarisations (e.g. 2) - the number of polarisations; this should be 2.

  • Num. Dimensions (e.g. 2) - the number of dimensions in the data (should be 2 for complex data).

  • Num. Channels (e.g. 432) - the number of channels in the data.

  • Num. Frequency Bins (e.g. 36) - the number of frequency bins in the spectrogram data.

  • Num. Temporal Bins (e.g. 32) - the number of temporal bins in the spectrogram and timeseries data.

  • Num. Histogram Bins (e.g. 65536) - the number of bins in the histogram data.

  • Num. Histogram Bins (Rebinned) (e.g. 256) - the number of bins used in the rebinned histograms.

  • Num. Samples (e.g. 21012480) - the total number of samples used to calculate the statistics.

  • Num. Samples (RFI Excised) (e.g. 19456000) - the total number of samples used to calculate the statistics, excluding data flagged for RFI.

  • Num. Invalid Packets (e.g. 0) - the total number of invalid/dropped packets in the data used to calculate the statistics.
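Several derived quantities follow from the example header values. The sketch below computes the channel width and the centre frequencies of the first and last channels, assuming evenly spaced channels spanning the band symmetrically about Frequency (MHz). It also estimates how many channels were flagged for RFI, assuming every channel contributes the same number of samples (an assumption). The header dictionary is hand-written here, not read from a file.

```python
header = {
    "Frequency (MHz)": 87.5,
    "Bandwidth (MHz)": 75.0,
    "Num. Channels": 432,
    "Num. Samples": 21012480,
    "Num. Samples (RFI Excised)": 19456000,
}

nchan = header["Num. Channels"]
chan_bw = header["Bandwidth (MHz)"] / nchan
# Assumes evenly spaced channels spanning the band symmetrically about its centre.
low_edge = header["Frequency (MHz)"] - header["Bandwidth (MHz)"] / 2
first_centre = low_edge + 0.5 * chan_bw
last_centre = low_edge + (nchan - 0.5) * chan_bw

# Assumes every channel contributes the same number of samples.
samples_per_chan = header["Num. Samples"] // nchan
flagged = nchan - header["Num. Samples (RFI Excised)"] // samples_per_chan

print(f"{chan_bw:.6f} MHz per channel")            # 0.173611 MHz per channel
print(f"{first_centre:.4f} to {last_centre:.4f} MHz")  # 50.0868 to 124.9132 MHz
print(samples_per_chan, flagged)                   # 48640 32
```

Under these assumptions, the RFI excised sample count corresponds to 400 of the 432 channels.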

Returns

a human-readable version of the header scalar fields.

Return type

pd.DataFrame

static load_from_file(file_path: pathlib.Path | str) -> ska_pst_stat.stats.Statistics

Load an HDF5 STAT file and return an instance of the Statistics class.

Parameters

file_path (pathlib.Path | str) – the path to the file to load the statistics from

Returns

the statistics from the HDF5 file as a Python class

Return type

Statistics

property nchan: int

Get the number of channels for the voltage data.

property ndim: int

Get the number of dimensions of voltage data.

This value should be 2, as SKAO uses complex voltage data and the statistics have real and imaginary dimensions.

property npol: int

Get the number of polarisations.

property pol_a_channel_stats: pandas.DataFrame

Get the polarisation A channel statistics.

This property includes both the real and imaginary dimensions of the data. The pol_a_real_channel_stats and pol_a_imag_channel_stats utility properties are provided to get the statistics of each dimension directly.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel for each dimension.

  • Variance - the variance of the data in the channel for each dimension.

  • Clipped - the number of clipped input samples (maximum level) in the channel for each dimension.

The Pandas frame has a MultiIndex key using the Channel and Dimension columns.

Returns

a data frame of polarisation A with statistics for each channel split by complex voltage dimension.

Return type

pd.DataFrame

property pol_a_imag_channel_stats: pandas.DataFrame

Get the imaginary valued, polarisation A channel statistics.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel.

  • Variance - the variance of the data in the channel.

  • Clipped - the number of clipped input samples (maximum level) in the channel.

Returns

a data frame of the imaginary component of polarisation A with statistics for each channel.

Return type

pd.DataFrame

property pol_a_imag_histogram: pandas.DataFrame

Get the histogram of the imaginary valued, polarisation A, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for imaginary valued, polarisation A, voltage data.

Return type

pd.DataFrame

property pol_a_imag_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the imag valued, pol A, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for imaginary valued, polarisation A, voltage data from all channels not flagged for RFI.

Return type

pd.DataFrame

property pol_a_imag_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the imaginary valued, polarisation A input data.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for imaginary valued, polarisation A.

Return type

pd.DataFrame

property pol_a_imag_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the imaginary valued, polarisation A input data from all channels not flagged with RFI.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for imaginary valued, polarisation A from all channels not flagged with RFI.

Return type

pd.DataFrame

property pol_a_real_channel_stats: pandas.DataFrame

Get the real valued, polarisation A channel statistics.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel.

  • Variance - the variance of the data in the channel.

  • Clipped - the number of clipped input samples (maximum level) in the channel.

Returns

a data frame of the real component of polarisation A with statistics for each channel.

Return type

pd.DataFrame

property pol_a_real_histogram: pandas.DataFrame

Get the histogram of the real valued, polarisation A, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for real valued, polarisation A, voltage data.

Return type

pd.DataFrame

property pol_a_real_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the real valued, pol A, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for real valued, polarisation A, voltage data from all channels not flagged for RFI.

Return type

pd.DataFrame

property pol_a_real_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the real valued, polarisation A input data.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for real valued, polarisation A.

Return type

pd.DataFrame

property pol_a_real_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the real valued, polarisation A input data from all channels not flagged with RFI.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for real valued, polarisation A from all channels not flagged with RFI.

Return type

pd.DataFrame

property pol_a_rebinned_histogram2d: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation A.

This returns a Numpy array with data for all frequencies.

  • the first array dimension is the real valued data.

  • the second array dimension is the imaginary valued data.
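Since the first axis indexes the real value and the second the imaginary value, summing a 2D histogram along one axis gives a 1D histogram for the other component. The sketch below uses a small made-up array held in nested lists; whether the library's 1D rebinned histograms match these marginals exactly is an assumption, and the helper name is hypothetical.

```python
def marginals(hist2d):
    """Return (real, imag) 1D histograms from a 2D histogram indexed [real][imag]."""
    real = [sum(row) for row in hist2d]        # sum over the imaginary axis
    imag = [sum(col) for col in zip(*hist2d)]  # sum over the real axis
    return real, imag


hist2d = [
    [1, 0, 0],
    [2, 5, 1],
    [0, 3, 0],
]
print(marginals(hist2d))  # ([1, 8, 3], [3, 8, 1])
```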

Returns

the rebinned 2D histogram data for polarisation A.

Return type

np.ndarray

property pol_a_rebinned_histogram2d_rfi_excised: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation A except frequencies flagged with RFI.

This returns a Numpy array with data for frequencies that aren’t RFI excised.

  • the first array dimension is the real valued data.

  • the second array dimension is the imaginary valued data.

Returns

the rebinned 2D histogram data for polarisation A, excluding frequencies flagged with RFI.

Return type

np.ndarray

property pol_a_spectral_power: pandas.DataFrame

Get the mean and max spectral power values for each channel for polarisation A.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Mean - the mean of the spectral power for the current channel.

  • Max - the maximum of the spectral power for the current channel over all time samples in the statistics file.

Returns

the mean and max spectral power values for each channel for polarisation A.

Return type

pd.DataFrame

property pol_a_spectrogram: nptyping.NDArray.(typing.Literal['NFreqBin, NTimeBin'], nptyping.Float32)

Get the spectrogram data for polarisation A.

This returns a Numpy array that can be used with Matplotlib to plot a spectrogram. The data in the spectrogram is binned in frequency (by channel) and in time (see Num. Frequency Bins and Num. Temporal Bins in the header for more details).
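With the example header (432 channels, 36 frequency bins), each frequency bin would span 12 channels. The sketch below groups consecutive channels by averaging; whether the library averages or sums within a bin is not stated here, so the choice of mean is an assumption, and the helper name is hypothetical.

```python
from statistics import fmean


def bin_channels(values, nbins):
    """Average consecutive groups of channels into nbins frequency bins.

    Assumes len(values) is an exact multiple of nbins.
    """
    if nbins <= 0 or len(values) % nbins != 0:
        raise ValueError("number of channels must be a multiple of nbins")
    width = len(values) // nbins
    return [fmean(values[i:i + width]) for i in range(0, len(values), width)]


channel_power = [float(c % 12) for c in range(432)]  # made-up per-channel power
binned = bin_channels(channel_power, 36)
print(len(binned), binned[0])  # 36 5.5
```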

Returns

the spectrogram data for polarisation A.

Return type

np.ndarray

property pol_a_timeseries: pandas.DataFrame

Get the timeseries data for polarisation A for all frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies.

The data frame has the following columns:

  • Temporal Bin - the time bin.

  • Time Offset - the offset, in seconds, for the current temporal bin.

  • Max - the maximum power recorded in the temporal bin.

  • Min - the minimum power recorded in the temporal bin.

  • Mean - the mean power recorded in the temporal bin.

Returns

a data frame with the timeseries statistics for polarisation A.

Return type

pd.DataFrame

property pol_a_timeseries_rfi_excised: pandas.DataFrame

Get the timeseries data for polarisation A for all frequencies except for RFI excised frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies that are not flagged for RFI.

The data frame has the following columns:

  • Temporal Bin - the time bin.

  • Time Offset - the offset, in seconds, for the current temporal bin.

  • Max - the maximum power recorded in the temporal bin.

  • Min - the minimum power recorded in the temporal bin.

  • Mean - the mean power recorded in the temporal bin.

Returns

a data frame with the timeseries statistics for polarisation A except for frequencies that have been RFI excised.

Return type

pd.DataFrame

property pol_b_channel_stats: pandas.DataFrame

Get the polarisation B channel statistics.

This property includes both the real and imaginary dimensions of the data. The pol_b_real_channel_stats and pol_b_imag_channel_stats utility properties are provided to get the statistics of each dimension directly.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Dimension - which complex dimension/component (i.e. real or imag) that the statistic is for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel for each dimension.

  • Variance - the variance of the data in the channel for each dimension.

  • Clipped - the number of clipped input samples (maximum level) in the channel for each dimension.

The Pandas frame has a MultiIndex key using the Channel and Dimension columns.

Returns

a data frame of polarisation B with statistics for each channel split by complex voltage dimension.

Return type

pd.DataFrame

property pol_b_imag_channel_stats: pandas.DataFrame

Get the imaginary valued, polarisation B channel statistics.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel.

  • Variance - the variance of the data in the channel.

  • Clipped - the number of clipped input samples (maximum level) in the channel.

Returns

a data frame of the imaginary component of polarisation B with statistics for each channel.

Return type

pd.DataFrame

property pol_b_imag_histogram: pandas.DataFrame

Get the histogram of the imaginary valued, polarisation B, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for imaginary valued, polarisation B, voltage data.

Return type

pd.DataFrame

property pol_b_imag_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the imag valued, pol B, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for imaginary valued, polarisation B, voltage data from all channels not flagged for RFI.

Return type

pd.DataFrame

property pol_b_imag_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the imaginary valued, polarisation B input data.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for imaginary valued, polarisation B.

Return type

pd.DataFrame

property pol_b_imag_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the imaginary valued, polarisation B input data from all channels not flagged with RFI.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for imaginary valued, polarisation B from all channels not flagged with RFI.

Return type

pd.DataFrame

property pol_b_real_channel_stats: pandas.DataFrame

Get the real valued, polarisation B channel statistics.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Channel Freq. (MHz) - the centre frequency for the channel.

  • Mean - the mean of the data in the channel.

  • Variance - the variance of the data in the channel.

  • Clipped - the number of clipped input samples (maximum level) in the channel.

Returns

a data frame of the real component of polarisation B with statistics for each channel.

Return type

pd.DataFrame

property pol_b_real_histogram: pandas.DataFrame

Get the histogram of the real valued, polarisation B, input data integer states.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for real valued, polarisation B, voltage data.

Return type

pd.DataFrame

property pol_b_real_histogram_rfi_excised: pandas.DataFrame

Get the histogram of the real valued, pol B, input data from all channels not flagged for RFI.

The number of bins in the histogram is 2^(number of bits). For 8 bit data this is 256 bins and for 16 bit data this is 65536 bins.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for histogram data for real valued, polarisation B, voltage data from all channels not flagged for RFI.

Return type

pd.DataFrame

property pol_b_real_rebinned_histogram: pandas.DataFrame

Get the rebinned histogram of the real valued, polarisation B input data.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for real valued, polarisation B.

Return type

pd.DataFrame

property pol_b_real_rebinned_histogram_rfi_excised: pandas.DataFrame

Get the rebinned histogram of the real valued, polarisation B input data from all channels not flagged with RFI.

The data frame has the following columns:

  • Bin - the bin for the histogram count.

  • Count - the number/count for the bin

Returns

a data frame for rebinned histogram data for real valued, polarisation B from all channels not flagged with RFI.

Return type

pd.DataFrame

property pol_b_rebinned_histogram2d: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation B.

This returns a Numpy array with data for all frequencies.

  • the first array dimension is the real valued data.

  • the second array dimension is the imaginary valued data.

Returns

the rebinned 2D histogram data for polarisation B.

Return type

np.ndarray

property pol_b_rebinned_histogram2d_rfi_excised: nptyping.NDArray.(typing.Literal['NRebin, NRebin'], nptyping.UInt32)

Get the rebinned 2D histogram data for polarisation B except frequencies flagged with RFI.

This returns a Numpy array with data for frequencies that aren’t RFI excised.

  • the first array dimension is the real valued data.

  • the second array dimension is the imaginary valued data.

Returns

the rebinned 2D histogram data for polarisation B, excluding frequencies flagged with RFI.

Return type

np.ndarray

property pol_b_spectral_power: pandas.DataFrame

Get the mean and max spectral power values for each channel for polarisation B.

The data frame has the following columns:

  • Channel - the channel number the statistics are for.

  • Mean - the mean of the spectral power for the current channel.

  • Max - the maximum of the spectral power for the current channel over all time samples in the statistics file.

Returns

the mean and max spectral power values for each channel for polarisation B.

Return type

pd.DataFrame

property pol_b_spectrogram: nptyping.NDArray.(typing.Literal['NFreqBin, NTimeBin'], nptyping.Float32)

Get the spectrogram data for polarisation B.

This returns a Numpy array that can be used with Matplotlib to plot a spectrogram. The data in the spectrogram is binned in frequency (by channel) and in time (see Num. Frequency Bins and Num. Temporal Bins in the header for more details).

Returns

the spectrogram data for polarisation B.

Return type

np.ndarray

property pol_b_timeseries: pandas.DataFrame

Get the timeseries data for polarisation B for all frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies.

The data frame has the following columns:

  • Temporal Bin - the time bin.

  • Time Offset - the offset, in seconds, for the current temporal bin.

  • Max - the maximum power recorded in the temporal bin.

  • Min - the minimum power recorded in the temporal bin.

  • Mean - the mean power recorded in the temporal bin.

Returns

a data frame with the timeseries statistics for polarisation B.

Return type

pd.DataFrame

property pol_b_timeseries_rfi_excised: pandas.DataFrame

Get the timeseries data for polarisation B for all frequencies except for RFI excised frequencies.

The timeseries is binned in time (see Num. Temporal Bins in header) and is summed over all frequencies that are not flagged for RFI.

The data frame has the following columns:

  • Temporal Bin - the time bin.

  • Time Offset - the offset, in seconds, for the current temporal bin.

  • Max - the maximum power recorded in the temporal bin.

  • Min - the minimum power recorded in the temporal bin.

  • Mean - the mean power recorded in the temporal bin.

Returns

a data frame with the timeseries statistics for polarisation B except for frequencies that have been RFI excised.

Return type

pd.DataFrame
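Both timeseries properties share the `Temporal Bin` and `Time Offset` columns. The full and RFI-excised timeseries can therefore be joined to see how the mean power in each bin changes when RFI-flagged channels are excluded. This is a minimal sketch on synthetic frames standing in for `stats.pol_b_timeseries` and `stats.pol_b_timeseries_rfi_excised`. The helper name `compare_rfi_excision` and all values are hypothetical.

```python
import pandas as pd


def compare_rfi_excision(full: pd.DataFrame, excised: pd.DataFrame) -> pd.DataFrame:
    """Join the two timeseries on their temporal bin and report the difference in mean power."""
    merged = full.merge(
        excised,
        on=["Temporal Bin", "Time Offset"],
        suffixes=(" (all)", " (RFI excised)"),  # applied to the overlapping Max/Min/Mean columns
    )
    merged["Mean Difference"] = merged["Mean (all)"] - merged["Mean (RFI excised)"]
    return merged[["Temporal Bin", "Time Offset", "Mean (all)", "Mean (RFI excised)", "Mean Difference"]]


columns = ["Temporal Bin", "Time Offset", "Max", "Min", "Mean"]
full = pd.DataFrame([[0, 0.0, 9.0, 1.0, 5.0], [1, 0.5, 12.0, 2.0, 7.0]], columns=columns)
excised = pd.DataFrame([[0, 0.0, 8.0, 1.0, 4.5], [1, 0.5, 9.0, 2.0, 5.0]], columns=columns)
comparison = compare_rfi_excision(full, excised)
print(comparison["Mean Difference"].tolist())  # [0.5, 2.0]
```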

property timeseries_bins: nptyping.NDArray.(typing.Literal['NTimeBin'], nptyping.Float64)

Get the timeseries bins used in the spectrogram and timeseries data.

ska_pst_stat.hdf5

This module is used for handling a HDF5 STAT file.

class ska_pst_stat.hdf5.Dimension(value)

An enum used to represent the complex dimension/component within the data.

property text: str

Map dimension enum value to text used in data frames.

Returns

‘Real’ if value is REAL else ‘Imag’

Return type

str

class ska_pst_stat.hdf5.Polarisation(value)

An enum used to represent polarisation indexes within the data.

property text: str

Map polarisation enum value to text used in data frames.

Returns

‘A’ if value is POL_A else ‘B’

Return type

str
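Both enums are described as indexes into the data arrays, and `text` gives the label used in data frames. The sketch below mirrors that behaviour with standard `enum.IntEnum` classes. The members `REAL` and `POL_A` and the `text` mapping come from the documentation above. The member names `IMAG` and `POL_B` and the integer values 0 and 1 are assumptions, so these classes are a hypothetical stand-in rather than the `ska_pst_stat.hdf5` definitions.

```python
from enum import IntEnum

import numpy as np


class Polarisation(IntEnum):
    """Hypothetical mirror of ska_pst_stat.hdf5.Polarisation (member values assumed)."""

    POL_A = 0
    POL_B = 1

    @property
    def text(self) -> str:
        return "A" if self == Polarisation.POL_A else "B"


class Dimension(IntEnum):
    """Hypothetical mirror of ska_pst_stat.hdf5.Dimension (member values assumed)."""

    REAL = 0
    IMAG = 1

    @property
    def text(self) -> str:
        return "Real" if self == Dimension.REAL else "Imag"


# An (NPol, NDim) array such as StatisticsData.mean_frequency_avg (synthetic values).
mean_frequency_avg = np.array([[0.5, -0.25], [0.75, 0.125]], dtype=np.float32)

for pol in Polarisation:
    for dim in Dimension:
        # IntEnum members are ints, so they can be used directly as array indexes.
        print(pol.text, dim.text, float(mean_frequency_avg[pol, dim]))
```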

class ska_pst_stat.hdf5.StatisticsData(*, mean_frequency_avg: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), mean_frequency_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), variance_frequency_avg: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), variance_frequency_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.Float32), mean_spectrum: nptyping.NDArray.(typing.Literal['NPol, NDim, NChan'], nptyping.Float32), variance_spectrum: nptyping.NDArray.(typing.Literal['NPol, NDim, NChan'], nptyping.Float32), mean_spectral_power: nptyping.NDArray.(typing.Literal['NPol, NChan'], nptyping.Float32), max_spectral_power: nptyping.NDArray.(typing.Literal['NPol, NChan'], nptyping.Float32), histogram_1d_freq_avg: nptyping.NDArray.(typing.Literal['NPol, NDim, NBin'], nptyping.UInt32), histogram_1d_freq_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim, NBin'], nptyping.UInt32), rebinned_histogram_2d_freq_avg: nptyping.NDArray.(typing.Literal['NPol, NRebin, NRebin'], nptyping.UInt32), rebinned_histogram_2d_freq_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NRebin, NRebin'], nptyping.UInt32), rebinned_histogram_1d_freq_avg: nptyping.NDArray.(typing.Literal['NPol, NDim, NRebin'], nptyping.UInt32), rebinned_histogram_1d_freq_avg_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim, NRebin'], nptyping.UInt32), num_clipped_samples_spectrum: nptyping.NDArray.(typing.Literal['NPol, NDim, NChan'], nptyping.UInt32), num_clipped_samples: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.UInt32), num_clipped_samples_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NDim'], nptyping.UInt32), spectrogram: nptyping.NDArray.(typing.Literal['NPol, NFreqBin, NTimeBin'], nptyping.Float32), timeseries: nptyping.NDArray.(typing.Literal['NPol, NTimeBin, 3'], nptyping.Float32), timeseries_rfi_excised: nptyping.NDArray.(typing.Literal['NPol, NTimeBin, 3'], nptyping.Float32))

A data class used to hold the statistics calculated from the input data.

Variables
  • mean_frequency_avg (numpy.ndarray) – the mean of the data for each polarisation and dimension, averaged over all channels.

  • mean_frequency_avg_rfi_excised (numpy.ndarray) – the mean of the data for each polarisation and dimension, averaged over all channels, except those flagged for RFI.

  • variance_frequency_avg (numpy.ndarray) – the variance of the data for each polarisation and dimension, averaged over all channels.

  • variance_frequency_avg_rfi_excised (numpy.ndarray) – the variance of the data for each polarisation and dimension, averaged over all channels, except those flagged for RFI.

  • mean_spectrum (numpy.ndarray) – the mean of the data for each polarisation, dimension and channel.

  • variance_spectrum (numpy.ndarray) – the variance of the data for each polarisation, dimension and channel.

  • mean_spectral_power (numpy.ndarray) – mean power spectra of the data for each polarisation and channel.

  • max_spectral_power (numpy.ndarray) – maximum power spectra of the data for each polarisation and channel.

  • histogram_1d_freq_avg (numpy.ndarray) – histogram of the input data integer states for each polarisation and dimension, averaged over all channels.

  • histogram_1d_freq_avg_rfi_excised (numpy.ndarray) – histogram of the input data integer states for each polarisation and dimension, averaged over all channels, except those flagged for RFI.

  • rebinned_histogram_2d_freq_avg (numpy.ndarray) – rebinned 2D histogram of the input data integer states for each polarisation, averaged over all channels.

  • rebinned_histogram_2d_freq_avg_rfi_excised (numpy.ndarray) – rebinned 2D histogram of the input data integer states for each polarisation, averaged over all channels, except those flagged for RFI.

  • rebinned_histogram_1d_freq_avg (numpy.ndarray) – rebinned histogram of the input data integer states for each polarisation and dimension, averaged over all channels.

  • rebinned_histogram_1d_freq_avg_rfi_excised (numpy.ndarray) – rebinned histogram of the input data integer states for each polarisation and dimension, averaged over all channels, except those flagged for RFI.

  • num_clipped_samples_spectrum (numpy.ndarray) – number of clipped input samples (maximum level) for each polarisation, dimension and channel.

  • num_clipped_samples (numpy.ndarray) – number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels.

  • num_clipped_samples_rfi_excised (numpy.ndarray) – number of clipped input samples (maximum level) for each polarisation and dimension, averaged over all channels, except those flagged for RFI.

  • spectrogram (numpy.ndarray) – spectrogram of the data for each polarisation, averaged over a configurable number of temporal and spectral bins (default ~1000).

  • timeseries (numpy.ndarray) – time series of the data for each polarisation, rebinned in time to ntime_bins, averaged over all frequency channels.

  • timeseries_rfi_excised (numpy.ndarray) – time series of the data for each polarisation, rebinned in time to ntime_bins, averaged over all frequency channels, except those flagged for RFI.
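The shapes in the signature follow from the axes each statistic is reduced over. The following sketch is a rough illustration only and is not the STAT implementation. It computes the frequency-averaged, per-channel and RFI-excised means from a synthetic array, assuming the input is laid out as `(npol, ndim, nchan, nsamp)` and using a hypothetical RFI mask. It takes the mean over all samples of the included channels. That equals the average of the per-channel means only when every channel has the same number of samples.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
npol, ndim, nchan, nsamp = 2, 2, 8, 32

# Synthetic input data, assumed to be laid out as (npol, ndim, nchan, nsamp).
data = rng.normal(loc=0.0, scale=10.0, size=(npol, ndim, nchan, nsamp))

# Hypothetical RFI flags: True marks a channel to exclude.
rfi_mask = np.zeros(nchan, dtype=bool)
rfi_mask[[2, 5]] = True

mean_spectrum = data.mean(axis=3)  # (NPol, NDim, NChan)
variance_spectrum = data.var(axis=3)  # (NPol, NDim, NChan); population variance, STAT's choice may differ
mean_frequency_avg = data.mean(axis=(2, 3))  # (NPol, NDim)
mean_frequency_avg_rfi_excised = data[:, :, ~rfi_mask, :].mean(axis=(2, 3))  # (NPol, NDim)

print(mean_spectrum.shape, mean_frequency_avg.shape, mean_frequency_avg_rfi_excised.shape)
```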

class ska_pst_stat.hdf5.StatisticsMetadata(*, file_format_version: str = '1.0.0', eb_id: str, telescope: str, scan_id: int, beam_id: str, utc_start: str, t_min: float, t_max: float, frequency_mhz: float, bandwidth_mhz: float, start_chan: int, npol: int, ndim: int, nchan: int, nchan_ds: int, ndat_ds: int, histogram_nbin: int, nrebin: int, channel_freq_mhz: nptyping.NDArray.(typing.Literal['NChan'], nptyping.Float64), timeseries_bins: nptyping.NDArray.(typing.Literal['NTimeBin'], nptyping.Float64), frequency_bins: nptyping.NDArray.(typing.Literal['NFreqBin'], nptyping.Float64), num_samples: int, num_samples_rfi_excised: int, num_samples_spectrum: nptyping.NDArray.(typing.Literal['NChan'], nptyping.UInt32), num_invalid_packets: int)

Data class modeling the metadata from a HDF5 STAT data file.

Variables
  • file_format_version (str) – the format version of the HDF5 STAT file. Default is “1.0.0”.

  • eb_id (str) – the execution block id the file relates to.

  • telescope (str) – the telescope the data were collected for. Should be SKALow or SKAMid.

  • scan_id (int) – the scan id for the generated data file.

  • beam_id (str) – the beam id for the generated data file.

  • utc_start (str) – the start time of the scan, as a UTC ISO-formatted string, to the nearest second.

  • t_min (float) – the time offset, in seconds, from the UTC start time to represent the time at the start of the data in the file.

  • t_max (float) – the time offset, in seconds, from the UTC start time to represent the time at the end of the data in the file.

  • frequency_mhz (float) – the centre frequency, in MHz, of the data as a whole.

  • bandwidth_mhz (float) – the bandwidth, in MHz, of the data.

  • start_chan (int) – the starting channel number.

  • npol (int) – number of polarisations.

  • ndim (int) – number of dimensions in the data (should be 2 for complex data).

  • nchan (int) – number of channels in the data.

  • nchan_ds (int) – the number of frequency bins in the spectrogram data.

  • ndat_ds (int) – the number of temporal bins in the spectrogram and timeseries data.

  • histogram_nbin (int) – the number of bins in the histogram data.

  • nrebin (int) – number of bins to use for rebinned histograms.

  • channel_freq_mhz (numpy.ndarray) – the centre frequencies of each channel (MHz).

  • timeseries_bins (numpy.ndarray) – the timestamp offsets for each temporal bin.

  • frequency_bins (numpy.ndarray) – the frequency bins used for the spectrogram attribute (MHz).

  • num_samples (int) – the total number of samples used to calculate the sample statistics.

  • num_samples_rfi_excised (int) – the total number of samples used to calculate the sample statistics, except those flagged for RFI.

  • num_samples_spectrum (numpy.ndarray) – the number of samples, per channel, used to calculate the sample statistics.

  • num_invalid_packets (int) – the number of invalid packets received while calculating the statistics.
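`t_min` and `t_max` are offsets in seconds from `utc_start`, so the absolute start and end times of the data in a file can be recovered with the standard library. The sketch assumes `utc_start` is an ISO 8601 string such as `2023-08-14T12:00:00Z`, although the exact format written by STAT is not specified here. The function name `data_time_range` and the example values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def data_time_range(utc_start: str, t_min: float, t_max: float) -> Tuple[datetime, datetime]:
    """Convert the t_min/t_max offsets (seconds) into absolute UTC datetimes."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
    start = datetime.fromisoformat(utc_start.replace("Z", "+00:00"))
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)  # assume naive timestamps are UTC
    return start + timedelta(seconds=t_min), start + timedelta(seconds=t_max)


begin, end = data_time_range("2023-08-14T12:00:00Z", t_min=10.0, t_max=15.5)
print(begin.isoformat(), end.isoformat(), (end - begin).total_seconds())
```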

property end_chan: int

Get the last channel that the header is for.
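Assuming channels are numbered contiguously from `start_chan`, the last channel and the full set of channel numbers (compare `Statistics.channel_numbers`) follow directly from `start_chan` and `nchan`. This is a sketch of that arithmetic, not the library code. The function names are hypothetical.

```python
import numpy as np


def end_chan(start_chan: int, nchan: int) -> int:
    """Last channel number, assuming nchan contiguous channels starting at start_chan."""
    return start_chan + nchan - 1


def channel_numbers(start_chan: int, nchan: int) -> np.ndarray:
    """All channel numbers from start_chan to end_chan inclusive."""
    return np.arange(start_chan, start_chan + nchan, dtype=np.int64)


print(end_chan(start_chan=0, nchan=432))  # 431
print(channel_numbers(start_chan=100, nchan=4))  # [100 101 102 103]
```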

class ska_pst_stat.hdf5.TimeseriesDimension(value)[source]

An enum used to represent which index to use for max/min/mean in timeseries data.

ska_pst_stat.hdf5.map_hdf5_key(hdf5_key: str) → str

Map a key from a HDF5 attribute/dataset to a model dataclass property.

ska_pst_stat.utility

This module provides utility classes to generate Gaussian random data.

class ska_pst_stat.utility.Hdf5FileGenerator(file_path: pathlib.Path | str, eb_id: str, telescope: str, scan_id: int, beam_id: str, config: ska_pst_stat.utility.hdf5_file_generator.StatConfig, utc_start: Optional[str] = None)

Class used to generate a random HDF5 statistics file.

generate() → None

Generate a HDF5 file to use in a test.

property stats: ska_pst_stat.stats.Statistics

Get generated statistics.

This will raise an AssertionError if generate() has not been called.
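The generator and `StatConfig` can be combined to produce a test file, which can then be loaded back with `Statistics.load_from_file()`. The sketch below uses only the constructor arguments and methods documented on this page. The output file name and the `eb_id`, `beam_id` and `scan_id` values are placeholders. The `ska_pst_stat` imports are deferred into the function so the snippet can be read and imported without the package installed.

```python
from pathlib import Path


def generate_and_load(output_dir: Path):
    """Generate a random STAT HDF5 file in output_dir and load it back as a Statistics object."""
    # Deferred imports: only needed when the function is actually called.
    from ska_pst_stat import Statistics
    from ska_pst_stat.utility import Hdf5FileGenerator, StatConfig

    file_path = output_dir / "stat.h5"
    generator = Hdf5FileGenerator(
        file_path=file_path,
        eb_id="eb-m001-20230101-00001",  # placeholder execution block id
        telescope="SKALow",
        scan_id=1,  # placeholder scan id
        beam_id="1",  # placeholder beam id
        config=StatConfig(nchan=432, ntime_bins=4),
    )
    generator.generate()  # must be called before generator.stats is accessed
    return Statistics.load_from_file(str(file_path))
```

Calling `generate_and_load(Path(tmp))` inside a `tempfile.TemporaryDirectory()` block then gives a `Statistics` instance whose `header` and spectrogram properties can be inspected as in the module-level example.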

class ska_pst_stat.utility.StatConfig(*, npol: int = 2, ndim: int = 2, nchan: int = 432, nsamp: int = 32, nheap: int = 1, nbit: int = 16, nfreq_bins: int = 36, ntime_bins: int = 4, nrebin: int = 256, sigma: float = 6.0, freq_mask: str = '', frequency_mhz: float = 87.5, bandwidth_mhz: float = 75.0, start_chan: int = 0, tsamp: float = 207.36, os_factor: float = 1.3333333333333333)

A data class used as configuration for generating random data.

Variables
  • npol (int) – number of polarisations, default 2.

  • ndim (int) – number of dimensions in the data, default 2.

  • nchan (int) – number of channels in the data, default 432.

  • nsamp (int) – number of samples of each channel per heap, default 32.

  • nheap (int) – number of heaps of data to produce, default is 1.

  • nbit (int) – the number of bits per data value, default 16. This can only be 8 or 16.

  • nfreq_bins (int) – requested number of frequency bins for spectrogram. This gets updated to be a factor of the number of channels.

  • ntime_bins (int) – requested number of temporal bins for spectrogram and timeseries. This gets updated to be a factor of the total number of samples per channel.

  • nrebin (int) – number of bins to use for rebinned histograms.

  • sigma (float) – number of standard deviations to use to clip data. This is only used in the generator.

  • freq_mask (str) – the frequency ranges to mask (currently not used).

  • frequency_mhz (float) – the centre frequency, in MHz, of the data as a whole.

  • bandwidth_mhz (float) – the bandwidth, in MHz, of the data.

  • start_chan (int) – the starting channel number.

  • tsamp (float) – the time, in microseconds, per sample.

  • os_factor (float) – the oversampling factor.
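`nfreq_bins` and `ntime_bins` are adjusted so that they divide the channel and sample counts exactly. How the generator picks the replacement value is not stated here. One plausible rule, sketched below purely as an assumption, is to take the largest factor that does not exceed the requested value. The function name `largest_factor_at_most` is hypothetical.

```python
def largest_factor_at_most(n: int, requested: int) -> int:
    """Return the largest divisor of n that is <= requested (and at least 1)."""
    if n < 1 or requested < 1:
        raise ValueError("n and requested must be positive")
    # Walk down from the requested value until a divisor of n is found.
    for candidate in range(min(n, requested), 0, -1):
        if n % candidate == 0:
            return candidate
    return 1  # not reached: 1 divides every positive n


print(largest_factor_at_most(432, 36))  # 36 (the defaults already divide evenly)
print(largest_factor_at_most(432, 40))  # 36
print(largest_factor_at_most(32, 5))  # 4
```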

property clipped_high: int

Get the maximum value for the current nbit.

property clipped_low: int

Get the minimum value for the current nbit.

property nbin: int

Get the number of bins for the histogram.

property nbit_limit: int

Get the limit for the current nbit.
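`clipped_high`, `clipped_low` and `nbin` all derive from `nbit`. The sketch below assumes that samples are signed two's-complement integers and that the histogram has one bin per integer state. Both are assumptions; the exact definitions, in particular of `nbit_limit`, are not given here. The function `nbit_ranges` is hypothetical.

```python
from typing import Dict


def nbit_ranges(nbit: int) -> Dict[str, int]:
    """Signed integer range and histogram bin count for an nbit sample (assumed two's complement)."""
    if nbit not in (8, 16):
        raise ValueError("StatConfig only supports nbit of 8 or 16")
    clipped_low = -(2 ** (nbit - 1))  # most negative representable value
    clipped_high = 2 ** (nbit - 1) - 1  # most positive representable value
    nbin = 2**nbit  # one histogram bin per integer state
    return {"clipped_low": clipped_low, "clipped_high": clipped_high, "nbin": nbin}


print(nbit_ranges(8))  # {'clipped_low': -128, 'clipped_high': 127, 'nbin': 256}
print(nbit_ranges(16))  # {'clipped_low': -32768, 'clipped_high': 32767, 'nbin': 65536}
```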

property non_rfi_channel_indexes: List[int]

Get the indexes of the channels that are not RFI excised.

property rebin_max: int

Get the maximum value after rebinning.

property rebin_offset: int

Get the offset to apply when doing rebinning.

property rfi_excised_channel_indexes: List[int]

Get the indexes of the RFI excised channels.

property scale: float

Get the scale of the Gaussian distribution.

property total_sample_time: float

Get the total sample time in seconds.

property total_samples_per_channel: int

Get the total number of samples per channel.

property tsamp_secs: float

Get the TSAMP value in seconds.
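The time-related properties follow from `nsamp`, `nheap` and `tsamp`, where `tsamp` is in microseconds. The sketch below works through the arithmetic with the default configuration, assuming each of the `nheap` heaps contributes `nsamp` samples per channel. It mirrors the documented meaning of the fields but is not the library source.

```python
nsamp = 32  # samples of each channel per heap (StatConfig default)
nheap = 1  # number of heaps (StatConfig default)
tsamp = 207.36  # time per sample in microseconds (StatConfig default)

total_samples_per_channel = nsamp * nheap
tsamp_secs = tsamp / 1e6  # microseconds -> seconds
total_sample_time = total_samples_per_channel * tsamp_secs  # seconds of data per channel

print(total_samples_per_channel, tsamp_secs, total_sample_time)
```

With these defaults, a generated file covers roughly 6.6 ms of data.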