Example API Usage

[1]:
%load_ext autoreload
%autoreload 2
[2]:
import pathlib
import tempfile

import numpy as np

from ska_pydada import AsciiHeader, DadaFile

Create a DADA file

The cells below show how to create a DADA file with some random data.

The steps to create a DADA file are:

* create a header
* set the data
* serialise/dump to an output file

Create ASCII Header

A header for the DADA file format is a simple key-value structure that is at least 4096 bytes long. The size of the ASCII header is defined by the HDR_SIZE key; in the PyDADA library this value can be overridden at construction time or by setting the property later.

header = AsciiHeader(header_size=16384)

or

header = AsciiHeader()
header.header_size = 16384

In either example above the serialised header will be exactly 16384 bytes.

A header can be created from a string or a byte array (the latter is what is used when the DadaFile loads a DADA file). The following cells use the header_txt variable to load the header.

[3]:
header_txt = """HDR_SIZE            16384
HDR_VERSION         1.0
NCHAN               432
NBIT                32
NDIM                2
NPOL                2
RESOLUTION          1327104
UTC_START           2017-08-01-15:53:29
"""
[4]:
header = AsciiHeader.from_str(header_txt)
[5]:
assert header.header_size == 16384
assert int(header["NCHAN"]) == 432
assert int(header["NBIT"]) == 32
assert int(header["NDIM"]) == 2
assert int(header["NPOL"]) == 2
assert header.resolution == 1327104
assert header["UTC_START"] == "2017-08-01-15:53:29"
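Because the header is plain key-value text, the same records can also be parsed with nothing but the standard library. The sketch below is illustrative only and is not the PyDADA implementation; AsciiHeader.from_str is the real API.

```python
def parse_header(text: str) -> dict:
    """Minimal sketch: parse DADA-style ASCII header text into a dict."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank padding lines
        # The key is the first whitespace-delimited token; the rest is the value.
        key, _, value = line.partition(" ")
        records[key] = value.strip()
    return records

records = parse_header("""HDR_SIZE            16384
NCHAN               432
UTC_START           2017-08-01-15:53:29
""")
assert int(records["HDR_SIZE"]) == 16384
assert records["UTC_START"] == "2017-08-01-15:53:29"
```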

There are also utility methods on the AsciiHeader to get a header record as an int or float.

[6]:
nbit = header.get_int("NBIT")
assert nbit == 32

Values can be added to the header either using a Python dict item insert or using the set_value method.

[7]:
header["SOURCE"] = "J1644-4559_R"
assert header.get_value("SOURCE") == "J1644-4559_R"

# or

header.set_value("DESCRIPTION", "Some fancy description")
assert header["DESCRIPTION"] == "Some fancy description"

Generate some data

For this notebook, the data will be random complex data with 768 time bins, 432 frequency channels, and 2 polarisations.

[8]:
data = np.random.rand(768, 432, 2 * 2).astype(np.float32).view(np.complex64)
data.shape
[8]:
(768, 432, 2)
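The .view(np.complex64) call above reinterprets adjacent pairs of float32 values as (real, imaginary) components without copying, which is why the last axis shrinks from 4 floats to 2 complex samples. A tiny sketch of the same trick:

```python
import numpy as np

# Four float32 values viewed as two complex64 samples:
# adjacent pairs become (real, imag) with no data copy.
floats = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
cplx = floats.view(np.complex64)

assert cplx.shape == (2,)
assert cplx[0] == 1.0 + 2.0j
assert cplx[1] == 3.0 + 4.0j
```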
[9]:
data
[9]:
array([[[0.01555696+0.23687598j, 0.63101757+0.8273398j ],
        [0.1019997 +0.20314546j, 0.28115603+0.05395474j],
        [0.08066873+0.9387407j , 0.8852368 +0.8070221j ],
        ...,
        [0.8551784 +0.59639555j, 0.53377   +0.00310419j],
        [0.90561527+0.00324198j, 0.8135921 +0.7804704j ],
        [0.24206983+0.937976j  , 0.1009925 +0.15797824j]],

       [[0.6727332 +0.2471373j , 0.04252952+0.57505304j],
        [0.03055412+0.9722589j , 0.40800086+0.72077453j],
        [0.51640964+0.09622601j, 0.03890126+0.7105346j ],
        ...,
        [0.4281289 +0.09519663j, 0.5180816 +0.28376263j],
        [0.5364446 +0.4581596j , 0.50834155+0.20690413j],
        [0.7950088 +0.6539102j , 0.9142269 +0.32436267j]],

       [[0.23043332+0.67540294j, 0.39632943+0.9976033j ],
        [0.936405  +0.17276858j, 0.25369713+0.6106597j ],
        [0.5589724 +0.27544147j, 0.80924934+0.3692149j ],
        ...,
        [0.4675335 +0.93484634j, 0.42725256+0.17921717j],
        [0.5491084 +0.9029626j , 0.37587547+0.15243983j],
        [0.78585273+0.5865378j , 0.7718368 +0.3480341j ]],

       ...,

       [[0.24105   +0.92472416j, 0.7758283 +0.29101568j],
        [0.39365798+0.7450142j , 0.51648504+0.92651063j],
        [0.51292735+0.4928586j , 0.19067194+0.2987279j ],
        ...,
        [0.2932781 +0.42749766j, 0.77678394+0.14476013j],
        [0.8524298 +0.71833193j, 0.3436375 +0.13819219j],
        [0.8148442 +0.9699545j , 0.08060528+0.86016834j]],

       [[0.23590396+0.69152284j, 0.6515486 +0.5737039j ],
        [0.26653144+0.36185992j, 0.21904516+0.9724443j ],
        [0.10512716+0.7612819j , 0.9996518 +0.43186972j],
        ...,
        [0.8888344 +0.67638975j, 0.8803412 +0.809626j  ],
        [0.0978892 +0.01596169j, 0.17546682+0.11761183j],
        [0.96782213+0.38979536j, 0.27645203+0.73331845j]],

       [[0.13231432+0.01746596j, 0.69402665+0.75289327j],
        [0.21923342+0.20927563j, 0.7371151 +0.9990676j ],
        [0.79510385+0.21576184j, 0.22997381+0.21058026j],
        ...,
        [0.717941  +0.02027212j, 0.3551926 +0.17843196j],
        [0.14591125+0.7119099j , 0.95926803+0.23967971j],
        [0.7973055 +0.5685653j , 0.6806729 +0.99005854j]]],
      dtype=complex64)

An instance of a DadaFile can be created using the constructor, which takes an optional AsciiHeader and an optional byte array of data.

However, the following shows how to create a DadaFile and then set the data afterwards.

[10]:
dada_file = DadaFile(header=header)
dada_file.set_data(data)

Once an instance of a DadaFile has been created, it can be saved as a file. This can then be read back later.

[11]:
assert dada_file.data_size == len(data.tobytes())
dada_file.data_size
[11]:
5308416
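The 5308416 bytes reported above follow directly from the dimensions used in this notebook: each complex64 sample is 8 bytes (NBIT=32 bits per value, NDIM=2 values per sample). A quick sanity check:

```python
# Expected data size from the dimensions used in this notebook.
nsamples, nchan, npol = 768, 432, 2
nbit, ndim = 32, 2  # 32-bit floats, complex (real + imag)

bytes_per_value = nbit // 8              # 4 bytes per float32
bytes_per_sample = bytes_per_value * ndim  # 8 bytes per complex64
data_size = nsamples * nchan * npol * bytes_per_sample
assert data_size == 5308416
```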
[12]:
tmpdir = tempfile.gettempdir()
outfile = pathlib.Path(tmpdir) / "example_dada_file.dada"

dada_file.dump(outfile)
[13]:
%ls -lh $outfile
-rw-rw-r-- 1 wgauvin wgauvin 5.1M Mar 25 10:51 /tmp/example_dada_file.dada
[14]:
!head -c4096 $outfile
HDR_SIZE            16384
HDR_VERSION         1.0
NCHAN               432
NBIT                32
NDIM                2
NPOL                2
RESOLUTION          1327104
UTC_START           2017-08-01-15:53:29
SOURCE              J1644-4559_R
DESCRIPTION         Some fancy description

Loading and reading DADA files

While the above shows how to create DADA files, that is normally used for testing and is not the main focus of the PyDADA library or DadaFile itself. The power of DadaFile is that it can read any DADA file that conforms to the DADA spec: a header with an HDR_SIZE value of at least 4096 bytes, followed by the binary data. As DADA is a flexible file format the data may be packed in different ways, and this API provides general data access methods to get at the data.
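At a low level, reading a DADA file amounts to parsing the ASCII header to find HDR_SIZE and then treating everything after that offset as binary data. The stdlib-only sketch below illustrates that layout; it is not the PyDADA implementation, and it assumes the HDR_SIZE record appears within the first 4096 bytes and that the header is padded with NUL bytes.

```python
import io


def split_dada(stream):
    """Sketch: split a DADA byte stream into (header text, raw data)."""
    first = stream.read(4096)  # the header is at least 4096 bytes
    hdr_size = 4096
    # Find HDR_SIZE within the initial block to learn the real header size.
    for line in first.decode("ascii", errors="replace").splitlines():
        if line.startswith("HDR_SIZE"):
            hdr_size = int(line.split()[1])
            break
    stream.seek(0)
    header = stream.read(hdr_size).decode("ascii", errors="replace")
    data = stream.read()  # everything after the header is binary data
    return header, data


# Exercise the sketch on an in-memory file with an 8192-byte header.
header_txt = "HDR_SIZE            8192\nNCHAN               432\n"
payload = bytes(range(16))
blob = header_txt.encode("ascii").ljust(8192, b"\0") + payload

header, data = split_dada(io.BytesIO(blob))
assert "NCHAN" in header
assert data == payload
```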

The following cell will read the file generated above and print out the header.

[15]:
dada_file2 = DadaFile.load_from_file(outfile)
print(dada_file2.header)
HDR_SIZE            16384
HDR_VERSION         1.0
NCHAN               432
NBIT                32
NDIM                2
NPOL                2
RESOLUTION          1327104
UTC_START           2017-08-01-15:53:29
SOURCE              J1644-4559_R
DESCRIPTION         Some fancy description

To get the data in time, frequency and polarisation structure, one can use the as_time_freq_pol helper method. There is no need to know how the data is laid out, provided the header records are correct.

[16]:
tfp_data = dada_file2.as_time_freq_pol()
tfp_data
[16]:
array([[[0.01555696+0.23687598j, 0.63101757+0.8273398j ],
        [0.1019997 +0.20314546j, 0.28115603+0.05395474j],
        [0.08066873+0.9387407j , 0.8852368 +0.8070221j ],
        ...,
        [0.8551784 +0.59639555j, 0.53377   +0.00310419j],
        [0.90561527+0.00324198j, 0.8135921 +0.7804704j ],
        [0.24206983+0.937976j  , 0.1009925 +0.15797824j]],

       [[0.6727332 +0.2471373j , 0.04252952+0.57505304j],
        [0.03055412+0.9722589j , 0.40800086+0.72077453j],
        [0.51640964+0.09622601j, 0.03890126+0.7105346j ],
        ...,
        [0.4281289 +0.09519663j, 0.5180816 +0.28376263j],
        [0.5364446 +0.4581596j , 0.50834155+0.20690413j],
        [0.7950088 +0.6539102j , 0.9142269 +0.32436267j]],

       [[0.23043332+0.67540294j, 0.39632943+0.9976033j ],
        [0.936405  +0.17276858j, 0.25369713+0.6106597j ],
        [0.5589724 +0.27544147j, 0.80924934+0.3692149j ],
        ...,
        [0.4675335 +0.93484634j, 0.42725256+0.17921717j],
        [0.5491084 +0.9029626j , 0.37587547+0.15243983j],
        [0.78585273+0.5865378j , 0.7718368 +0.3480341j ]],

       ...,

       [[0.24105   +0.92472416j, 0.7758283 +0.29101568j],
        [0.39365798+0.7450142j , 0.51648504+0.92651063j],
        [0.51292735+0.4928586j , 0.19067194+0.2987279j ],
        ...,
        [0.2932781 +0.42749766j, 0.77678394+0.14476013j],
        [0.8524298 +0.71833193j, 0.3436375 +0.13819219j],
        [0.8148442 +0.9699545j , 0.08060528+0.86016834j]],

       [[0.23590396+0.69152284j, 0.6515486 +0.5737039j ],
        [0.26653144+0.36185992j, 0.21904516+0.9724443j ],
        [0.10512716+0.7612819j , 0.9996518 +0.43186972j],
        ...,
        [0.8888344 +0.67638975j, 0.8803412 +0.809626j  ],
        [0.0978892 +0.01596169j, 0.17546682+0.11761183j],
        [0.96782213+0.38979536j, 0.27645203+0.73331845j]],

       [[0.13231432+0.01746596j, 0.69402665+0.75289327j],
        [0.21923342+0.20927563j, 0.7371151 +0.9990676j ],
        [0.79510385+0.21576184j, 0.22997381+0.21058026j],
        ...,
        [0.717941  +0.02027212j, 0.3551926 +0.17843196j],
        [0.14591125+0.7119099j , 0.95926803+0.23967971j],
        [0.7973055 +0.5685653j , 0.6806729 +0.99005854j]]],
      dtype=complex64)
[17]:
np.testing.assert_allclose(tfp_data, data)
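The shape returned by as_time_freq_pol can be derived from the header records alone: the number of time samples is the data size divided by the bytes per time sample. A quick check with the values used here:

```python
# Derive the (time, freq, pol) shape from header records alone.
nchan, npol, ndim, nbit = 432, 2, 2, 32
data_size = 5308416  # from dada_file.data_size above

# Bytes occupied by one time sample across all channels and polarisations.
bytes_per_sample = nchan * npol * ndim * nbit // 8
nsamples = data_size // bytes_per_sample
assert (nsamples, nchan, npol) == (768, 432, 2)
```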

The TFP data can also be retrieved using the data_c64 method; in this case the shape should be provided.

[18]:
tfp_data2 = dada_file2.data_c64(shape=(-1, 432, 2))
tfp_data2.shape
[18]:
(768, 432, 2)
[19]:
np.testing.assert_allclose(tfp_data2, data)

The raw data can be retrieved from the file by using:

[20]:
raw_data = dada_file2.raw_data
len(raw_data), raw_data[:20]
[20]:
(5308416, b'\x9e\xe2~<\x9e\x8fr>^\x8a!?\x8b\xccS?8\xe5\xd0=')

Large files

This notebook uses small files, but data recorded during a scan can result in large files. Loading the whole file into memory is not efficient, so DadaFile defaults to loading only around 4 MB of data at a time; the amount of data loaded will be a multiple of the RESOLUTION value defined in the header (which defaults to 1 byte if not set).
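The chunk size observed later in this notebook (5308416 bytes for a RESOLUTION of 1327104) is consistent with rounding the roughly 4 MB target up to a whole multiple of RESOLUTION. The sketch below is an assumption about the exact rounding rule, not the PyDADA source:

```python
import math


def chunk_size(resolution: int, target: int = 4 * 1024 * 1024) -> int:
    """Smallest multiple of resolution at or above the ~4 MB target (assumed rule)."""
    return math.ceil(target / resolution) * resolution


assert chunk_size(1327104) == 4 * 1327104  # 5308416 bytes, just over 5 MB
assert chunk_size(1) == 4 * 1024 * 1024    # RESOLUTION defaults to 1 byte
```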

The cells below show how to use the load_next method to get the next chunk of data.

[21]:
data = np.random.rand(768 * 2, 432, 2 * 2).astype(np.float32).view(np.complex64)
data.shape

dada_file.set_data(data)
dada_file.dump(outfile)
/tmp/example_dada_file.dada already exists, overwriting it
[22]:
%ls -lh $outfile
-rw-rw-r-- 1 wgauvin wgauvin 11M Mar 25 10:51 /tmp/example_dada_file.dada
[23]:
dada_file3 = DadaFile.load_from_file(outfile)
len(dada_file3.raw_data)
[23]:
5308416
[24]:
raw_data1 = dada_file3.raw_data
bytes_read = dada_file3.load_next()
assert bytes_read == 5308416
raw_data2 = dada_file3.raw_data

np.testing.assert_raises(AssertionError, np.testing.assert_array_equal, raw_data1, raw_data2)

bytes_read = dada_file3.load_next()
assert bytes_read == 0
raw_data3 = dada_file3.raw_data
np.testing.assert_array_equal(raw_data2, raw_data3)
[25]:
raw_data1[:10], raw_data2[:10], raw_data3[:10]
[25]:
(b'.\x04\x1e?\xe7&C>\xb6\x1a',
 b'\xf7\xcc1?0\xd0\x98>\xab\x0e',
 b'\xf7\xcc1?0\xd0\x98>\xab\x0e')

From the above raw data we see that the first call to load_next returns new data but the second call doesn't.

Methods like as_time_freq_pol can still be used after a read, but they operate on the latest chunk of data.

[26]:
tfp = dada_file3.as_time_freq_pol()
tfp
[26]:
array([[[0.69453377+0.29846334j, 0.5861613 +0.50258344j],
        [0.37851405+0.3749288j , 0.28711602+0.8315913j ],
        [0.8064918 +0.85903126j, 0.20403107+0.37458396j],
        ...,
        [0.8427735 +0.35314775j, 0.13009308+0.45534855j],
        [0.8641009 +0.8563348j , 0.78916264+0.8750467j ],
        [0.7101122 +0.46026212j, 0.76706254+0.51261145j]],

       [[0.94733924+0.19782656j, 0.514437  +0.2172059j ],
        [0.21516843+0.4936212j , 0.61411285+0.95985067j],
        [0.9173029 +0.14126436j, 0.00274928+0.66256714j],
        ...,
        [0.805473  +0.72366935j, 0.8077107 +0.1880774j ],
        [0.3038191 +0.46847492j, 0.36554307+0.9439512j ],
        [0.2829097 +0.90526193j, 0.27263778+0.6592721j ]],

       [[0.1902145 +0.7245949j , 0.1978562 +0.325831j  ],
        [0.419614  +0.93810076j, 0.18919325+0.80643815j],
        [0.38993955+0.8166182j , 0.61703146+0.9275243j ],
        ...,
        [0.33782417+0.07703307j, 0.5458486 +0.13745067j],
        [0.8238145 +0.9862099j , 0.6129758 +0.77891475j],
        [0.61657095+0.29305455j, 0.08281787+0.35198057j]],

       ...,

       [[0.740553  +0.6900841j , 0.14596154+0.7162393j ],
        [0.07717139+0.73173785j, 0.4075596 +0.42359686j],
        [0.95811224+0.97947675j, 0.1542819 +0.7075732j ],
        ...,
        [0.12443139+0.10423309j, 0.846811  +0.03134033j],
        [0.06383804+0.01260035j, 0.10976174+0.04258004j],
        [0.65133905+0.5808699j , 0.6018511 +0.5623456j ]],

       [[0.10854331+0.14797916j, 0.06071953+0.91665494j],
        [0.34130985+0.4659123j , 0.7682713 +0.94368875j],
        [0.4617567 +0.83301294j, 0.8784665 +0.8725095j ],
        ...,
        [0.32816905+0.8514784j , 0.23627369+0.5015785j ],
        [0.26620618+0.5420366j , 0.61344236+0.7404025j ],
        [0.30549553+0.66280407j, 0.79712653+0.16623533j]],

       [[0.90144914+0.5880942j , 0.7333513 +0.97613573j],
        [0.36075968+0.94765747j, 0.83162993+0.5923031j ],
        [0.0225627 +0.18140844j, 0.131302  +0.5645181j ],
        ...,
        [0.13489321+0.5341274j , 0.89829594+0.01415773j],
        [0.6237211 +0.7600323j , 0.865388  +0.86514544j],
        [0.49946168+0.2911637j , 0.80225533+0.3173227j ]]],
      dtype=complex64)

Note that the length of the raw data is just over 5 MB (it is 4 * RESOLUTION). However, the file is around 11 MB in size. More data can be loaded and processed by using the load_next method.

Clean up

[27]:
if outfile.exists():
    outfile.unlink()