Example API Usage
[1]:
%load_ext autoreload
%autoreload 2
[2]:
import pathlib
import tempfile
import numpy as np
from ska_pydada import AsciiHeader, DadaFile
Create a DADA file
The cells below show how to create a DADA file with some random data.
The steps to create a DADA file are:
* create a header
* set the data
* serialise/dump to an output file
Create ASCII Header
A header for the DADA file format is a simple key-value structure that is at least 4096 bytes in size. The size of the ASCII header is defined by the HDR_SIZE
key, and in the PyDADA library one can override this value at construction time or by setting the property later.
header = AsciiHeader(header_size=16384)
or
header = AsciiHeader()
header.header_size = 16384
In either example above, the serialised header will have exactly 16384 bytes.
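As a rough illustration of what a fixed header size means on disk, the header text can be thought of as being padded out to HDR_SIZE bytes when serialised. This is a sketch only; the padding byte used here is an assumption for illustration and is not taken from the PyDADA source.

```python
# Sketch: serialise key/value records and pad to HDR_SIZE bytes.
# Assumption: null-byte padding is illustrative only.
records = {"HDR_SIZE": "16384", "NCHAN": "432"}
text = "".join(f"{key} {value}\n" for key, value in records.items())
serialised = text.encode("ascii").ljust(16384, b"\x00")
assert len(serialised) == 16384
```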
A header can be created from a string or a byte array (this is what is used when the DadaFile
loads a DADA file). The following cells use the header_txt
variable to load the header.
[3]:
header_txt = """HDR_SIZE 16384
HDR_VERSION 1.0
NCHAN 432
NBIT 32
NDIM 2
NPOL 2
RESOLUTION 1327104
UTC_START 2017-08-01-15:53:29
"""
[4]:
header = AsciiHeader.from_str(header_txt)
[5]:
assert header.header_size == 16384
assert int(header["NCHAN"]) == 432
assert int(header["NBIT"]) == 32
assert int(header["NDIM"]) == 2
assert int(header["NPOL"]) == 2
assert header.resolution == 1327104
assert header["UTC_START"] == "2017-08-01-15:53:29"
There are also utility methods on the AsciiHeader
to get a header record as an int or a float.
[6]:
nbit = header.get_int("NBIT")
assert nbit == 32
Values can be added to the header either by using a Python dict
item assignment or by using the set_value
method.
[7]:
header["SOURCE"] = "J1644-4559_R"
assert header.get_value("SOURCE") == "J1644-4559_R"
# or
header.set_value("DESCRIPTION", "Some fancy description")
assert header["DESCRIPTION"] == "Some fancy description"
Generate some data
For this notebook, the data will be random complex data with 768 time bins, 432 frequency channels, and 2 polarisations.
[8]:
data = np.random.rand(768, 432, 2 * 2).astype(np.float32).view(np.complex64)
data.shape
[8]:
(768, 432, 2)
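The .view(np.complex64) call above reinterprets pairs of interleaved float32 values as complex numbers without copying, which is why the last axis shrinks from 4 to 2. A minimal sketch of the same trick:

```python
import numpy as np

# Each consecutive (real, imag) float32 pair becomes one complex64
# value; the byte buffer is reinterpreted, not copied.
floats = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
complexes = floats.view(np.complex64)
assert complexes.shape == (1, 2)
assert complexes[0, 0] == 1.0 + 2.0j
```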
[9]:
data
[9]:
array([[[0.01555696+0.23687598j, 0.63101757+0.8273398j ],
[0.1019997 +0.20314546j, 0.28115603+0.05395474j],
[0.08066873+0.9387407j , 0.8852368 +0.8070221j ],
...,
[0.8551784 +0.59639555j, 0.53377 +0.00310419j],
[0.90561527+0.00324198j, 0.8135921 +0.7804704j ],
[0.24206983+0.937976j , 0.1009925 +0.15797824j]],
[[0.6727332 +0.2471373j , 0.04252952+0.57505304j],
[0.03055412+0.9722589j , 0.40800086+0.72077453j],
[0.51640964+0.09622601j, 0.03890126+0.7105346j ],
...,
[0.4281289 +0.09519663j, 0.5180816 +0.28376263j],
[0.5364446 +0.4581596j , 0.50834155+0.20690413j],
[0.7950088 +0.6539102j , 0.9142269 +0.32436267j]],
[[0.23043332+0.67540294j, 0.39632943+0.9976033j ],
[0.936405 +0.17276858j, 0.25369713+0.6106597j ],
[0.5589724 +0.27544147j, 0.80924934+0.3692149j ],
...,
[0.4675335 +0.93484634j, 0.42725256+0.17921717j],
[0.5491084 +0.9029626j , 0.37587547+0.15243983j],
[0.78585273+0.5865378j , 0.7718368 +0.3480341j ]],
...,
[[0.24105 +0.92472416j, 0.7758283 +0.29101568j],
[0.39365798+0.7450142j , 0.51648504+0.92651063j],
[0.51292735+0.4928586j , 0.19067194+0.2987279j ],
...,
[0.2932781 +0.42749766j, 0.77678394+0.14476013j],
[0.8524298 +0.71833193j, 0.3436375 +0.13819219j],
[0.8148442 +0.9699545j , 0.08060528+0.86016834j]],
[[0.23590396+0.69152284j, 0.6515486 +0.5737039j ],
[0.26653144+0.36185992j, 0.21904516+0.9724443j ],
[0.10512716+0.7612819j , 0.9996518 +0.43186972j],
...,
[0.8888344 +0.67638975j, 0.8803412 +0.809626j ],
[0.0978892 +0.01596169j, 0.17546682+0.11761183j],
[0.96782213+0.38979536j, 0.27645203+0.73331845j]],
[[0.13231432+0.01746596j, 0.69402665+0.75289327j],
[0.21923342+0.20927563j, 0.7371151 +0.9990676j ],
[0.79510385+0.21576184j, 0.22997381+0.21058026j],
...,
[0.717941 +0.02027212j, 0.3551926 +0.17843196j],
[0.14591125+0.7119099j , 0.95926803+0.23967971j],
[0.7973055 +0.5685653j , 0.6806729 +0.99005854j]]],
dtype=complex64)
An instance of a DadaFile
can be created using the constructor, which takes an optional AsciiHeader
and an optional byte array of data.
However, the following shows how to create a DadaFile
and then set the data afterwards.
[10]:
dada_file = DadaFile(header=header)
dada_file.set_data(data)
Once an instance of a DadaFile
has been created, it can be saved as a file. This can then be read back later.
[11]:
assert dada_file.data_size == len(data.tobytes())
dada_file.data_size
[11]:
5308416
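The reported data size follows directly from the header parameters: 768 time samples, 432 channels, and 2 polarisations, with each sample a 32-bit complex value (NBIT=32, NDIM=2):

```python
ntime, nchan, npol = 768, 432, 2
nbit, ndim = 32, 2  # 32 bits per component, 2 components (real, imag)
data_size = ntime * nchan * npol * ndim * (nbit // 8)
assert data_size == 5308416
```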
[12]:
tmpdir = tempfile.gettempdir()
outfile = pathlib.Path(tmpdir) / "example_dada_file.dada"
dada_file.dump(outfile)
[13]:
%ls -lh $outfile
-rw-rw-r-- 1 wgauvin wgauvin 5.1M Mar 25 10:51 /tmp/example_dada_file.dada
[14]:
!head -c4096 $outfile
HDR_SIZE 16384
HDR_VERSION 1.0
NCHAN 432
NBIT 32
NDIM 2
NPOL 2
RESOLUTION 1327104
UTC_START 2017-08-01-15:53:29
SOURCE J1644-4559_R
DESCRIPTION Some fancy description
Loading and reading DADA files
While the above shows how to create DADA files, that is normally only used for testing and is not the main focus of the PyDADA library or of DadaFile
itself. The power of the DadaFile
is that it can read DADA files that conform to the DADA spec: a header whose HDR_SIZE
value is at least 4096 bytes, followed by the binary data. As it is a flexible file format, the data may be packed in different ways, and this API provides general data access methods to retrieve it.
The following cell will read the file generated above and print out the header.
[15]:
dada_file2 = DadaFile.load_from_file(outfile)
print(dada_file2.header)
HDR_SIZE 16384
HDR_VERSION 1.0
NCHAN 432
NBIT 32
NDIM 2
NPOL 2
RESOLUTION 1327104
UTC_START 2017-08-01-15:53:29
SOURCE J1644-4559_R
DESCRIPTION Some fancy description
To get the data in time, frequency, and polarisation structure, one can use the as_time_freq_pol
helper method. There is no need to know how the data is laid out, provided the header records are correct.
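Under the hood this amounts to interpreting the payload bytes according to the header and reshaping. A simplified sketch, assuming complex64 samples stored in time-frequency-polarisation order as in this file (this illustrates the idea, not the library's actual implementation):

```python
import numpy as np

# Reinterpret a raw byte payload as complex64 and reshape into
# (time, freq, pol) using the header's NCHAN and NPOL values.
nchan, npol = 432, 2
payload = np.zeros(768 * nchan * npol, dtype=np.complex64).tobytes()
tfp = np.frombuffer(payload, dtype=np.complex64).reshape(-1, nchan, npol)
assert tfp.shape == (768, 432, 2)
```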
[16]:
tfp_data = dada_file2.as_time_freq_pol()
tfp_data
[16]:
array([[[0.01555696+0.23687598j, 0.63101757+0.8273398j ],
[0.1019997 +0.20314546j, 0.28115603+0.05395474j],
[0.08066873+0.9387407j , 0.8852368 +0.8070221j ],
...,
[0.8551784 +0.59639555j, 0.53377 +0.00310419j],
[0.90561527+0.00324198j, 0.8135921 +0.7804704j ],
[0.24206983+0.937976j , 0.1009925 +0.15797824j]],
[[0.6727332 +0.2471373j , 0.04252952+0.57505304j],
[0.03055412+0.9722589j , 0.40800086+0.72077453j],
[0.51640964+0.09622601j, 0.03890126+0.7105346j ],
...,
[0.4281289 +0.09519663j, 0.5180816 +0.28376263j],
[0.5364446 +0.4581596j , 0.50834155+0.20690413j],
[0.7950088 +0.6539102j , 0.9142269 +0.32436267j]],
[[0.23043332+0.67540294j, 0.39632943+0.9976033j ],
[0.936405 +0.17276858j, 0.25369713+0.6106597j ],
[0.5589724 +0.27544147j, 0.80924934+0.3692149j ],
...,
[0.4675335 +0.93484634j, 0.42725256+0.17921717j],
[0.5491084 +0.9029626j , 0.37587547+0.15243983j],
[0.78585273+0.5865378j , 0.7718368 +0.3480341j ]],
...,
[[0.24105 +0.92472416j, 0.7758283 +0.29101568j],
[0.39365798+0.7450142j , 0.51648504+0.92651063j],
[0.51292735+0.4928586j , 0.19067194+0.2987279j ],
...,
[0.2932781 +0.42749766j, 0.77678394+0.14476013j],
[0.8524298 +0.71833193j, 0.3436375 +0.13819219j],
[0.8148442 +0.9699545j , 0.08060528+0.86016834j]],
[[0.23590396+0.69152284j, 0.6515486 +0.5737039j ],
[0.26653144+0.36185992j, 0.21904516+0.9724443j ],
[0.10512716+0.7612819j , 0.9996518 +0.43186972j],
...,
[0.8888344 +0.67638975j, 0.8803412 +0.809626j ],
[0.0978892 +0.01596169j, 0.17546682+0.11761183j],
[0.96782213+0.38979536j, 0.27645203+0.73331845j]],
[[0.13231432+0.01746596j, 0.69402665+0.75289327j],
[0.21923342+0.20927563j, 0.7371151 +0.9990676j ],
[0.79510385+0.21576184j, 0.22997381+0.21058026j],
...,
[0.717941 +0.02027212j, 0.3551926 +0.17843196j],
[0.14591125+0.7119099j , 0.95926803+0.23967971j],
[0.7973055 +0.5685653j , 0.6806729 +0.99005854j]]],
dtype=complex64)
[17]:
np.testing.assert_allclose(tfp_data, data)
The TFP data can also be retrieved by using the data_c64
method; in that case the desired shape should be provided.
[18]:
tfp_data2 = dada_file2.data_c64(shape=(-1, 432, 2))
tfp_data2.shape
[18]:
(768, 432, 2)
[19]:
np.testing.assert_allclose(tfp_data2, data)
The raw data can be retrieved from the file by using:
[20]:
raw_data = dada_file2.raw_data
len(raw_data), raw_data[:20]
[20]:
(5308416, b'\x9e\xe2~<\x9e\x8fr>^\x8a!?\x8b\xccS?8\xe5\xd0=')
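The raw bytes line up with the first sample of the array above: decoding the first 8 bytes as a little-endian complex64 value (two float32 values, real then imaginary, which is the layout produced by data.tobytes()) recovers it:

```python
import numpy as np

# The first 8 raw bytes are one complex64 sample: two little-endian
# float32 values (real, then imaginary).
raw = b"\x9e\xe2~<\x9e\x8fr>^\x8a!?\x8b\xccS?8\xe5\xd0="
first = np.frombuffer(raw[:8], dtype=np.complex64)[0]
assert np.isclose(first.real, 0.01555696)
assert np.isclose(first.imag, 0.23687598)
```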
Large files
This notebook uses small files, but data recorded during a scan can result in large files. Loading the whole file into memory is not efficient, so the DadaFile
defaults to loading only around 4MB of data at a time; the amount of data loaded will be a multiple of the RESOLUTION
value defined in the header (which defaults to 1 byte if not set).
The following shows how to use the load_next
method to get the next chunk of data.
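The chunk size seen below (5308416 bytes) is consistent with rounding the ~4MB target up to a whole number of RESOLUTION units. A sketch of that arithmetic; note this rounding rule is inferred from the numbers in this notebook, not taken from the PyDADA source:

```python
import math

# Round a target read size up to a whole multiple of RESOLUTION.
def chunk_bytes(target: int, resolution: int) -> int:
    return max(1, math.ceil(target / resolution)) * resolution

# 4 MiB rounded up to a multiple of 1327104 gives 4 units.
assert chunk_bytes(4 * 1024 * 1024, 1327104) == 4 * 1327104
```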
[21]:
data = np.random.rand(768 * 2, 432, 2 * 2).astype(np.float32).view(np.complex64)
data.shape
dada_file.set_data(data)
dada_file.dump(outfile)
/tmp/example_dada_file.dada already exists, overwriting it
[22]:
%ls -lh $outfile
-rw-rw-r-- 1 wgauvin wgauvin 11M Mar 25 10:51 /tmp/example_dada_file.dada
[23]:
dada_file3 = DadaFile.load_from_file(outfile)
len(dada_file3.raw_data)
[23]:
5308416
[24]:
raw_data1 = dada_file3.raw_data
bytes_read = dada_file3.load_next()
assert bytes_read == 5308416
raw_data2 = dada_file3.raw_data
np.testing.assert_raises(AssertionError, np.testing.assert_array_equal, raw_data1, raw_data2)
bytes_read = dada_file3.load_next()
assert bytes_read == 0
raw_data3 = dada_file3.raw_data
np.testing.assert_array_equal(raw_data2, raw_data3)
[25]:
raw_data1[:10], raw_data2[:10], raw_data3[:10]
[25]:
(b'.\x04\x1e?\xe7&C>\xb6\x1a',
b'\xf7\xcc1?0\xd0\x98>\xab\x0e',
b'\xf7\xcc1?0\xd0\x98>\xab\x0e')
From the above raw data we see that the first call to load_next
returns new data but the next call does not.
Methods like as_time_freq_pol
can still be used after a read, but they operate on the latest chunk of data.
[26]:
tfp = dada_file3.as_time_freq_pol()
tfp
[26]:
array([[[0.69453377+0.29846334j, 0.5861613 +0.50258344j],
[0.37851405+0.3749288j , 0.28711602+0.8315913j ],
[0.8064918 +0.85903126j, 0.20403107+0.37458396j],
...,
[0.8427735 +0.35314775j, 0.13009308+0.45534855j],
[0.8641009 +0.8563348j , 0.78916264+0.8750467j ],
[0.7101122 +0.46026212j, 0.76706254+0.51261145j]],
[[0.94733924+0.19782656j, 0.514437 +0.2172059j ],
[0.21516843+0.4936212j , 0.61411285+0.95985067j],
[0.9173029 +0.14126436j, 0.00274928+0.66256714j],
...,
[0.805473 +0.72366935j, 0.8077107 +0.1880774j ],
[0.3038191 +0.46847492j, 0.36554307+0.9439512j ],
[0.2829097 +0.90526193j, 0.27263778+0.6592721j ]],
[[0.1902145 +0.7245949j , 0.1978562 +0.325831j ],
[0.419614 +0.93810076j, 0.18919325+0.80643815j],
[0.38993955+0.8166182j , 0.61703146+0.9275243j ],
...,
[0.33782417+0.07703307j, 0.5458486 +0.13745067j],
[0.8238145 +0.9862099j , 0.6129758 +0.77891475j],
[0.61657095+0.29305455j, 0.08281787+0.35198057j]],
...,
[[0.740553 +0.6900841j , 0.14596154+0.7162393j ],
[0.07717139+0.73173785j, 0.4075596 +0.42359686j],
[0.95811224+0.97947675j, 0.1542819 +0.7075732j ],
...,
[0.12443139+0.10423309j, 0.846811 +0.03134033j],
[0.06383804+0.01260035j, 0.10976174+0.04258004j],
[0.65133905+0.5808699j , 0.6018511 +0.5623456j ]],
[[0.10854331+0.14797916j, 0.06071953+0.91665494j],
[0.34130985+0.4659123j , 0.7682713 +0.94368875j],
[0.4617567 +0.83301294j, 0.8784665 +0.8725095j ],
...,
[0.32816905+0.8514784j , 0.23627369+0.5015785j ],
[0.26620618+0.5420366j , 0.61344236+0.7404025j ],
[0.30549553+0.66280407j, 0.79712653+0.16623533j]],
[[0.90144914+0.5880942j , 0.7333513 +0.97613573j],
[0.36075968+0.94765747j, 0.83162993+0.5923031j ],
[0.0225627 +0.18140844j, 0.131302 +0.5645181j ],
...,
[0.13489321+0.5341274j , 0.89829594+0.01415773j],
[0.6237211 +0.7600323j , 0.865388 +0.86514544j],
[0.49946168+0.2911637j , 0.80225533+0.3173227j ]]],
dtype=complex64)
Note that the length of the raw data is just over 5MB (it is 4 * RESOLUTION
). However, the file is around 11MB in size. More data can be loaded and processed by using the load_next
method.
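The read-until-exhausted pattern used above generalises to processing an arbitrarily large file chunk by chunk. An analogue with a plain byte stream, mirroring the load_next usage shown above rather than PyDADA's internals:

```python
import io

# Read fixed-size chunks until the stream is exhausted; a zero-length
# read signals the end, just as load_next() returning 0 does above.
stream = io.BytesIO(b"x" * 10)
chunk_sizes = []
while True:
    chunk = stream.read(4)
    if not chunk:
        break
    chunk_sizes.append(len(chunk))
assert chunk_sizes == [4, 4, 2]
```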
Clean up
[27]:
if outfile.exists():
outfile.unlink()