Configuration Guide
Note
This page covers standalone mode. For running the pipeline as part of the end-to-end SDP production system via a Science Data Model directory, see SDM Mode Guide.
Format
The batch pre-processing pipeline application translates the configuration file into a sequence of calls to DP3, one per frequency interval of each input MeasurementSet, and executes them as subprocesses. The configuration file schema reflects this: it provides the means to specify a list of DP3 steps and their parameters.
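The following minimal Python sketch shows how a list of step specifications could become one DP3 command line per channel chunk, with each command run as a subprocess. It is not the pipeline's code. The following are all assumptions made for illustration:

- the function names;
- the step naming scheme (step type plus index, so that two applycal steps do not clash);
- splitting by channel count rather than by Hz;
- passing parameters through verbatim.

The real pipeline renames parameters during translation; frequency_ranges_mhz, for instance, is not a DP3 key. Only msin, msout, msin.startchan, msin.nchan, steps and <name>.type are standard DP3 parset keys here.

```python
import shlex
import subprocess
from typing import Any


def build_dp3_argv(
    msin: str,
    msout: str,
    steps: list[dict[str, Any]],
    start_chan: int,
    nchan: int,
) -> list[str]:
    """Hypothetical sketch: build the DP3 argv for one channel chunk of one MeasurementSet."""
    argv = [
        "DP3",
        f"msin={msin}",
        f"msout={msout}",
        # Restrict this DP3 call to one frequency interval of the input.
        f"msin.startchan={start_chan}",
        f"msin.nchan={nchan}",
    ]
    names = []
    for index, spec in enumerate(steps):
        params = dict(spec)        # copy, so the caller's dictionary is left untouched
        kind = params.pop("step")  # step type, e.g. "averager"
        name = f"{kind}{index}"    # unique name, so repeated step types do not clash
        names.append(name)
        argv.append(f"{name}.type={kind}")
        for key, value in params.items():
            # Simplification: parameters are passed through verbatim.
            argv.append(f"{name}.{key}={value}")
    # An empty step list gives "steps=[]": DP3 then just copies the data.
    argv.append(f"steps=[{','.join(names)}]")
    return argv


def run_chunks(
    msin: str,
    steps: list[dict[str, Any]],
    total_chans: int,
    chunk_chans: int,
    dry_run: bool = True,
) -> list[list[str]]:
    """Run (or, with dry_run=True, only print) one DP3 subprocess per channel chunk."""
    commands = []
    for start in range(0, total_chans, chunk_chans):
        nchan = min(chunk_chans, total_chans - start)
        argv = build_dp3_argv(msin, f"chunk_{start}.ms", steps, start, nchan)
        commands.append(argv)
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)  # requires DP3 on PATH
    return commands


if __name__ == "__main__":
    run_chunks("obs.ms", [{"step": "averager", "timestep": 4, "freqstep": 4}], 192, 64)
```

With dry_run=True the commands are only printed. Setting it to False requires DP3 to be installed and the input MeasurementSet to exist.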
Example
#
# Example configuration file -- non-exhaustive.
#
# Optional section: configure reading of the input data.
input:
  data_column: DATA # default

# Mandatory section: processing steps
steps:
  - step: preflagger
    # Frequency range(s) to flag, each as a {start, stop} pair in MHz.
    # At least one range must be provided.
    frequency_ranges_mhz:
      - {start: 150.0, stop: 160.0}

  - step: aoflagger
    # Strategy source: specify exactly one of
    #   strategy:
    #     kind: preset
    #     name: ska_low_sharp  # use a bundled preset
    # or
    #   strategy:
    #     kind: file
    #     path: path/to/mystrategy.lua  # use a custom file (relative paths resolve against cwd)
    strategy:
      kind: preset
      name: ska_low_sharp
    memory_max_gb: 8.0 # maximum memory use in GB. No limit if omitted.
    time_window_samples: 64 # number of timeslots flagged jointly. Derived from the memory to use if omitted.

  - step: demixer
    sky_model:
      kind: sourcedb
      # Path to local sky model file. Relative paths resolve against the current working directory.
      path: bright_sources.txt
    sources_to_subtract: ["bright_a", "bright_b"]
    demix_timestep: 4 # internal averaging factor in time when fitting bright source gains
    demix_freqstep: 8 # same, in frequency

  - step: applycal
    table:
      kind: h5parm
      # Relative paths resolve against the current working directory.
      path: bandpass.h5

  - step: applycal
    table:
      kind: h5parm
      path: polarisation.h5

  - step: averager
    timestep: 4 # averaging factor in time
    freqstep: 4 # averaging factor in frequency

# Optional section: configure writing of the output data.
output:
  tile_nchan: 64
  storage_manager:
    kind: dysco
    data_bits_per_sample: 10
Schema
The config file layout rules are:
There must be a steps section, which must be a list of step specifications (see below) or an empty list. An empty list corresponds to a pipeline that just copies the input data.
Each step is specified as a dictionary with a step key indicating the step type, followed by the step parameters:

# Customise params
steps:
  - step: aoflagger
    strategy:
      kind: preset
      name: ska_low_sharp

Steps are executed in the order they are specified.
An optional input section controls parameters for reading the input data (e.g. the data column).
An optional output section controls parameters for writing the output data (e.g. Dysco compression settings).
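The layout rules above are simple enough to check mechanically. The Python sketch below is a hypothetical checker; check_layout is not part of the pipeline. It assumes the YAML file has already been parsed into plain dictionaries and lists by a YAML loader. It only verifies the rules listed above, not the parameters of individual steps.

```python
from typing import Any


def check_layout(config: dict[str, Any]) -> list[str]:
    """Hypothetical checker for the config file layout rules.

    `config` is the configuration file already parsed into Python objects.
    Returns a list of human-readable problems; an empty list means the
    layout is acceptable.
    """
    problems = []
    if "steps" not in config:
        problems.append("missing mandatory 'steps' section")
    elif not isinstance(config["steps"], list):
        problems.append("'steps' must be a list (an empty list is allowed)")
    else:
        for i, spec in enumerate(config["steps"]):
            if not isinstance(spec, dict) or "step" not in spec:
                problems.append(f"steps[{i}] must be a dictionary with a 'step' key")
    # 'input' and 'output' are optional, but must be dictionaries when present.
    for section in ("input", "output"):
        if section in config and not isinstance(config[section], dict):
            problems.append(f"'{section}' section must be a dictionary")
    return problems


if __name__ == "__main__":
    print(check_layout({"steps": []}))  # empty pipeline that just copies the data
    print(check_layout({"input": {"data_column": "DATA"}}))  # 'steps' is missing
```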
The available steps, along with the parameters each one accepts, are described in Pipeline Steps.
The input and output sections are documented at Input and Output respectively.
Notes on ApplyCal
DP3 can apply existing calibration solutions stored in so-called H5Parm files, which are HDF5 files following a certain schema. There are a few things to be aware of:
H5Parm files can store an arbitrary number of solution tables, and DP3 needs to be told which one(s) to apply.
The exact ApplyCal options that must be given to DP3 depend on the type of solution table to apply – there are at least 3 different cases to handle.
The caller of DP3 must therefore know precisely what is inside an H5Parm file
to properly configure ApplyCal step(s). The good news is that the batch
pre-processing pipeline takes care of this process; one only needs to provide
the H5Parm file path to apply when specifying an ApplyCal step, via the path
configuration parameter. Here are two valid examples:
steps:
  - step: applycal
    table:
      kind: h5parm
      path: /absolute/path/to/somefile.h5

steps:
  - step: applycal
    table:
      kind: h5parm
      # Relative paths are resolved against the current working directory
      path: somefile.h5
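Both forms follow the same rule, which also applies to sky model and custom strategy paths. An absolute path is used as-is. A relative path is taken relative to the directory the pipeline is launched from, not relative to the configuration file. The minimal pathlib sketch below illustrates that rule. The function and its cwd argument are hypothetical: cwd exists only to make the behaviour easy to test. The sketch does not check that the file exists.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(path: str, cwd: Optional[str] = None) -> Path:
    """Resolve a path given in the config file.

    Absolute paths are returned unchanged; relative paths are joined to the
    current working directory (or to the hypothetical `cwd` override).
    """
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    base = Path(cwd) if cwd is not None else Path.cwd()
    return base / candidate
```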
This ease of use, however, comes at the following price:
Warning
The batch pre-processing pipeline only accepts H5Parm files whose schema/layout allows exactly one way of applying them.
An error is raised if the ApplyCal configuration cannot be deduced from the contents of the H5Parm.
H5Parm restrictions
Some documentation about H5Parm and its schema can be found in the LOFAR Imaging Cookbook. The batch pre-processing pipeline enforces the following additional restrictions on the H5Parm files it accepts for its ApplyCal steps:
Only one solution set (solset)
Either 1 or 2 solution tables (soltab) in the solset.
Soltab types must be either “amplitude” or “phase”; the soltab type is stored in its TITLE attribute.
If there are 2 soltabs, they must represent amplitude and phase, and their number of polarisations must be identical.
If there is only 1 soltab, it can only represent the phase or amplitude part of a scalar or diagonal solution table.
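The Python sketch below ties these restrictions to the ApplyCal cases mentioned earlier. It takes a simplified description of an H5Parm that has already been read into plain dictionaries; reading an actual HDF5 file would need a library such as h5py. It then either raises an error or returns DP3 ApplyCal parset keys. The following are assumptions and may not match the pipeline's own logic:

- the polarisation mapping: 1 polarisation = scalar, 2 = diagonal, 4 = full Jones;
- the three cases distinguished;
- the exact keys emitted. These are based on DP3's ApplyCal options parmdb, correction, soltab and sub-steps.

```python
from typing import Any

# Hypothetical, simplified view of an H5Parm file's contents:
# {solset_name: {soltab_name: {"title": "amplitude" or "phase", "npol": int}}}
H5ParmLayout = dict[str, dict[str, dict[str, Any]]]


def deduce_applycal(h5parm_path: str, layout: H5ParmLayout, name: str = "applycal") -> dict[str, str]:
    """Enforce the H5Parm restrictions, then return DP3 ApplyCal parset keys.

    Raises ValueError when there is not exactly one way to apply the file.
    """
    if len(layout) != 1:
        raise ValueError(f"expected exactly 1 solset, found {len(layout)}")
    (soltabs,) = layout.values()
    if len(soltabs) not in (1, 2):
        raise ValueError(f"expected 1 or 2 soltabs, found {len(soltabs)}")

    # Index soltabs by their type, as read from the TITLE attribute.
    by_title: dict[str, tuple[str, int]] = {}
    for tab_name, meta in soltabs.items():
        if meta["title"] not in ("amplitude", "phase"):
            raise ValueError(f"soltab {tab_name!r} has unsupported type {meta['title']!r}")
        by_title[meta["title"]] = (tab_name, meta["npol"])

    keys = {f"{name}.type": "applycal", f"{name}.parmdb": h5parm_path}

    if len(soltabs) == 1:
        # Case 1: a single amplitude or phase soltab.
        ((tab_name, npol),) = by_title.values()
        # Assumption: 1 polarisation means scalar, 2 means diagonal.
        if npol not in (1, 2):
            raise ValueError("a single soltab must be scalar or diagonal")
        keys[f"{name}.correction"] = tab_name
        return keys

    if set(by_title) != {"amplitude", "phase"}:
        raise ValueError("2 soltabs must be one amplitude and one phase")
    amp_tab, amp_npol = by_title["amplitude"]
    phase_tab, phase_npol = by_title["phase"]
    if amp_npol != phase_npol:
        raise ValueError("amplitude and phase soltabs differ in number of polarisations")
    if amp_npol not in (1, 2, 4):
        raise ValueError(f"unsupported number of polarisations: {amp_npol}")

    if amp_npol == 4:
        # Case 2: full-Jones solutions, applied as a single correction.
        keys[f"{name}.correction"] = "fulljones"
        keys[f"{name}.soltab"] = f"[{amp_tab},{phase_tab}]"
    else:
        # Case 3: scalar or diagonal amplitude + phase, one sub-step each.
        keys[f"{name}.steps"] = "[amp,phase]"
        keys[f"{name}.amp.correction"] = amp_tab
        keys[f"{name}.phase.correction"] = phase_tab
    return keys
```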
Notes on Demixing
The Demixer step is one way of subtracting distant bright sources.
Demixing for large bandwidths
Internally, the Demixer step fits gains to the model visibilities of each
bright source, as an unconstrained Jones matrix that is allowed to vary as
a function of time (see the demix_timestep parameter), but not frequency,
despite the existence of a demix_freqstep parameter.
Demixer thus implicitly expects the input dataset to have a bandwidth small enough for gains to be considered uniform in frequency, including the primary beam response; for LOFAR or SKA Low, “small enough” means no more than a few MHz.
For datasets with a wider band, you will need to specify the --frequency-chunk-hz
argument so that processing is split along the frequency dimension in chunks
small enough for Demixing to work as advertised.
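As a worked example of the chunking arithmetic, the sketch below makes two assumptions. First, channels are uniformly spaced. Second, --frequency-chunk-hz is an upper bound on each chunk's total bandwidth; how the pipeline actually rounds chunk boundaries is not documented here. Under these assumptions each chunk holds floor(chunk_hz / channel_width_hz) channels, and each chunk becomes a separate DP3 call. The function name is hypothetical.

```python
import math


def plan_frequency_chunks(nchan: int, channel_width_hz: float, chunk_hz: float) -> list[tuple[int, int]]:
    """Hypothetical sketch: split nchan uniform channels into (start_channel, nchan) chunks.

    Each chunk's total bandwidth stays within chunk_hz, except that every
    chunk holds at least one channel.
    """
    if channel_width_hz <= 0 or chunk_hz <= 0:
        raise ValueError("channel width and chunk size must be positive")
    per_chunk = max(1, math.floor(chunk_hz / channel_width_hz))
    return [(start, min(per_chunk, nchan - start)) for start in range(0, nchan, per_chunk)]


if __name__ == "__main__":
    # 64 channels of 781.25 kHz (50 MHz in total), in chunks of at most 4 MHz.
    chunks = plan_frequency_chunks(64, 781_250.0, 4e6)
    print(len(chunks), chunks[0], chunks[-1])  # prints: 13 (0, 5) (60, 4)
```

Here each full chunk spans 5 × 781.25 kHz = 3.90625 MHz, which is within the "few MHz" regime described above.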
Practical details
Demixing requires a sky model in SourceDB format. SourceDB contains two types of entries:
Sky components, which are either points or Gaussians, with various parameters such as position, flux and spectral index, as well as the “patch” they belong to.
So-called “patches”, which are special entries that are effectively associated with one group of sky components and one calibration direction / gain table.
Below is a basic example of a SourceDB sky model to use for bright source subtraction:
FORMAT = Name, Type, Patch, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency
, , bright_a, 52.052625deg, -28.5875deg
, , bright_b, 53.052625deg, -27.5875deg
point_a, POINT, bright_a, 52.052625deg, -28.5875deg, 1.0, [0.0], true, 959969726.5625
point_b, POINT, bright_b, 53.052625deg, -27.5875deg, 1.0, [0.0], true, 959969726.5625
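Demixer relies on the patch structure of the sky model. The minimal Python reader below makes that structure explicit by grouping component names by patch. It only handles a FORMAT line followed by comma-separated rows, as in the example above. The full makesourcedb text format supports more, such as other header forms. Treat this as an illustration, not a validator. The function names are hypothetical.

```python
# The example sky model from above.
EXAMPLE_SKY_MODEL = """\
FORMAT = Name, Type, Patch, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency
, , bright_a, 52.052625deg, -28.5875deg
, , bright_b, 53.052625deg, -27.5875deg
point_a, POINT, bright_a, 52.052625deg, -28.5875deg, 1.0, [0.0], true, 959969726.5625
point_b, POINT, bright_b, 53.052625deg, -27.5875deg, 1.0, [0.0], true, 959969726.5625
"""


def split_fields(line: str) -> list[str]:
    """Split on commas, except commas inside [...] (e.g. multi-term spectral indices)."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields


def patches_from_sourcedb(text: str) -> dict[str, list[str]]:
    """Map each patch name to the names of the sky components that belong to it."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines:
        raise ValueError("empty sky model")
    keyword, _, header = lines[0].partition("=")
    if keyword.strip().upper() != "FORMAT":
        raise ValueError("the first line must be a FORMAT line")
    # Column names; drop any "=default" part a FORMAT column may carry.
    columns = [name.split("=")[0].strip() for name in split_fields(header)]
    patches: dict[str, list[str]] = {}
    for line in lines[1:]:
        row = dict(zip(columns, split_fields(line)))
        members = patches.setdefault(row.get("Patch", ""), [])
        if row.get("Name"):  # patch entries have an empty Name; components do not
            members.append(row["Name"])
    return patches


if __name__ == "__main__":
    print(patches_from_sourcedb(EXAMPLE_SKY_MODEL))
    # prints: {'bright_a': ['point_a'], 'bright_b': ['point_b']}
```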
Note
We may implement a more user-friendly data schema for bright source sky models in the future.
Here is what a Demixer step configuration may look like:
steps:
  - step: demixer
    sky_model:
      kind: sourcedb
      # A relative path is resolved against the current working directory
      path: bright_sources.txt
    # List of sources to subtract, must all refer to existing "patches" in the sky model file
    sources_to_subtract: ["bright_a", "bright_b"]
    # Internal averaging factors when fitting bright source gains
    demix_timestep: 4
    demix_freqstep: 8
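Every entry of sources_to_subtract must name an existing patch, as the comment in the configuration states. This can be checked before DP3 is launched, as in the hypothetical Python sketch below. The function name is an assumption. So are the default of 1 for a missing averaging factor and the requirement that averaging factors be positive integers; neither is documented pipeline behaviour. In practice, known_patches would come from reading the patch entries of the sky model file.

```python
from typing import Any


def check_demixer_step(step: dict[str, Any], known_patches: set[str]) -> list[str]:
    """Hypothetical pre-flight check of a demixer step specification."""
    problems = []
    missing = [s for s in step.get("sources_to_subtract", []) if s not in known_patches]
    if missing:
        problems.append(f"unknown patches in sources_to_subtract: {missing}")
    for key in ("demix_timestep", "demix_freqstep"):
        value = step.get(key, 1)  # assumption: a missing factor means no averaging
        # Assumption: averaging factors must be positive integers (bool is rejected explicitly).
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    return problems
```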
Please also refer to the Demixer step documentation for the full parameter reference.