Configuration Guide

Note

This page covers standalone mode. For running the pipeline as part of the end-to-end SDP production system via a Science Data Model directory, see SDM Mode Guide.

Format

The batch pre-processing pipeline application translates the configuration file into a sequence of calls to DP3, one per frequency interval of each input MeasurementSet, and executes them as subprocesses. The configuration file schema reflects this: it lets you specify a list of DP3 steps and their parameters.

Example

#
# Example configuration file -- non-exhaustive.
#

# Optional section: configure reading of the input data.
input:
  data_column: DATA  # default

# Mandatory section: processing steps
steps:
  - step: preflagger
    # Frequency range(s) to flag, each as a {start, stop} pair in MHz
    # At least one range must be provided.
    frequency_ranges_mhz:
      - {start: 150.0, stop: 160.0}
  - step: aoflagger
    # Strategy source — specify exactly one of:
    #   strategy:
    #     kind: preset
    #     name: ska_low_sharp    # use a bundled preset
    #   strategy:
    #     kind: file
    #     path: path/to/mystrategy.lua  # use a custom file (relative paths resolve against cwd)
    strategy:
      kind: preset
      name: ska_low_sharp
    memory_max_gb: 8.0 # maximum memory use in GB. No limit if omitted.
    time_window_samples: 64  # number of timeslots flagged jointly. Deduced from the memory budget if omitted.
  - step: demixer
    sky_model:
      kind: sourcedb
      # Path to local sky model file. Relative paths resolve against the current working directory.
      path: bright_sources.txt
      sources_to_subtract: ["bright_a", "bright_b"]
    demix_timestep: 4  # internal averaging factor in time when fitting bright source gains
    demix_freqstep: 8  # same, in frequency
  - step: applycal
    table:
      kind: h5parm
      # Relative paths resolve against the current working directory.
      path: bandpass.h5
  - step: applycal
    table:
      kind: h5parm
      path: polarisation.h5
  - step: averager
    timestep: 4  # averaging factor in time
    freqstep: 4  # averaging factor in frequency

# Optional section: configure writing of the output data.
output:
  tile_nchan: 64
  storage_manager:
    kind: dysco
    data_bits_per_sample: 10

Schema

The config file layout rules are:

  • There must be a steps section, which must be a list of step specifications (see below) or an empty list. An empty list corresponds to a pipeline that just copies the input data.

  • Each step is specified as a dictionary with a step key indicating the step type, followed by the step parameters:

    # Customise params
    steps:
      - step: aoflagger
        strategy:
          kind: preset
          name: ska_low_sharp
    
  • Steps are executed in the order they are specified.

  • An optional input section controls parameters for reading the input data (e.g. the data column).

  • An optional output section controls parameters for writing the output data (e.g. Dysco compression settings).

The available steps, along with the parameters that each accepts, are described in Pipeline Steps. The input and output sections are documented in Input and Output respectively.
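The layout rules above can be expressed as a small check. The following is a minimal sketch that works on a configuration already loaded into plain Python objects; it is not the pipeline's own validation code. The function name `check_layout` is hypothetical, and the rule that `input` and `output` must be mappings is an assumption based on the example configuration. Step-specific parameters are not checked.

```python
from __future__ import annotations

from typing import Any


def check_layout(config: Any) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    if not isinstance(config, dict):
        return ["top level must be a mapping"]

    problems: list[str] = []

    # The steps section is mandatory and must be a (possibly empty) list.
    if "steps" not in config:
        problems.append("missing mandatory 'steps' section")
    elif not isinstance(config["steps"], list):
        problems.append("'steps' must be a list")
    else:
        for index, step in enumerate(config["steps"]):
            # Each step is a mapping with a 'step' key naming the step type.
            if not isinstance(step, dict) or "step" not in step:
                problems.append(f"steps[{index}] must be a mapping with a 'step' key")

    # Assumption: the optional input and output sections are mappings.
    for section in ("input", "output"):
        if section in config and not isinstance(config[section], dict):
            problems.append(f"'{section}' must be a mapping")

    return problems


example = {
    "input": {"data_column": "DATA"},
    "steps": [
        {"step": "aoflagger", "strategy": {"kind": "preset", "name": "ska_low_sharp"}},
        {"step": "averager", "timestep": 4, "freqstep": 4},
    ],
}
print(check_layout(example))
print(check_layout({"input": {"data_column": "DATA"}}))
```

The first call reports no problems; the second reports the missing `steps` section. A configuration with `steps: []` also passes, matching the copy-only pipeline described above.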

Notes on ApplyCal

DP3 can apply existing calibration solutions stored in so-called H5Parm files, which are HDF5 files following a certain schema. There are a few things to be aware of:

  • H5Parm files can store an arbitrary number of solution tables, and DP3 needs to be told which one(s) to apply.

  • The exact ApplyCal options that must be given to DP3 depend on the type of solution table to apply – there are at least 3 different cases to handle.

The caller of DP3 must therefore know precisely what is inside an H5Parm file to configure the ApplyCal step(s) properly. The batch pre-processing pipeline takes care of this: when specifying an ApplyCal step, you only need to provide the path of the H5Parm file to apply, via the path configuration parameter. Here are two valid examples:

steps:
  - step: applycal
    table:
      kind: h5parm
      path: /absolute/path/to/somefile.h5

steps:
  - step: applycal
    table:
      kind: h5parm
      # Relative paths are resolved against the current working directory
      path: somefile.h5

This ease of use, however, comes at the following price:

Warning

The batch pre-processing pipeline will only accept H5Parm files with a schema/layout such that there is only one possible way of applying them.

An error message will be raised if the ApplyCal configuration cannot be deduced from the contents of the H5Parm.

H5Parm restrictions

Some documentation about H5Parm and its schema can be found in the LOFAR Imaging Cookbook. The batch pre-processing pipeline enforces the following additional restrictions on the H5Parm files it accepts for its ApplyCal steps:

  • Only one solution set (solset).

  • Either 1 or 2 solution tables (soltab) in the solset.

  • Soltab types must be either “amplitude” or “phase”; the soltab type is stored in its TITLE attribute.

  • If there are 2 soltabs, they must represent amplitude and phase, and their number of polarisations must be identical.

  • If there is only 1 soltab, it can only represent the phase or amplitude part of a scalar or diagonal solution table.
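To check a file against these restrictions before running the pipeline, the first four rules can be sketched as below. This is an illustration only, not the pipeline's implementation. It works on a hypothetical in-memory summary of the file (the `Soltab` class and `check_restrictions` function are invented for this example), so reading the actual HDF5 contents is left to the reader. The last rule (scalar or diagonal solutions only) is not covered, since it depends on how the polarisation axis is encoded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Soltab:
    """Hypothetical summary of one solution table in an H5Parm file."""

    name: str
    title: str  # soltab type, as stored in its TITLE attribute
    num_pols: int  # number of polarisations


def check_restrictions(solsets: dict[str, list[Soltab]]) -> None:
    """Raise ValueError if the summarised layout breaks one of the first four rules."""
    # Rule 1: only one solset.
    if len(solsets) != 1:
        raise ValueError(f"expected exactly 1 solset, found {len(solsets)}")
    (soltabs,) = solsets.values()

    # Rule 2: either 1 or 2 soltabs in the solset.
    if len(soltabs) not in (1, 2):
        raise ValueError(f"expected 1 or 2 soltabs, found {len(soltabs)}")

    # Rule 3: soltab types must be "amplitude" or "phase".
    for soltab in soltabs:
        if soltab.title not in ("amplitude", "phase"):
            raise ValueError(f"soltab {soltab.name!r} has unsupported type {soltab.title!r}")

    # Rule 4: two soltabs must be one amplitude and one phase, with equal
    # numbers of polarisations.
    if len(soltabs) == 2:
        if {soltab.title for soltab in soltabs} != {"amplitude", "phase"}:
            raise ValueError("two soltabs must be one amplitude and one phase")
        if soltabs[0].num_pols != soltabs[1].num_pols:
            raise ValueError("amplitude and phase soltabs differ in number of polarisations")


# Passes silently: one solset holding matching amplitude and phase tables.
check_restrictions(
    {"sol000": [Soltab("amplitude000", "amplitude", 2), Soltab("phase000", "phase", 2)]}
)
```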

Notes on Demixing

The Demixer step is one way of performing subtraction of distant bright sources.

Demixing for large bandwidths

Internally, the Demixer step fits gains to the model visibilities of each bright source, as an unconstrained Jones matrix that is allowed to vary as a function of time (see the demix_timestep parameter), but not of frequency, despite the existence of a demix_freqstep parameter.

Demixer thus implicitly expects the input dataset to have a bandwidth small enough for gains to be considered uniform in frequency, including the primary beam response; for LOFAR or SKA Low, “small enough” means no more than a few MHz.

For datasets with a wider band, you will need to specify the --frequency-chunk-hz argument so that processing is split along the frequency dimension in chunks small enough for Demixing to work as advertised.
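To get a feel for how a chunk size translates into frequency intervals, here is a rough sketch. It assumes that chunks are contiguous, contain a whole number of channels, and that the last chunk may be shorter; the pipeline's actual splitting scheme may differ. The function name and its arguments are hypothetical and exist only for this illustration.

```python
from __future__ import annotations


def split_into_chunks(
    num_channels: int, channel_width_hz: float, frequency_chunk_hz: float
) -> list[tuple[int, int]]:
    """Return (first_channel, channel_count) pairs that cover the whole band."""
    if num_channels <= 0 or channel_width_hz <= 0 or frequency_chunk_hz <= 0:
        raise ValueError("all arguments must be positive")

    # Whole channels that fit in one chunk; never fewer than one.
    channels_per_chunk = max(1, int(frequency_chunk_hz // channel_width_hz))

    return [
        (start, min(channels_per_chunk, num_channels - start))
        for start in range(0, num_channels, channels_per_chunk)
    ]


# A 30 MHz band made of 300 channels of 100 kHz, split into 2 MHz chunks.
chunks = split_into_chunks(300, 100e3, 2e6)
print(len(chunks), chunks[0], chunks[-1])
```

Under these assumptions the 30 MHz band is cut into 15 intervals of 20 channels each, so each interval stays within the "few MHz" limit discussed above.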

Practical details

Demixing requires a sky model in SourceDB format. SourceDB contains two types of entries:

  • Sky components, which are either points or Gaussians, each with parameters such as position, flux and spectral index, as well as the “patch” it belongs to.

  • So-called “patches”, which are special entries that are effectively associated with one group of sky components and one calibration direction / gain table.

Below is a basic example of a SourceDB sky model to use for bright source subtraction:

FORMAT = Name, Type, Patch, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency

, , bright_a, 52.052625deg, -28.5875deg
, , bright_b, 53.052625deg, -27.5875deg

point_a, POINT, bright_a, 52.052625deg, -28.5875deg, 1.0, [0.0], true, 959969726.5625
point_b, POINT, bright_b, 53.052625deg, -27.5875deg, 1.0, [0.0], true, 959969726.5625

Note

We may implement a more user-friendly data schema for bright source sky models in the future.

Here is what a Demixer step configuration may look like:

steps:
  - step: demixer
    sky_model:
      kind: sourcedb
      # A relative path is resolved against the current working directory
      path: bright_sources.txt
      # List of sources to subtract, must all refer to existing "patches" in the sky model file
      sources_to_subtract: ["bright_a", "bright_b"]
    # Internal averaging factors when fitting bright source gains
    demix_timestep: 4
    demix_freqstep: 8
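Since every entry of sources_to_subtract must name an existing patch, it can be useful to check this before launching a run. The sketch below is a simplified illustration, not the pipeline's own parser. It assumes a sky model laid out like the SourceDB example above, with Name, Type and Patch as the first three columns and patch definitions leaving Name and Type empty. Treating lines that start with `#` as comments is also an assumption. Both function names are hypothetical.

```python
from __future__ import annotations


def patch_names(sourcedb_text: str) -> set[str]:
    """Collect patch names from SourceDB text laid out like the example above."""
    names: set[str] = set()
    for line in sourcedb_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comment lines and the FORMAT header.
        if not stripped or stripped.startswith("#") or stripped.upper().startswith("FORMAT"):
            continue
        fields = [field.strip() for field in stripped.split(",")]
        # A patch definition leaves Name and Type empty; the third field is its name.
        if len(fields) >= 3 and fields[0] == "" and fields[1] == "" and fields[2]:
            names.add(fields[2])
    return names


def missing_patches(sourcedb_text: str, sources_to_subtract: list[str]) -> list[str]:
    """Return the requested sources that are not defined as patches."""
    known = patch_names(sourcedb_text)
    return [name for name in sources_to_subtract if name not in known]


sky_model = """\
FORMAT = Name, Type, Patch, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency

, , bright_a, 52.052625deg, -28.5875deg
, , bright_b, 53.052625deg, -27.5875deg

point_a, POINT, bright_a, 52.052625deg, -28.5875deg, 1.0, [0.0], true, 959969726.5625
point_b, POINT, bright_b, 53.052625deg, -27.5875deg, 1.0, [0.0], true, 959969726.5625
"""
print(missing_patches(sky_model, ["bright_a", "bright_c"]))
```

Here `bright_c` is reported as missing because the sky model defines only the `bright_a` and `bright_b` patches.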

Please also refer to the Demixer step documentation for the full parameter reference.