Additional Apps

Configuration validator

The mid-selfcal-validate-config CLI app can be used to check that a configuration file has been correctly written. The app is particularly useful to run before submitting a batch job on an HPC cluster, because otherwise the config cannot be validated until the job starts.

As an example, here is the output when the tesselation (facet creation) method for the first self-calibration cycle is misspelled as 'kmeanz' instead of 'kmeans':

$ mid-selfcal-validate-config config/meerkat.yml

'kmeanz' is not one of ['square_grid', 'voronoi_brightest', 'kmeans']

Failed validating 'enum' in schema['properties']['selfcal_cycles']['items']['properties']['tesselation']['properties']['method']:
    {'enum': ['square_grid', 'voronoi_brightest', 'kmeans'],
    'type': 'string'}

On instance['selfcal_cycles'][0]['tesselation']['method']:
    'kmeanz'

Keep fixing mistakes and re-running the app until it returns quietly, at which point the configuration is valid.
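When jobs are submitted from a script, the check can be automated. The sketch below wraps the validator in a small gate function. It assumes that a valid configuration results in exit status 0 with nothing printed; the exit status is not documented here, so any output or non-zero status is treated as a failure. The function name config_is_valid is hypothetical and not part of the pipeline.

```python
import subprocess
import sys


def config_is_valid(command):
    """Return True if `command` exits with status 0 and prints nothing.

    Assumption: the validator is silent and returns 0 on a valid config.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    quiet = not result.stdout.strip() and not result.stderr.strip()
    if result.returncode != 0 or not quiet:
        # Relay the validator's report so the config can be fixed
        sys.stderr.write(result.stdout + result.stderr)
        return False
    return True


# Example gate before submitting a batch job:
# if not config_is_valid(["mid-selfcal-validate-config", "config/meerkat.yml"]):
#     sys.exit(1)
```

Placing such a check ahead of the submission command catches configuration typos immediately rather than after the job has waited in the queue.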

WSClean FITS to PNG converter

As part of pipeline runs, WSClean produces a number of FITS images that are worth inspecting visually, for example to verify that the self-calibration process is converging. However, FITS files tend to be bulky (GBs, even tens of GBs) and take a long time to download from a remote computing facility for display with dedicated software such as DS9.

To circumvent that problem, an app that converts high-resolution FITS files to more manageable PNG files is included in the repository and automatically installed with the pipeline Python module:

usage: mid-selfcal-fits2png [-h] [-r {sum,max}] [--zmin ZMIN] [--zmax ZMAX] files [files ...]

Convert WSClean FITS files to PNG

positional arguments:
files                 WSClean FITS files.

optional arguments:
-h, --help            show this help message and exit
-r {sum,max}, --reduction {sum,max}
                        Reduction function to apply to shrink the image size on a NxN cell-by-cell basis. (default: sum)
--zmin ZMIN           Minimum colormap value in units of the estimated background noise standard deviation. (default: -4.0)
--zmax ZMAX           Maximum colormap value in units of the estimated background noise standard deviation. (default: 10.0)

It can be called on multiple files at once as follows:

$ cd <PIPELINE_OUTPUT_DIR>
$ mid-selfcal-fits2png *.fits

For each FITS file, it will create in the same directory an identically-named image with a .png extension, with a reduced resolution of 2000 x 2000 pixels. Original-resolution images are shrunk by the appropriate integer factor N, by applying a reduction function to NxN cells (i.e. taking their sum or their max value).
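The cell-by-cell reduction can be illustrated with a short NumPy sketch. This is not the app's actual code: the function names are hypothetical, and both the rule for choosing N (smallest integer that brings the longest side within 2000 pixels) and the trimming of incomplete edge cells are assumptions made for illustration.

```python
import numpy as np


def shrink_factor(shape, target=2000):
    """Smallest integer N bringing the longest side within `target` pixels."""
    return max(1, -(-max(shape) // target))  # ceiling division


def block_reduce(image, n, reduction="sum"):
    """Shrink a 2D array by an integer factor n, reducing each n x n cell."""
    height, width = image.shape
    # Assumption: trailing rows/columns that do not fill a whole cell are dropped
    trimmed = image[: height - height % n, : width - width % n]
    # After reshaping, axes 1 and 3 index the pixels inside each cell
    cells = trimmed.reshape(height // n, n, width // n, n)
    func = {"sum": np.sum, "max": np.max}[reduction]
    return func(cells, axis=(1, 3))
```

For instance, an 8000 x 8000 image would get N = 4 under this rule, and each output pixel would be the sum (or max) of a 4 x 4 cell of input pixels.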

The colour scale is dynamically adjusted based on a robust estimation of the background noise after shrinking. Typical run times are 10 to 60 seconds per FITS file, depending on the image dimensions in pixels.

Note

Max-pooling was found to excessively enhance otherwise invisible artifacts in some images, and can give a distorted overall result. We have left the option for future reference, but it is highly recommended to use the “sum” reduction function.

FITS image statistics

To perform some basic image quality checks after pipeline runs, we provide the following command line app. Currently it only evaluates the RMS and robust RMS of the Stokes I data, but more statistics may be added later.

The app should typically be run on the residual FITS images.

usage: mid-selfcal-image-stats [-h] files [files ...]

Compute basic FITS image statistics

positional arguments:
files       WSClean FITS files.

optional arguments:
-h, --help  show this help message and exit
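To make the two statistics concrete, here is a minimal NumPy sketch. The exact estimator behind "Robust RMS" is not specified here; the sketch assumes a common choice, the median absolute deviation (MAD) scaled by 1.4826 so that it matches the standard deviation for Gaussian noise. The function names are hypothetical, and non-finite pixels are ignored as a further assumption.

```python
import numpy as np


def rms(data):
    """Root mean square of the finite pixel values."""
    finite = data[np.isfinite(data)]
    return float(np.sqrt(np.mean(finite ** 2)))


def robust_rms(data):
    """Noise estimate from the scaled median absolute deviation (assumed estimator)."""
    finite = data[np.isfinite(data)]
    mad = np.median(np.abs(finite - np.median(finite)))
    # 1.4826 converts the MAD to a standard deviation for Gaussian noise
    return float(1.4826 * mad)
```

The plain RMS is inflated by any bright residual source, whereas a median-based estimate is barely affected by a few outlying pixels. A large gap between the two can therefore hint at poorly cleaned emission or artifacts in a residual image.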

Basic System Monitor

Due to the lack of standard solutions, we provide an app that periodically polls system load metrics and prints them to standard output as JSON dictionaries. Suggested usage is to launch it in the background just before a pipeline run, and to redirect its output to a file:

$ mid-selfcal-system-monitor --interval 1.0 > system_usage.jsonl &

Each line of output is a JSON-format dictionary containing a UTC timestamp, the average load for each CPU over the last interval (1 second by default), RAM usage, and traffic on disk and network interfaces.
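When the pipeline is launched from Python rather than from a shell, the same pattern can be sketched with the standard subprocess module. The helper name run_with_monitor is hypothetical, and the job command is a placeholder to be replaced by the actual pipeline invocation.

```python
import subprocess


def run_with_monitor(monitor_cmd, job_cmd, log_path):
    """Run job_cmd while monitor_cmd writes its standard output to log_path."""
    with open(log_path, "w") as log_file:
        monitor = subprocess.Popen(monitor_cmd, stdout=log_file)
        try:
            # Blocks until the job finishes, then hands back its exit status
            return subprocess.run(job_cmd).returncode
        finally:
            # Stop the monitor even if the job fails or is interrupted
            monitor.terminate()
            monitor.wait()


# Example (job_cmd stands for the pipeline command line):
# run_with_monitor(
#     ["mid-selfcal-system-monitor", "--interval", "1.0"],
#     job_cmd,
#     "system_usage.jsonl",
# )
```

Stopping the monitor in a finally clause avoids leaving an orphaned background process behind when the pipeline run aborts.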

The resulting file may be loaded into a pandas DataFrame for analysis like so:

import pandas
df = pandas.read_json("system_usage.jsonl", lines=True, convert_dates=["utc"])
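If the monitor is stopped while it is writing, the last line of the file may be incomplete, which can make strict JSON parsing fail. The following standard-library sketch reads the records while skipping blank or malformed lines. The function name read_jsonl is hypothetical, and silently dropping bad lines is a design choice made here for illustration.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSON lines, skipping blank or malformed ones."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Typically a truncated final line; ignore it
            continue
    return records


# Example:
# with open("system_usage.jsonl") as fobj:
#     records = read_jsonl(fobj)
```

The resulting list of dictionaries can be passed to pandas.DataFrame, after which the "utc" column can be converted with pandas.to_datetime.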