Changelog

Development

3.0.0 - 2026-06-09

Added

  • AOFlagger strategy presets. It is now possible to choose a bundled strategy file via kind: preset / name: <preset_name>. The available presets are described in the documentation pages. It is still possible to supply a custom strategy file via kind: file / path: <path>.
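
As a rough illustration of the two strategy kinds, the sketch below maps a strategy entry to a file path. Only the keys kind, name and path come from this changelog; the function name, the preset directory, the preset name and the .lua extension are assumptions made for the example, not the pipeline's actual implementation.

```python
from pathlib import Path


def resolve_strategy(spec: dict, preset_dir: str) -> Path:
    """Map a strategy entry to a file path (hypothetical helper)."""
    kind = spec.get("kind")
    if kind == "preset":
        # Assumption: presets are stored as <preset_dir>/<name>.lua
        return Path(preset_dir) / f"{spec['name']}.lua"
    if kind == "file":
        # Custom strategy file supplied by the user
        return Path(spec["path"])
    raise ValueError(f"unknown strategy kind: {kind!r}")


# "example" is a made-up preset name; see the documentation for real ones.
print(resolve_strategy({"kind": "preset", "name": "example"}, "/opt/presets"))
print(resolve_strategy({"kind": "file", "path": "custom/strategy.lua"}, "/opt/presets"))
```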

Changed

  • Multiple improvements and breaking changes to the configuration syntax. Refer to the documentation.

  • Pipeline is now launched via the run subcommand of the CLI (ska-sdp-batch-preprocess run ...).

Removed

  • --extra-inputs-dir / -e CLI argument. Relative paths in configuration are now resolved against the current working directory. This option is now effectively superseded by --sdm-path for procedurally fetching extra inputs.

2.10.1 - 2026-04-28

Added

  • Documentation for SDM mode.

  • Applycal step can now be run in SDM mode using the field_id and calibration_purpose fields instead of parmdb. This syntax is provisional and will change in the upcoming 3.0 release.

2.10.0 - 2026-04-24

Exclude autocorrelations from flagging reports, partial implementation of Science Data Model (SDM) mode, CLI apps built with typer.

Added

  • ska-sdp-flagging-report: New -a / --include-autocorrelations flag to opt in to including autocorrelations in the report statistics.

  • Added --sdm-path CLI argument to run the pipeline in SDM mode. When provided, logs, the input config file and QA products are written into the standard location in the SDM directory; the option is mutually exclusive with --extra-inputs-dir. NOTE: Applycal and Demixer cannot be run in SDM mode yet.

Changed

  • Autocorrelations are now excluded from flagging report statistics by default.

  • Missing baselines in the per-baseline flagging matrix now render as white instead of 0% flagged.

  • Both CLI apps (ska-sdp-batch-preprocess and ska-sdp-flagging-report) now use the typer library instead of argparse.

2.9.0 - 2026-04-10

Enhanced RFI flagging report with new plots and an improved layout. Standalone flagging report CLI app.

Added

  • ska-sdp-flagging-report: Standalone CLI app to generate a flagging report and associated plot.

  • Flagging report plots now include a time-frequency heatmap and a flagged fraction by baseline length plot.

  • Flagging report plots now use an improved, SKAO-themed layout.

Changed

  • The flagging report now stores flag sums and sample counts instead of flagged fractions.
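
One possible benefit of storing sums and counts, which this changelog does not state explicitly, is that partial results can be added together exactly, whereas fractions cannot be averaged without weights. The class below is a hypothetical sketch of that idea and not the report's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagCounts:
    """Hypothetical container for flag statistics of one data chunk."""

    flagged: int  # number of flagged samples
    total: int  # number of samples examined

    def __add__(self, other: "FlagCounts") -> "FlagCounts":
        # Counts from separate chunks combine by plain addition.
        return FlagCounts(self.flagged + other.flagged, self.total + other.total)

    @property
    def fraction(self) -> float:
        # The flagged fraction is derived only when it is needed.
        return self.flagged / self.total if self.total else float("nan")


small_chunk = FlagCounts(flagged=50, total=100)  # 50% flagged
large_chunk = FlagCounts(flagged=0, total=900)  # 0% flagged
combined = small_chunk + large_chunk
print(combined.fraction)  # 0.05, not the unweighted mean 0.25
```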

2.8.0 - 2026-03-26

Deleted XBPP, refactored configuration system in preparation for adding Science Data Model interface.

Changed

  • Restricted which DP3 parameters are available for the user to set in the configuration file.

  • More strict validation of step parameters via pydantic.

Removed

  • The msout.overwrite option has been disabled; it had been made irrelevant by requiring the output directory to contain no measurement sets.

  • Removed xradio batch pre-processing application and all dependent code.

2.7.3 - 2026-03-11

Changed

  • The output directory emptiness check performed on startup is now less stringent; the only requirement is that the directory should not contain subdirectories with a .ms extension.
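
The relaxed rule can be pictured with the sketch below. The function names, the exception type and the case-sensitive match on ".ms" are assumptions made for illustration; the pipeline's real check may differ in those details.

```python
from pathlib import Path


def find_ms_subdirs(output_dir: Path) -> list[Path]:
    """Return the immediate subdirectories whose name ends in .ms."""
    return sorted(
        p for p in output_dir.iterdir() if p.is_dir() and p.suffix == ".ms"
    )


def check_output_dir(output_dir: Path) -> None:
    """Raise if output_dir holds any subdirectory with a .ms extension."""
    offenders = find_ms_subdirs(output_dir)
    if offenders:
        names = ", ".join(p.name for p in offenders)
        raise FileExistsError(f"output directory contains: {names}")
```

Under this sketch, regular files and subdirectories without the .ms extension are tolerated.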

2.7.2 - 2026-02-05

Added

  • Flagging report calculation in MSv2 pipeline is now distributed over dask workers.

  • Documentation has been largely re-written and now contains quickstart and AWS tutorials.

Fixed

  • AWS “user” script now takes care of setting up MODULEPATH to include the directory containing spack metamodules.

Changed

  • Both pipeline applications now require the output directory to be empty.

  • Depend on xradio 1.1.0 and on the final xradio 1.0 data schema. Xradio processing sets created with xradio 0.x may not be compatible with the MSv4 pipeline anymore.

2.7.1 - 2026-01-22

Fixed

  • Added astropy to the base dependency group, which was missing in the previous release.

2.7.0 - 2026-01-21

Added initial version of the RFI flagging report and associated plot to the MSv2 pipeline.

Added

  • MSv2 pipeline now outputs an RFI flagging report and associated plot for each output MS, based on the contents of its FLAG column.

  • Pipeline logs are now written to <OUTPUT_DIR>/pipeline.log.

  • It is now possible to change the logging format via the environment variable SKA_BPP_LOGGING_FORMAT. If unset, the SKAO default format is used.

  • Additional SLURM scripts for AWS in scripts/
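
The SKA_BPP_LOGGING_FORMAT behaviour described above could be sketched as follows. The fallback string is a placeholder and not the actual SKAO default format, and the function name is hypothetical.

```python
import logging
import os

# Placeholder only: the real SKAO default format is defined elsewhere.
FALLBACK_FORMAT = "%(asctime)s|%(levelname)s|%(message)s"


def resolve_logging_format(environ=os.environ) -> str:
    """Return the format from SKA_BPP_LOGGING_FORMAT, or the fallback if unset."""
    return environ.get("SKA_BPP_LOGGING_FORMAT", FALLBACK_FORMAT)


logging.basicConfig(format=resolve_logging_format(), level=logging.INFO)
logging.getLogger(__name__).info("logging configured")
```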

Fixed

  • MSv2 pipeline now checks the output MS path does not exist. Before, there was an edge case where, if the desired output MS path already existed as a directory (e.g. because of a previous pipeline run), the output MS would be written as a sub-directory of it.

2.6.0 - 2025-10-04

Added frequency distribution, mainly to enable Demixing on large bandwidths. Pipeline configuration checks have been made stricter.

Added

  • MSv2 pipeline can now perform frequency distribution, by splitting the processing of every input MS in frequency in equally-sized chunks. Added --frequency-chunk-hz CLI app argument. This was added to make Demixing work on large bandwidths.

  • MSv4 pipeline now distributes the work along both time and frequency chunks. The frequency chunk sizes must respect some constraints that are checked on startup, i.e. be divisible by the frequency averaging factor of the pipeline and demixfreqstep.

  • MSv2 pipeline now also saves a dask HTML report.

  • Both MSv2 and MSv4 pipelines now log the configuration upon starting, and also save it to <OUTPUT_DIR>/config.yaml for future reference.

  • Additional parameter checks for Demixer steps
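
The divisibility constraint on frequency chunk sizes mentioned above can be illustrated with a minimal sketch. It assumes the chunk size is expressed in channels; the function names are hypothetical, and letting the last chunk be shorter is a choice made for this example only.

```python
def check_chunk_channels(chunk_channels: int, freqstep: int, demixfreqstep: int) -> None:
    """Raise if the chunk size is not a multiple of both step parameters."""
    for name, step in (("freqstep", freqstep), ("demixfreqstep", demixfreqstep)):
        if chunk_channels % step != 0:
            raise ValueError(
                f"chunk of {chunk_channels} channels is not divisible by {name}={step}"
            )


def chunk_bounds(num_channels: int, chunk_channels: int) -> list[tuple[int, int]]:
    """Split [0, num_channels) into consecutive (start, stop) channel ranges."""
    return [
        (start, min(start + chunk_channels, num_channels))
        for start in range(0, num_channels, chunk_channels)
    ]


check_chunk_channels(64, freqstep=4, demixfreqstep=16)  # passes silently
print(chunk_bounds(256, 64))
```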

Changed

  • Data selection parameters in the msin (aka. Input) step of DP3 are now forbidden, because they are not compatible with the distributed processing model of the MSv4 pipeline. Such data selection would be replaced by using the .sel or .isel methods on most xarray objects.

  • The Preflagger parameters chan and timeslot are now forbidden as well, because data ranges selected using indices would refer to different data ranges once the configuration is passed to a DP3 instance that only processes one chunk of the data. The abstime and freqrange parameters are safe to use instead.

  • The Demixer parameters demixtimeresolution and demixfreqresolution are now forbidden to facilitate the additional parameter validation required by frequency distribution. Please use demixtimestep and demixfreqstep only.

  • The Averager parameters timeresolution and freqresolution are now forbidden for the same reason. Please use timestep and freqstep only.

Fixed

  • Both MSv2 and MSv4 pipelines now check that the dask workers have the process resource available and raise an error immediately otherwise. Previously, the workers would simply hang indefinitely.

Removed

  • It is not necessary anymore to have the DP3 executable installed; instead, we now depend only on the dp3 Python package.

2.5.4 - 2025-07-23

Streamlined dependency specifications.

Changed

  • Allow numpy 2.x again, stop caring about the SKA SDP spack environment in pyproject.toml.

2.5.3 - 2025-07-01

Changed

  • Depend on numpy 1.26, as depending on numpy 2.x created too many problems in the SKA spack environment.

  • Revert previous change: xradio pipeline will always save the HTML report.

2.5.2 - 2025-07-01

Changed

  • In ska-sdp-batch-preprocess-xradio, only save the HTML dask report if the bokeh package is installed. While bokeh is a mandatory dependency in pyproject.toml, we will treat it as an optional dependency in the SKA spack environment to avoid conflicts in the required numpy version (the official bokeh spack recipe requires numpy 1.x).

Fixed

  • Fixed list of mandatory dependencies in pyproject.toml

2.5.1 - 2025-06-30

Added

  • DP3 step timings are now parsed and logged for each task. At the end of the run, the overall median step timings are also logged.
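
A hedged sketch of the final aggregation: given per-task timings that have already been parsed into dictionaries, the overall median per step can be computed as below. The input layout and step names are assumptions, and the parsing of DP3 output is not shown.

```python
from collections import defaultdict
from statistics import median


def median_step_timings(per_task_timings: list[dict[str, float]]) -> dict[str, float]:
    """Compute the median duration of each step across all tasks."""
    by_step = defaultdict(list)
    for timings in per_task_timings:
        for step, seconds in timings.items():
            by_step[step].append(seconds)
    return {step: median(values) for step, values in by_step.items()}


# Made-up timings in seconds, one dictionary per task.
tasks = [
    {"aoflagger": 2.0, "averager": 1.0},
    {"aoflagger": 4.0, "averager": 3.0},
    {"aoflagger": 9.0, "averager": 2.0},
]
print(median_step_timings(tasks))
```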

Fixed

  • Fixed an issue where the dask workers would run out of RAM because the dask scheduler attempts to load too much input data in advance. Tasks are now submitted progressively on the client side, such that the number of concurrently scheduled tasks remains below a reasonable maximum (1.5 x num_workers).
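
The progressive submission idea can be sketched with the standard library alone. Here concurrent.futures stands in for the dask client, the function name is hypothetical, and rounding 1.5 x num_workers up with ceil is an assumption.

```python
import math
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_progressively(func, items, num_workers: int) -> list:
    """Run func over items, keeping at most ceil(1.5 * num_workers) tasks in flight."""
    max_in_flight = math.ceil(1.5 * num_workers)
    results = {}
    pending = {}  # maps each unfinished future to the index of its input
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for index, item in enumerate(items):
            if len(pending) >= max_in_flight:
                # Block until at least one task finishes before submitting more.
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    results[pending.pop(future)] = future.result()
            pending[pool.submit(func, item)] = index
        # Collect whatever is still running once everything has been submitted.
        for future, index in pending.items():
            results[index] = future.result()
    return [results[i] for i in range(len(results))]


print(run_progressively(lambda x: x * x, range(10), num_workers=2))
```

Because inputs are only handed over when a slot frees up, the scheduler never holds more than a small, bounded number of tasks whose data it could try to preload.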

2.5.0 - 2025-06-20

Added new batch-preprocessing pipeline app that can process xradio datasets.

Added

  • Added CLI app ska-sdp-batch-preprocess-xradio, which ingests an xradio processing set and writes out multiple Measurement Sets, one per input time chunk.

Changed

  • Now using SKAO standard logging format

Fixed

  • The ska-sdp-batch-preprocess app will now raise an exception in distributed mode if any processing task failed.

2.4.0 - 2025-04-15

Added bright source subtraction. Demixer step is now available to run.

Added

  • Ability to run Demixer step

2.3.0 - 2025-03-19

CLI change in preparation for adding Demixing.

Changed

  • Renamed CLI argument --solutions-dir to --extra-inputs-dir

2.2.0 - 2025-02-04

Added option to use dask distribution.

Added

  • Ability to distribute over multiple dask workers, where one input MS corresponds to one task. CLI app now accepts an optional --dask-scheduler argument followed by the network address of the scheduler to use.

2.1.1 - 2025-01-31

Improved H5Parm validator.

Added

  • H5Parm validator is now much stricter and will raise a helpful error message even on unlikely corner cases.

2.1.0 - 2025-01-29

ApplyCal step is now available to run.

Added

  • Ability to run ApplyCal step

  • CLI arguments are now grouped into “required” and “optional” groups, which makes the batch pre-processing app help text clearer.

Fixed

  • The pipeline now checks that there are no duplicate input MeasurementSet names; if there are, it raises an error. This is necessary because two input paths dir1/data.ms and dir2/data.ms correspond to the same pre-processed output path. Previously, such a situation would have resulted in either a crash or some output measurement sets being overwritten.
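
A minimal sketch of such a duplicate-name check, assuming that the output path is derived from the final component of the input path. The function name is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(paths: list[str]) -> dict[str, list[str]]:
    """Group input paths by their final component and keep only the clashes."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[Path(path).name].append(str(path))
    return {name: group for name, group in by_name.items() if len(group) > 1}


inputs = ["dir1/data.ms", "dir2/data.ms", "dir3/other.ms"]
duplicates = find_duplicate_names(inputs)
if duplicates:
    print(f"duplicate input MeasurementSet names: {duplicates}")
```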

2.0.0 - 2025-01-20

Preliminary release following a complete rewrite. The batch pre-processing pipeline now wraps DP3.

Added

  • Command line app can now be called via the command ska-sdp-batch-preprocess

  • Ability to run PreFlagger step

  • Ability to run AOFlagger step

Changed

  • New command line interface

  • New configuration file format, where step names and options map (almost) directly to what DP3 expects.

Removed

  • Support for MSv4

  • Distribution with dask; will be added back in an upcoming release

1.0.1 - 2024-11-29

Dummy release that was required to comply with the SKAO release process.

1.0.0 - 2024-11-05

Major release with the following additions:

  • Bug fixed: MSv2 output did not capture all changes conducted by the requested chain of processing functions.

  • New classmethod introduced to MeasurementSet allowing instances to be called directly with list[Visibility] inputs.

  • Relevant improvements to documentation.

Progress:

  • Bug fixed, enabling the output of a given processing function in the chain to be correctly passed into the next function [MR20].

  • Automated release enabled including addition of CHANGELOG.md [MR19].

0.1.0 - 2024-09-30

Initial test release:

  • New pipeline to preprocess visibilities within MSv2 & MSv4.

  • Enables user-configurable processing functions chains.

  • Supports distributed data processing via dask.

  • Supports on-disk MSv2 --> MSv4 conversion via xradio.

  • Supports in-memory MSv2 <--> MSv4 convertibility.

Progress:

  • Pipeline release & further documentation [MR18].

  • Further improvement to pipeline documentation [MR17].

  • Improve pipeline documentation and improve code structure for .yml configurability [MR14].

  • Enable slurm support [MR13].

  • Improve code structure for Dask distribution functionality [MR12].

  • Enable auto-detection of MS version [MR11].

  • Enable loading/writing MS [MR10].

  • Processing functions introduced & harmonised with the configurability of the pipeline [MR9].

  • Pipeline logging structure & handling of exceptions introduced + onboarding changes in xradio [MR8].

  • First prototype for Dask distribution deployed [MR7].

  • Repository restructured & .yml configuration/functionality added [MR6].

  • Pipeline created with classes to handle MSv2 & MSv4 in-memory [MR5].

  • Distributed (Dask-based) machinery created. Minimal documentation added [MR2].