Changelog
Development
3.0.0 - 2026-06-09
Added
AOFlagger strategy presets. It is now possible to choose a bundled strategy file via kind: preset / name: <preset_name>. The available presets are described in the documentation pages. It is still possible to supply a custom strategy file via kind: file / path: <path>.
Changed
Multiple improvements and breaking changes to the configuration syntax. Refer to the documentation.
Pipeline is now launched via the run subcommand of the CLI (ska-sdp-batch-preprocess run ...).
Removed
--extra-inputs-dir/-e CLI argument. Relative paths in configuration are now resolved against the current working directory. This option is now effectively superseded by --sdm-path for procedurally fetching extra inputs.
2.10.1 - 2026-04-28
Added
Documentation for SDM mode.
Applycal step can now be run in SDM mode using the field_id and calibration_purpose fields instead of parmdb. This syntax is provisional and will change in the upcoming 3.0 release.
2.10.0 - 2026-04-24
Exclude autocorrelations from flagging reports, partial implementation of Science Data Model (SDM) mode, CLI apps built with typer.
Added
ska-sdp-flagging-report: New -a/--include-autocorrelations flag to opt in to including autocorrelations in the report statistics.
Added --sdm-path CLI argument to run the pipeline in SDM mode. When provided, logs, the input config file and QA products are written into the standard location in the SDM directory; the option is mutually exclusive with --extra-inputs-dir. NOTE: Applycal and Demixer cannot be run in SDM mode yet.
Changed
Autocorrelations are now excluded from flagging report statistics by default.
Missing baselines in the per-baseline flagging matrix now render as white instead of 0% flagged.
Both CLI apps (ska-sdp-batch-preprocess and ska-sdp-flagging-report) now use the typer library instead of argparse.
2.9.0 - 2026-04-10
Enhanced RFI flagging report with new plots and an improved layout. Standalone flagging report CLI app.
Added
ska-sdp-flagging-report: Standalone CLI app to generate flagging report and associated plot.
Flagging report plots now include a time-frequency heatmap and a flagged fraction by baseline length plot.
Flagging report plots now use an improved, SKAO-themed layout.
Changed
The flagging report now stores flag sums and sample counts instead of flagged fractions.
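One practical consequence of keeping sums and counts is that partial results can be combined exactly, whereas averaging fractions over unequal chunks gives a skewed number. The sketch below is illustrative only: the function name and the list-of-tuples layout are assumptions, not the report's actual schema.

```python
def combine_flag_stats(chunks):
    """Combine per-chunk (flag_sum, sample_count) pairs into one fraction.

    `chunks` is a hypothetical list of (flag_sum, sample_count) tuples,
    for example one per time chunk; it is not the report's real data model.
    """
    total_flagged = sum(flag_sum for flag_sum, _ in chunks)
    total_samples = sum(count for _, count in chunks)
    if total_samples == 0:
        return None  # no samples at all, so the fraction is undefined
    return total_flagged / total_samples


# Two chunks of very different size: 10/100 flagged and 900/1000 flagged.
chunks = [(10, 100), (900, 1000)]
overall = combine_flag_stats(chunks)  # 910 / 1100
naive = (10 / 100 + 900 / 1000) / 2   # unweighted mean of the two fractions
```

Here `overall` weights every sample equally, while `naive` treats the small chunk as if it were as large as the big one.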
2.8.0 - 2026-03-26
Deleted XBPP, refactored configuration system in preparation for adding Science Data Model interface.
Changed
Restricted which DP3 parameters are available for the user to set in the configuration file.
More strict validation of step parameters via pydantic.
Removed
The msout.overwrite option has been disabled; it had been made irrelevant by requiring the output directory to contain no measurement sets.
Removed xradio batch pre-processing application and all dependent code.
2.7.3 - 2026-03-11
Changed
Made the output directory emptiness check performed on startup less stringent; the only requirement is that it should not contain subdirectories with a .ms extension.
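A check of this kind could look roughly like the sketch below. The function names are hypothetical, and it assumes that only direct subdirectories are inspected and that the extension match is case-sensitive; the pipeline's real check may differ on both points.

```python
from pathlib import Path


def find_ms_subdirectories(output_dir):
    """Return direct subdirectories of `output_dir` named *.ms, sorted."""
    return sorted(
        p for p in Path(output_dir).iterdir()
        if p.is_dir() and p.suffix == ".ms"
    )


def check_output_dir(output_dir):
    """Raise if `output_dir` already holds measurement-set directories.

    Regular files and other subdirectories are tolerated.
    """
    offending = find_ms_subdirectories(output_dir)
    if offending:
        names = ", ".join(p.name for p in offending)
        raise FileExistsError(
            f"Output directory already contains measurement sets: {names}"
        )
```

With this logic, an output directory holding a previous pipeline.log or config.yaml passes, while one holding data.ms/ is rejected.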
2.7.2 - 2026-02-05
Added
Flagging report calculation in MSv2 pipeline is now distributed over dask workers.
Documentation has been largely re-written and now contains quickstart and AWS tutorials.
Fixed
AWS “user” script now takes care of setting up MODULEPATH to include the directory containing spack metamodules.
Changed
Both pipeline applications now require the output directory to be empty.
Depend on xradio 1.1.0 and on the final xradio 1.0 data schema. Xradio processing sets created with xradio 0.x may not be compatible with the MSv4 pipeline anymore.
2.7.1 - 2026-01-22
Fixed
Added astropy to the base dependency group, which was missing in the previous release.
2.7.0 - 2026-01-21
Added initial version of the RFI flagging report and associated plot to the MSv2 pipeline.
Added
MSv2 pipeline now outputs an RFI flagging report and associated plot for each output MS, based on the contents of its FLAG column.
Pipeline logs are now written to <OUTPUT_DIR>/pipeline.log.
It is now possible to change the logging format via the environment variable SKA_BPP_LOGGING_FORMAT. If unset, the SKAO default format is used.
Additional SLURM scripts for AWS in scripts/
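The environment-variable override for the logging format can be pictured with the minimal sketch below. Only the variable name SKA_BPP_LOGGING_FORMAT comes from this changelog; the function names are hypothetical, the fallback string is a placeholder rather than the real SKAO default, and treating an empty value as unset is an assumption.

```python
import logging
import os

# Placeholder only: the actual SKAO default format is not reproduced here.
FALLBACK_FORMAT = "%(asctime)s|%(levelname)s|%(name)s|%(message)s"


def resolve_logging_format(environ=None):
    """Return the format from SKA_BPP_LOGGING_FORMAT, else the fallback.

    An empty value is treated like an unset variable (an assumption).
    """
    environ = os.environ if environ is None else environ
    return environ.get("SKA_BPP_LOGGING_FORMAT") or FALLBACK_FORMAT


def make_handler(environ=None):
    """Build a stream handler that uses the resolved format."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(resolve_logging_format(environ)))
    return handler
```

The value is an ordinary Python logging format string, so something like `%(levelname)s:%(message)s` would be a valid setting under this sketch.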
Fixed
MSv2 pipeline now checks the output MS path does not exist. Before, there was an edge case where, if the desired output MS path already existed as a directory (e.g. because of a previous pipeline run), the output MS would be written as a sub-directory of it.
2.6.0 - 2025-10-04
Added frequency distribution, mainly to enable Demixing on large bandwidths. Pipeline configuration checks have been made stricter.
Added
MSv2 pipeline can now perform frequency distribution, by splitting the processing of every input MS in frequency into equally-sized chunks. Added --frequency-chunk-hz CLI app argument. This was added to make Demixing work on large bandwidths.
MSv4 pipeline now distributes the work along both time and frequency chunks. The frequency chunk sizes must respect some constraints that are checked on startup, i.e. be divisible by the frequency averaging factor of the pipeline and demixfreqstep.
MSv2 pipeline now also saves a dask HTML report.
Both MSv2 and MSv4 pipelines now log the configuration upon starting, and also save it to <OUTPUT_DIR>/config.yaml for future reference.
Additional parameter checks for Demixer steps
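The divisibility constraint on frequency chunk sizes mentioned above can be sketched as follows. This is a simplified illustration, not the pipeline's validation code: the function name is hypothetical, and all quantities are expressed as channel counts, leaving out the conversion from a chunk size in Hz.

```python
def validate_frequency_chunk(chunk_channels, freqstep=1, demixfreqstep=1):
    """Check that a chunk's channel count is divisible by both factors.

    `freqstep` stands for the frequency averaging factor and
    `demixfreqstep` for the Demixer's own frequency step; both are given
    in channels. Raises ValueError on the first violated constraint.
    """
    factors = (("freqstep", freqstep), ("demixfreqstep", demixfreqstep))
    for name, value in factors:
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        if chunk_channels % value != 0:
            raise ValueError(
                f"chunk of {chunk_channels} channels is not divisible by "
                f"{name}={value}"
            )


validate_frequency_chunk(64, freqstep=4, demixfreqstep=16)  # passes silently
```

A chunk of 60 channels with the same factors would be rejected, because 60 is not a multiple of 16.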
Changed
Data selection parameters in the msin (aka Input) step of DP3 are now forbidden, because they are not compatible with the distributed processing model of the MSv4 pipeline. Such data selection would be replaced by using the .sel or .isel methods on most xarray objects.
The Preflagger parameters chan and timeslot are now forbidden as well, because data ranges selected using indices would refer to different data ranges once the configuration is passed to a DP3 instance that only processes one chunk of the data. The abstime and freqrange parameters are safe to use instead.
The Demixer parameters demixtimeresolution and demixfreqresolution are now forbidden to facilitate the additional parameter validation required by frequency distribution. Please use demixtimestep and demixfreqstep only.
The Averager parameters timeresolution and freqresolution are now forbidden for the same reason. Please use timestep and freqstep only.
Fixed
Both MSv2 and MSv4 pipelines now check that the dask workers have the process resource available and raise an error immediately otherwise. Previously, the workers would simply hang indefinitely.
Removed
It is not necessary anymore to have the DP3 executable installed; instead, we now depend only on the dp3 Python package.
2.5.4 - 2025-07-23
Streamlined dependency specifications.
Changed
Allow numpy 2.x again, stop caring about the SKA SDP spack environment in pyproject.toml.
2.5.3 - 2025-07-01
Changed
Depend on numpy 1.26, as depending on numpy 2.x created too many problems in the SKA spack environment.
Revert previous change: xradio pipeline will always save the HTML report.
2.5.2 - 2025-07-01
Changed
In ska-sdp-batch-preprocess-xradio, only save the HTML dask report if the bokeh package is installed. While bokeh is a mandatory dependency in pyproject.toml, we will treat it as an optional dependency in the SKA spack environment to avoid conflicts in the required numpy version (the official bokeh spack recipe requires numpy 1.x).
Fixed
Fixed list of mandatory dependencies in pyproject.toml
2.5.1 - 2025-06-30
Added
DP3 step timings are now parsed and logged for each task. At the end of the run, the overall median step timings are also logged.
Fixed
Fixed an issue where the dask workers would run out of RAM because the dask scheduler attempts to load too much input data in advance. Tasks are now submitted progressively on the client side, such that the number of concurrently scheduled tasks remains below a reasonable maximum (1.5 x num_workers).
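The progressive submission pattern described above can be illustrated with the standard library. The sketch below swaps in concurrent.futures for the dask client so that it stays self-contained; the function name is hypothetical, and rounding 1.5 x num_workers up to an integer is an assumption.

```python
import math
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_throttled(func, items, num_workers):
    """Apply `func` to each item, limiting the number of tasks in flight.

    At most ceil(1.5 * num_workers) tasks are submitted but unfinished at
    any time. Results are returned in input order.
    """
    max_in_flight = max(1, math.ceil(1.5 * num_workers))
    results = {}
    pending = {}  # future -> index of the item it is processing
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for index, item in enumerate(items):
            # Wait for a free slot before handing over more work.
            while len(pending) >= max_in_flight:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    results[pending.pop(future)] = future.result()
            pending[pool.submit(func, item)] = index
        # Collect the tasks that are still outstanding.
        for future, index in pending.items():
            results[index] = future.result()
    return [results[i] for i in range(len(results))]
```

Because work is only submitted when a slot frees up, the scheduler never sees the full task list at once and so has no opportunity to prefetch inputs for all of it.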
2.5.0 - 2025-06-20
Added new batch-preprocessing pipeline app that can process xradio datasets.
Added
Added CLI app ska-sdp-batch-preprocess-xradio, which ingests an xradio processing set and writes out multiple Measurement Sets, one per input time chunk.
Changed
Now using SKAO standard logging format
Fixed
The ska-sdp-batch-preprocess app will now raise an exception in distributed mode if any processing task failed.
2.4.0 - 2025-04-15
Added bright source subtraction. Demixer step is now available to run.
Added
Ability to run Demixer step
2.3.0 - 2025-03-19
CLI interface change in preparation of adding Demixing.
Changed
Renamed CLI argument --solutions-dir to --extra-inputs-dir
2.2.0 - 2025-02-04
Added option to use dask distribution.
Added
Ability to distribute over multiple dask workers, where one input MS corresponds to one task. CLI app now accepts an optional --dask-scheduler argument followed by the network address of the scheduler to use.
2.1.1 - 2025-01-31
Improved H5Parm validator.
Added
H5Parm validator is now much stricter and will raise a helpful error message even on unlikely corner cases.
2.1.0 - 2025-01-29
ApplyCal step is now available to run.
Added
Ability to run ApplyCal step
CLI arguments are now grouped into “required” and “optional” groups, which makes the batch pre-processing app help text clearer.
Fixed
The pipeline now checks that there are no duplicate input MeasurementSet names; if there are, it raises an error. This is necessary because two input paths dir1/data.ms and dir2/data.ms correspond to the same pre-processed output path. Previously, such a situation would have resulted in either a crash or some output measurement sets being overwritten.
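The duplicate-name check can be sketched as below. The function name is hypothetical, and the sketch assumes that the output name is derived from the final path component only.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_ms_names(input_paths):
    """Map each repeated MS name to the input paths that share it."""
    by_name = defaultdict(list)
    for path in input_paths:
        # Path(...).name ignores a trailing slash, so "dir1/data.ms/"
        # and "dir1/data.ms" yield the same name.
        by_name[Path(path).name].append(str(path))
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


duplicates = find_duplicate_ms_names(
    ["dir1/data.ms", "dir2/data.ms", "dir3/other.ms"]
)
```

A caller could raise an error whenever the returned dictionary is non-empty, listing the clashing paths in the message.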
2.0.0 - 2025-01-20
Preliminary release following a complete rewrite. The batch pre-processing pipeline now wraps DP3.
Added
Command line app can now be called via the command ska-sdp-batch-preprocess
Ability to run PreFlagger step
Ability to run AOFlagger step
Changed
New command line interface
New configuration file format, where step names and options map (almost) directly to what DP3 expects.
Removed
Support for MSv4
Distribution with dask; will be added back in an upcoming release
1.0.1 - 2024-11-29
Dummy release that was required to comply with the SKAO release process.
1.0.0 - 2024-11-05
Major release with the following additions:
Bug fixed: MSv2 output did not capture all changes conducted by the requested chain of processing functions.
New classmethod introduced to MeasurementSet allowing instances to be called directly with list[Visibility] inputs.
Relevant improvements to documentation.
Progress:
Bug fixed, enabling the output of a given processing function in the chain to be correctly passed into the next function [MR20].
Automated release enabled including addition of CHANGELOG.md [MR19].
0.1.0 - 2024-09-30
Initial test release:
New pipeline to preprocess visibilities within MSv2 & MSv4.
Enables user-configurable processing functions chains.
Supports distributed data processing via dask.
Supports on-disk MSv2 –> MSv4 conversion via xradio.
Supports in-memory MSv2 <–> MSv4 convertibility.
Progress:
Pipeline release & further documentation [MR18].
Further improvement to pipeline documentation [MR17].
Improve pipeline documentation and improve code structure for .yml configurability [MR14].
Enable slurm support [MR13].
Improve code structure for Dask distribution functionality [MR12].
Enable auto-detection of MS version [MR11].
Enable loading/writing MS [MR10].
Processing functions introduced & harmonised with the configurability of the pipeline [MR9]
Pipeline logging structure & handling of exceptions introduced + onboarding changes in xradio [MR8].
First prototype for Dask distribution deployed [MR7].
Repository restructured & .yml configuration/functionality added [MR6].
Pipeline created with classes to handle MSv2 & MSv4 in-memory [MR5].
Distributed (Dask-based) machinery created. Minimal documentation added [MR2].