# Changelog

## Development

## 3.0.0 - 2026-06-09

### Added

- AOFlagger strategy presets. A bundled strategy file can now be selected via `kind: preset` / `name: <preset_name>`.
  The available presets are described in the documentation. A custom strategy file can still be supplied via `kind: file` / `path: <path>`.

### Changed

- Multiple improvements and breaking changes to the configuration syntax. Refer to the documentation.
- Pipeline is now launched via the `run` subcommand of the CLI (`ska-sdp-batch-preprocess run ...`).

### Removed

- `--extra-inputs-dir` / `-e` CLI argument. Relative paths in configuration are now resolved against the current working directory.
  This option is now effectively superseded by `--sdm-path` for procedurally fetching extra inputs.

## 2.10.1 - 2026-04-28

### Added

- Documentation for SDM mode.
- `Applycal` step can now be run in SDM mode using the `field_id` and
  `calibration_purpose` fields instead of `parmdb`. This syntax is provisional
  and will change in the upcoming 3.0 release.

## 2.10.0 - 2026-04-24

Exclude autocorrelations from flagging reports, partial implementation of Science Data Model (SDM) mode, CLI apps built with `typer`.

### Added

- `ska-sdp-flagging-report`: New `-a` / `--include-autocorrelations` flag to opt in to including autocorrelations in the report statistics.
- Added `--sdm-path` CLI argument to run the pipeline in SDM mode. When provided,
  logs, the input config file and QA products are written into the standard location in
  the SDM directory; the option is mutually exclusive with `--extra-inputs-dir`.
  **NOTE:** `Applycal` and `Demixer` cannot be run in SDM mode yet.

### Changed

- Autocorrelations are now excluded from flagging report statistics by default.
- Missing baselines in the per-baseline flagging matrix now render as white instead of 0% flagged.
- Both CLI apps (`ska-sdp-batch-preprocess` and `ska-sdp-flagging-report`) now use the `typer` library instead of `argparse`.

## 2.9.0 - 2026-04-10

Enhanced RFI flagging report with new plots and an improved layout. Standalone flagging report CLI app.

### Added

- `ska-sdp-flagging-report`: Standalone CLI app to generate a flagging report and associated plot.
- Flagging report plots now include a time-frequency heatmap and a flagged fraction by baseline length plot.
- Flagging report plots now use an improved, SKAO-themed layout.

### Changed

- The flagging report now stores flag sums and sample counts instead of flagged fractions.
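
The entry above does not state its motivation. One plausible benefit, offered here as an assumption, is that sums and counts from separate data chunks can be combined exactly, whereas fractions cannot simply be averaged. A minimal sketch with made-up numbers and hypothetical helper names:

```python
def combine(parts):
    """Add up (flagged, total) pairs coming from separate data chunks."""
    flagged = sum(f for f, _ in parts)
    total = sum(n for _, n in parts)
    return flagged, total


def flagged_fraction(flagged, total):
    """Fraction of flagged samples, or NaN when there are no samples."""
    return flagged / total if total else float("nan")


# Two hypothetical chunks: 50 of 100 samples flagged, then 0 of 900.
parts = [(50, 100), (0, 900)]

flagged, total = combine(parts)
overall = flagged_fraction(flagged, total)

# Averaging the per-chunk fractions ignores the chunk sizes.
naive = sum(flagged_fraction(f, n) for f, n in parts) / len(parts)

print(overall)  # 0.05
print(naive)    # 0.25
```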

## 2.8.0 - 2026-03-26

Deleted XBPP, refactored configuration system in preparation for adding Science Data Model interface.

### Changed

- Restricted which DP3 parameters are available for the user to set in the configuration file.
- Stricter validation of step parameters via pydantic.

### Removed

- The `msout.overwrite` option has been disabled; it had been made irrelevant by requiring the output directory to contain no measurement sets.
- Removed xradio batch pre-processing application and all dependent code.

## 2.7.3 - 2026-03-11

### Changed

- Made the output directory emptiness check performed on startup less stringent; the only requirement is that it should not contain subdirectories with a `.ms` extension.
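
The relaxed check could look roughly like the sketch below. The function names are hypothetical and this is not the pipeline's actual code. It assumes that only direct subdirectories are inspected and that the extension match is case-sensitive.

```python
from __future__ import annotations

from pathlib import Path


def find_measurement_sets(output_dir: Path) -> list[Path]:
    """Return direct subdirectories of output_dir whose name ends in .ms."""
    return sorted(
        entry
        for entry in output_dir.iterdir()
        if entry.is_dir() and entry.suffix == ".ms"
    )


def check_output_dir(output_dir: Path) -> None:
    """Raise FileExistsError if output_dir contains a .ms subdirectory."""
    offending = find_measurement_sets(output_dir)
    if offending:
        names = ", ".join(path.name for path in offending)
        raise FileExistsError(
            f"Output directory {output_dir} already contains: {names}"
        )
```

Under these assumptions, other files and directories (logs, plots, reports) do not cause the check to fail.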

## 2.7.2 - 2026-02-05

### Added

- Flagging report calculation in MSv2 pipeline is now distributed over dask workers.
- Documentation has been largely re-written and now contains quickstart and AWS tutorials.

### Fixed

- AWS "user" script now takes care of setting up MODULEPATH to include the directory containing spack metamodules.

### Changed

- Both pipeline applications now require the output directory to be empty.
- Depend on xradio 1.1.0 and on the final xradio 1.0 data schema. Xradio processing sets created with xradio 0.x may not be compatible with the MSv4 pipeline anymore.


## 2.7.1 - 2026-01-22

### Fixed

- Added `astropy` to the base dependency group, which was missing in the previous release.

## 2.7.0 - 2026-01-21

Added initial version of the RFI flagging report and associated plot to the MSv2 pipeline.

### Added

- MSv2 pipeline now outputs an RFI flagging report and associated plot for each output MS, based on the contents of its FLAG column.
- Pipeline logs are now written to `<OUTPUT_DIR>/pipeline.log`.
- It is now possible to change the logging format via the environment variable `SKA_BPP_LOGGING_FORMAT`. If unset, the SKAO default format is used.
- Additional SLURM scripts for AWS in `scripts/`

### Fixed

- MSv2 pipeline now checks the output MS path does not exist. Before, there was an edge case where, if the desired output MS path already existed as a directory (e.g. because of a previous pipeline run), the output MS would be written as a sub-directory of it.

## 2.6.0 - 2025-10-04

Added frequency distribution, mainly to enable Demixing on large bandwidths. Pipeline configuration checks have been made stricter.

### Added

- MSv2 pipeline can now perform frequency distribution, by splitting the processing of every input MS into equally sized frequency chunks. Added `--frequency-chunk-hz` CLI app argument. This was added to make Demixing work on large bandwidths.
- MSv4 pipeline now distributes the work along both time and frequency chunks. The frequency chunk sizes must respect some constraints that are checked on startup, i.e. be divisible by the frequency averaging factor of the pipeline and `demixfreqstep`.
- MSv2 pipeline now also saves a dask HTML report.
- Both MSv2 and MSv4 pipelines now log the configuration upon starting, and also save it to `<OUTPUT_DIR>/config.yaml` for future reference.
- Additional parameter checks for `Demixer` steps
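
The divisibility constraint on MSv4 frequency chunk sizes mentioned above can be illustrated as follows. This is a minimal sketch, not the pipeline's own validation code: the function name is hypothetical, and it assumes the chunk size is expressed as a number of channels.

```python
from __future__ import annotations


def check_frequency_chunk(
    chunk_channels: int,
    averaging_factor: int,
    demixfreqstep: int | None = None,
) -> None:
    """Raise ValueError if the chunk size breaks a divisibility constraint."""
    if chunk_channels <= 0:
        raise ValueError("Chunk size must be a positive number of channels")

    # Collect every factor the chunk size has to be a multiple of.
    divisors = {"frequency averaging factor": averaging_factor}
    if demixfreqstep is not None:
        divisors["demixfreqstep"] = demixfreqstep

    for name, value in divisors.items():
        if chunk_channels % value != 0:
            raise ValueError(
                f"Chunk size of {chunk_channels} channels is not divisible "
                f"by the {name} ({value})"
            )
```

For example, a chunk of 16 channels passes with an averaging factor of 4 and a `demixfreqstep` of 8, while a chunk of 12 channels is rejected because 12 is not a multiple of 8.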

### Changed

- Data selection parameters in the `msin` (a.k.a. `Input`) step of DP3 are now forbidden, because they are not compatible with the distributed processing model of the MSv4 pipeline. Such data selection would instead be done with the `.sel` or `.isel` methods available on most xarray objects.
- The `Preflagger` parameters `chan` and `timeslot` are now forbidden as well, because data ranges selected using indices would refer to different data ranges once the configuration is passed to a DP3 instance that only processes one chunk of the data. The `abstime` and `freqrange` parameters are safe to use instead.
- The `Demixer` parameters `demixtimeresolution` and `demixfreqresolution` are now forbidden to facilitate the additional parameter validation required by frequency distribution. Please use `demixtimestep` and `demixfreqstep` only.
- The `Averager` parameters `timeresolution` and `freqresolution` are now forbidden for the same reason. Please use `timestep` and `freqstep` only.
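
The reason index-based selections become ambiguous under chunked processing can be shown with plain Python lists. The sketch below does not use DP3 or reproduce its semantics; the frequencies and helper names are made up for illustration.

```python
# Hypothetical channel centre frequencies (MHz) of one input dataset.
frequencies = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0]


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of chunk_size elements."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def select_by_index(values, start, stop):
    """Index-based selection, relative to whatever list is passed in."""
    return values[start:stop]


def select_by_range(values, low, high):
    """Value-based selection, independent of how the data is chunked."""
    return [v for v in values if low <= v <= high]


chunks = split_into_chunks(frequencies, 4)

# "Channels 0 and 1" means different frequencies inside each chunk.
by_index = [select_by_index(chunk, 0, 2) for chunk in chunks]
print(by_index)  # [[100.0, 101.0], [104.0, 105.0]]

# A frequency range always refers to the same physical channels.
by_range = [select_by_range(chunk, 100.0, 101.0) for chunk in chunks]
print(by_range)  # [[100.0, 101.0], []]
```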

### Fixed

- Both MSv2 and MSv4 pipelines now check that the dask workers have the `process` resource available and raise an error immediately otherwise. Previously, the workers would simply hang indefinitely.

### Removed

- It is not necessary anymore to have the DP3 executable installed; instead, we now depend only on the `dp3` Python package.

## 2.5.4 - 2025-07-23

Streamlined dependency specifications.

### Changed

- Allow numpy 2.x again, stop caring about the SKA SDP spack environment in
  `pyproject.toml`.

## 2.5.3 - 2025-07-01

### Changed

- Depend on numpy 1.26, as depending on numpy 2.x created too many problems in
  the SKA spack environment.
- Revert previous change: xradio pipeline will always save the HTML report.

## 2.5.2 - 2025-07-01

### Changed

- In `ska-sdp-batch-preprocess-xradio`, only save the HTML dask report if
  the `bokeh` package is installed. While `bokeh` is a mandatory dependency
  in `pyproject.toml`, we will treat it as an optional dependency in the SKA
  spack environment to avoid conflicts in the required numpy version
  (the official `bokeh` spack recipe requires numpy 1.x).

### Fixed

- Fixed list of mandatory dependencies in `pyproject.toml`

## 2.5.1 - 2025-06-30

### Added

- DP3 step timings are now parsed and logged for each task. At the end of the run,
  the overall *median* step timings are also logged.

### Fixed

- Fixed an issue where the dask workers would run out of RAM because the dask
  scheduler attempts to load too much input data in advance. Tasks are now
  submitted progressively on the client side, such that the number of concurrently
  scheduled tasks remains below a reasonable maximum (1.5 x num_workers).
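
The progressive submission pattern described above can be sketched with the standard library's `concurrent.futures` standing in for dask. The helper name and its `factor` parameter are hypothetical, and the pipeline's real implementation may differ.

```python
import math
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_progressively(func, items, num_workers, factor=1.5):
    """Run func over items, keeping at most factor * num_workers tasks in flight.

    Results are returned in completion order, not input order.
    """
    max_in_flight = max(1, math.floor(factor * num_workers))
    results = []
    pending = set()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for item in items:
            pending.add(pool.submit(func, item))
            if len(pending) >= max_in_flight:
                # Block until at least one task finishes before submitting more.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(future.result() for future in done)
        # Drain whatever is still running once all items have been submitted.
        done, _ = wait(pending)
        results.extend(future.result() for future in done)
    return results
```

Because tasks are only submitted as earlier ones finish, the scheduler never holds more than a bounded number of tasks, which in turn bounds how much input data can be loaded ahead of time.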


## 2.5.0 - 2025-06-20

Added new batch-preprocessing pipeline app that can process xradio datasets.

### Added

- Added CLI app `ska-sdp-batch-preprocess-xradio`, which ingests an xradio processing set
  and writes out multiple Measurement Sets, one per input time chunk.

### Changed

- Now using SKAO standard logging format

### Fixed

- The `ska-sdp-batch-preprocess` app will now raise an exception in distributed mode if
  any processing task failed.

## 2.4.0 - 2025-04-15

Added bright source subtraction. Demixer step is now available to run.

### Added

- Ability to run Demixer step


## 2.3.0 - 2025-03-19

CLI change in preparation for adding Demixing.

### Changed

- Renamed CLI argument `--solutions-dir` to `--extra-inputs-dir`


## 2.2.0 - 2025-02-04

Added option to use dask distribution.

### Added

- Ability to distribute over multiple dask workers, where one input MS corresponds to one task.
  CLI app now accepts an optional `--dask-scheduler` argument followed by the network address of
  the scheduler to use.


## 2.1.1 - 2025-01-31

Improved H5Parm validator.

### Added

- H5Parm validator is now much stricter and will raise a helpful error message even on unlikely corner cases.


## 2.1.0 - 2025-01-29

ApplyCal step is now available to run.

### Added

- Ability to run ApplyCal step
- CLI arguments are now grouped into "required" and "optional" groups, which makes the
  batch pre-processing app help text clearer.

### Fixed

- The pipeline now checks that there are no duplicate input MeasurementSet names; if there are,
  it raises an error. This is necessary because two input paths `dir1/data.ms` and `dir2/data.ms`
  correspond to the same pre-processed output path. Previously, such a situation would have
  resulted in either a crash or some output measurement sets being overwritten.
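
A duplicate-name check of this kind could be sketched as follows. The function name is hypothetical and this is not the pipeline's actual code.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(paths):
    """Group input paths by final path component and keep names seen twice or more."""
    by_name = defaultdict(list)
    for path in map(Path, paths):
        by_name[path.name].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}


duplicates = find_duplicate_names(
    ["dir1/data.ms", "dir2/data.ms", "dir3/other.ms"]
)
if duplicates:
    # Both dir1/data.ms and dir2/data.ms would map to the same output name.
    print(f"Duplicate input MeasurementSet names: {sorted(duplicates)}")
```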


## 2.0.0 - 2025-01-20

Preliminary release following a complete rewrite. The batch pre-processing pipeline now wraps DP3.

### Added

- Command line app can now be called via the command `ska-sdp-batch-preprocess`
- Ability to run PreFlagger step
- Ability to run AOFlagger step

### Changed

- New command line interface
- New configuration file format, where step names and options map (almost) directly to what DP3 expects.

### Removed

- Support for MSv4
- Distribution with dask; will be added back in an upcoming release


## 1.0.1 - 2024-11-29

Dummy release that was required to comply with the SKAO release process.


## 1.0.0 - 2024-11-05

Major release with the following additions:

* Bug fixed: MSv2 output did not capture all changes conducted by the requested chain of processing functions.
* New `classmethod` introduced to `MeasurementSet` allowing instances to be called directly with `list[Visibility]` inputs.
* Relevant improvements to documentation.

Progress:

* Bug fixed, enabling the output of a given processing function in the chain to be correctly passed into the next function [MR20].
* Automated release enabled including addition of `CHANGELOG.md` [MR19].


## 0.1.0 - 2024-09-30

Initial test release:

* New pipeline to preprocess visibilities within MSv2 & MSv4.
* Enables user-configurable processing functions chains.
* Supports distributed data processing via `dask`.
* Supports on-disk MSv2 --> MSv4 conversion via `xradio`.
* Supports in-memory MSv2 <--> MSv4 conversion.


Progress:

* Pipeline release & further documentation [MR18].
* Further improvement to pipeline documentation [MR17].
* Improve pipeline documentation and improve code structure for `.yml` configurability [MR14].
* Enable `slurm` support [MR13].
* Improve code structure for Dask distribution functionality [MR12].
* Enable auto-detection of MS version [MR11].
* Enable loading/writing MS [MR10].
* Processing functions introduced & harmonised with the configurability of the pipeline [MR9].
* Pipeline logging structure & handling of exceptions introduced + onboarding changes in `xradio` [MR8].
* First prototype for Dask distribution deployed [MR7].
* Repository restructured & `.yml` configuration/functionality added [MR6].
* Pipeline created with classes to handle MSv2 & MSv4 in-memory [MR5].
* Distributed (Dask-based) machinery created. Minimal documentation added [MR2].