.. _calibration: Station Visibility Calibration ============================== The primary function to take observed visibilities between the log-periodic antennas of a station, create a sky model and return antenna-based calibration solutions is :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility`. This function, :ref:`described below `, is designed to communicate with a parent MCCS pipeline or script via SDP `Visibility`_ and `GainTable`_ datasets -- :class:`xarray.Dataset` objects comprising the primary data and standard metadata such as coordinates, weights, flags, etc. The initial use-case for this function is to provide a set of calibration solutions for each station beamformer under a range of observational parameters, such as the observing frequency and the ambient temperature of station components. This means that while the function must be efficient, to cover the full bandwidth in a reasonable time frame, it does not need to run in real-time. It also means that some pre-processing is possible; and some initial interference flagging is expected. One could also imagine choosing a favorable interference environment, ionosphere, solar and sidereal times, etc., to give the best solutions. While observing demands may not allow for this, the module needs to produce a range of heuristics and quality assessment metrics to help determine whether or not the new solutions should be added to the calibration database. These are discussed further on the :ref:`QA page `. Function :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility` produces standard antenna-based gain solutions and is expected to run for a single coarse frequency channel at a time. When solving for a physical effect requires averaging or interpolating in frequency, such as antenna delay calibration, this will need to run in post-processing. Setup ----- Setting up for calibration has two parts. For each observation, the relevant data and metadata need to be added to a `Visibility`_ dataset, and for all observations the Embedded Element Pattern (EEP) images for the log-periodic antennas need to be converted to NumPy ``.npy`` format for efficient use during calibration. EEP conversion is a one-off step that is discussed in the :ref:`EEP section `. Data and metadata for an observation are obtained from two files; an HDF5 file with specific observation data and a general YAML configuration file for the station. Observation data can be read using the `HDF5 for Python `_ module: .. code-block:: python import h5py import numpy as np date_time = "20240119_12601" channel_id = 140 # Get observation data datafile = h5py.File(f"/path/to/data/correlation_burst_{channel_id}_{date_time}_0.hdf5", 'r') correlation_data = np.squeeze(datafile['correlation_matrix']['data']) correlation_times = np.squeeze(datafile['sample_timestamps']['data']) # Get observation metadata correlation_metadata = dict(datafile['root'].attrs) ntimes = correlation_metadata['n_blocks'] int_time = correlation_metadata['tsamp'] Relevant station configuration information can be extracted using the :py:func:`~ska_low_mccs_calibration.utils.read_yaml_config` function: .. code-block:: python from ska_low_mccs_calibration.utils import read_yaml_config config = read_yaml_config("aavs3.yaml") This function returns a named-tuple containing the following station metadata: * :code:`config.location`: :class:`astropy.coordinates.EarthLocation` station location, * :code:`config.antenna_masks`: a 1D boolean :class:`ndarrays ` of bad-antenna flags, * :code:`config.baselines`: a tuple containing 1D :class:`ndarrays ` for the first and second antenna index of each baseline (zero-based indices relative to the antenna order in the YAML file), * :code:`config.enu`: a tuple containing 1D :class:`ndarrays ` for the local east, north and up offset of each antenna, * :code:`config.polid`: a 4-element list containing the position of XX, XY, YX and YY data products along the polarisation axis of a dataset, * :code:`config.rotation`: azimuthal rotation of station (bearing clockwise from north) in degrees, * :code:`config.station_name`: name of the station, * :code:`config.antenna_names`: antenna names. A final choice is which time samples to use. The calibration solvers can handle multiple time steps, but the sky model generation is currently only set up for a single time. So the choice is between using a single time or using the time average. In the example below the time average is used. These various pieces of information can now be packaged in a `Visibility`_ dataset. .. note:: It appears that visibilities in the HDF5 file are ordered as expected for the upper triangular part of the correlation matrix, but they come from the lower triangular part. Not sure why, but none the less :py:func:`~ska_low_mccs_calibration.utils.read_station_config` geneartes indices for the upper triangular part then swaps the order. This is equivalent to a conjugate transpose of each 2x2 coherency matrices. .. note:: The EEPs are also complex conjugated when forming the beam patterns. .. code-block:: python # First, expand antenna flags to baseline flags masked_antennas = np.where(config.antenna_masks==True)[0] baselines = config.baselines bl_flags = np.logical_or( np.isin(baselines[0], masked_antennas), np.isin(baselines[1], masked_antennas), ) # May as well flag autos as well bl_flags = np.logical_or(bl_flags, baselines[0] == baselines[1]) # Stack flags for the four polarisations in the dataset data_flags = np.tile(bl_flags, (4, 1)).T # At this point could add existing RFI flags and the like # Convert antenna positions to numpy array for each of use and transpose enu = np.array(config.enu, dtype=model_dtype(0).real.dtype).T channel_bw_MHz = 400.0 / 512.0 frequency_MHz = channel_id * channel_bw_MHz vis_ave = sum(correlation_data) / float(ntimes) int_time *= ntimes itime = ntimes // 2 time = Time(correlation_times[itime], format='unix', location=location) time.format = 'fits' from ska_low_mccs_calibration.utils import sdp_visibility_datamodel vis = sdp_visibility_datamodel( vis=vis_ave[:, config.polid], flags=data_flags[:, config.polid], uvw=enu[baselines[0]] - enu[baselines[1]], ant1=baselines[0], ant2=baselines[1], location=config.location, antpos_enu=enu, time=time, int_time=int_time, frequency_mhz=frequency_MHz, station_name=config.station_name, antenna_names=config.antenna_names, ) Note: :py:func:`~ska_low_mccs_calibration.utils.sdp_visibility_datamodel` has been updated to accept multi-time data. This can be called as follows: .. code-block:: python vis = sdp_visibility_datamodel( vis=np.array(correlation_data)[:, :, config.polid], flags=np.tile(data_flags, (ntimes, 1, 1))[:, :, config.polid], uvw=np.tile(enu[baselines[0]] - enu[baselines[1]], (ntimes, 1, 1)), ant1=baselines[0], ant2=baselines[1], location=config.location, antpos_enu=enu, time=Time_array, int_time=int_time, frequency_mhz=frequency_MHz, station_name=config.station_name, antenna_names=config.antenna_names, ) However :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility` (see below) expects a single time. .. _calibrate_mccs_visibility: Calling the Calibration Function -------------------------------- Everything up to this point, from the contents of MCCS observation files to specifics about RFI flagging, has the potential to evolve through the construction and commissioning phase, so has been kept separate from the main calibration function. From this point on, however, data products should be stable and it is expected that :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility` will handle everything required to generate gain solutions. .. note:: Need to add an optional input `GainTable`_ containing initial gain estaimtes. These could be extracted, for example, from the calibration database for the current frequency and conditions. Also, will look into incorporating the masked_antennas parameter into the `GainTable`_ data model. .. code-block:: python from ska_low_mccs_calibration.calibration import calibrate_mccs_visibility gain_table, model_vis, calibrated_vis, masked_antennas, info = calibrate_mccs_visibility( vis, masked_antennas=masked_antennas, eep_path="/path/to/eeps/npy", eep_filebase="FEKO_AAVS3_vogel_256_elem_50ohm_", eep_rotation_deg=config.rotation, ) This function outputs the final `GainTable`_, the model dataset used in calibration, the calibrated dataset and an updated list of flagged antennas. It also outputs a NameTuple with keys for various QA metrics. For example: .. code-block:: python print("correlation coeffs [XX,XY,YX,YY]:", [float(f"{x:.3f}") for x in info.corrcoeff]) print("residual vis std. dev [XX,XY,YX,YY]:", [float(f"{x:4.1f}") for x in info.residual_std], "K") print("residual vis maximum [XX,XY,YX,YY]:", [float(f"{x:4.1f}") for x in info.residual_max], "K") print(f"number of ant flags initial: {info.n_masked_initial}, final: {info.n_masked_final}") print(f"xy-phase update: {info.xy_phase:.2f} deg") print(f"solar adjustment factor: {info.sun_adjustment_factor:.3f}") print(f"solar elevation: {info.sun_elevation:.2f} deg") print(f"galactic centre elevation: {info.galactic_centre_elevation:.2f} deg") :: correlation coeffs [XX,XY,YX,YY]: [0.99, 0.901, 0.898, 0.989] residual vis std. dev [XX,XY,YX,YY]: [9.8, 7.0, 7.1, 10.4] K residual vis maximum [XX,XY,YX,YY]: [82.9, 60.5, 86.7, 79.5] K number of ant flags initial: 18, final: 22 xy-phase update: -176.62 deg solar adjustment factor: 1.622 solar elevation: 77.38 deg galactic centre elevation: 70.53 deg Additional parameters that are expected to be of most interest are: * :code:`back_rotation`: If this option is true, the polarisation frame of the final gain solutions will be rotated about zenith to local cardinal directions with X=East and Y=North. Default is false. * :code:`qa_plots`: An optional dictionary can be supplied to specify plots for automatic generation. Any plotting functions from :py:func:`~ska_low_mccs_calibration.qa` can be listed with a True or False value, and any set to true will be called with PNG output (named "function.png" without the `generate_` prefix). Use print(ska_low_mccs_calibration.qa.__all__) for a list. For example: .. code-block:: python qa_plots = { "generate_gaintable_plots": True, "generate_visibility_comparison_plots": False, "generate_closure_comparison_plots": True, } * :code:`nside`: The `HEALPix `_ *NSIDE* parameter used when building the `PyGDSM`_ galactic sky model. The default value is 32, which corresponds to approximately 110 arcminutes. This is adequate at 100 MHz but may need to be increased for higher frequencies in the SKA-Low band. * :code:`skymodel`: This parameter can be used either to specify which sky model to build, or to import a user-generated model `Visibility`_ dataset, bypassing internal sky model generation. Supported sky models are "GSM" -- the Global Sky Model -- and "Sun", a solar point-source model. When using the latter it is recommended that the user also specify a :code:`min_uv` cutoff (see figure in the :ref:`LSM section `). * :code:`min_uv`: This parameter sets a minimum baseline length to use in calibration (in metres). The default value is zero. * :code:`ignore_eeps`: If this option is true, beam models will be ignored when predicting model visibilities. Sky model components will be added to the visibilities with equal amplitude in XX and YY and zeros amplitude in XY and YX. This option is intended for fast phase fits on a narrow-field calibrator such as the sun. Default is false. * :code:`adjust_solar_model`: The "GSM" sky model is formed from the addition of the `PyGDSM`_ galactic sky model and the point-source solar model. The brightness of the sun can be variable, so after an initial round of calibration and flagging, the function can fit for the relative brightness of the galactic and solar components and adjust the solar model. This is the default course of action. * :code:`jones_solve`: After the initial round of calibration, flagging and sky model adjustment, a more rigorous polarisation calibration can be performed. When done from scratch this can more than double the runtime of :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility`, however when starting from existing polarised solutions convergence should be much faster. By default, polarisation calibration is not performed, and instead xy-phase is estimated by comparing the cross-polarised phase of the calibrated data with that of the model. * :code:`refant`: Antenna to use for phase referencing. Should be one of the zero-based indices used in the visibility dataset. Default is 1. Both polarisations are phase referenced against the X polarisation of refant, as a non-zero XY-phase is expected and solved for. Runtime for a single channel and default parameters is 20--30 seconds and is dominated by the generation of model visibilities. The steps involved in :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility` are as follows. Back Rotation ------------- The station being calibrated will in general have an azimuthal orientation that is different from other stations. As both the observed and model visibilities have polarisation axes in the rotated frame, the calibration solutions generated will be as well. The :code:`back_rotation` option enables a rotation of the gain solution matrices to a standard frame. When the solution matrices are inverted and applied to the observed data, the resulting calibrated visibilities will be back-rotated to the standard frame. In situations where user wants to generate solutions in the physical frame of the station and apply the back rotation later, for example after a delay fit, the :code:`back_rotation` option can be left in the false state and function :py:func:`~ska_low_mccs_calibration.calibration.rotate_gaintable` can be used to rotate the gaintables. Initial Calibration ------------------- On start up, :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility` builds a sky model, imports EEP images and generates model visibilities. This is discussed in more detail in the :ref:`sky model section `. Using these model visibilities, an initial round of calibration is performed. This can improve the convergence of subsequent calibration steps by removing large amplitude differences, and helps to identify erroneous antennas. In its current form the solver flags any antennas with a gain amplitude below 80% of the median value. This threshold can likely be dropped by an order of magnitude or more, and the outlier detection could be expanded to include interference spikes as well as low-gain antennas. This calibration step is independent for the X and Y polarisation channels and uses `SDP Func-Python`_ functions `solve_gaintable`_ and `apply_gaintable`_ (however a temporary local :py:func:`~ska_low_mccs_calibration.utils.apply_gaintable` function is used while a `SDP Func-Python`_ bug is being addressed). While these solutions do not include polarisation leakage terms, the phase difference between the X and Y channels is estimated by comparing the cross-polarised phase of the calibrated data with that of the model. This is calculated in function :py:func:`~ska_low_mccs_calibration.calibration.xy_phase_estimate`. Sky Model Updates ----------------- The initial round of calibration also allows for a quick check of the sky model. As mentioned above, the sky model is formed from the addition of the `PyGDSM`_ galactic sky model and a point-source model for the sun. The brightness of the sun can vary, so at this point it can help to compare the two sky model components against the calibrated data and adjust the solar brightness level. This adjustment is performed by function :py:func:`~ska_low_mccs_calibration.calibration.adjust_gsm` and is enabled by setting :py:func:`~ska_low_mccs_calibration.calibration.calibrate_mccs_visibility` option :code:`adjust_solar_model=True`. Note that solar variability can exhibit significant polarisation, and this is not captured in the simple rescaling done here. To limit the effect of this kind of variability on the gain solutions, the sun may need to be dealt with using a direction-dependent approach such as peeling. Separate solar calibration has not yet been added to this module, but some indication of the level of variability will be included in the :ref:`QA metrics `. Polarisation Calibration ------------------------ A final round of polarisation calibration can be enabled by setting :code:`jones_solve=True`. This runs `SDP Func-Python`_ function `solve_gaintable`_ with :code:`solver="normal_equations"`. While the sky model is unpolarised, it is spread over a wide field of view with variable instrument polarisation. And the EEPs are further polarised by mutual coupling. Both of these effects are part of the instrument model, so the model visibilities can have a significant level of linear polarisation to calibrate against. The `solve_gaintable`_ function has a few solver options that can use this information, with :code:`solver="normal_equations"` showing the best properties on simulated data. It can take an additional 30--60 seconds to run, however once existing solutions are available this convergence time should drop by an order of magnitude. The image below shows a histogram of phase residuals for each of the four polarisation products and each of the calibration steps. In all cases the calibration results in a single peek centred close to zero, as required, with the broader distribution for the cross-polarised terms coming for the most part from the lower signal-to-noise ratio. The polarised solver results in tighter cross-polarised residuals, as expected, however the level to which this improves the state of the final station beam remains to be seen. .. image:: img/histos.png .. note:: Runtime numbers presented are all for the initial single-threaded version of this code. Multi-threading is expected to result in a significant speed up.