How to run an end-to-end experiment
===================================

This tutorial takes you through the steps required to run an end-to-end experiment using the
``end_to_end_simulation.ipynb`` notebook, which is in the `notebooks/experiments directory
<https://gitlab.com/ska-telescope/sdp/ska-sdp-resource-model/-/tree/main/notebooks/experiments>`_
of the repository.

In addition to optimising the hardware configuration and running a Monte Carlo simulation (see
:doc:`Running an experiment <../tutorials/run_experiment>`), the notebook generates the input
configuration files from scientific parameters and benchmarking data. It uses the `SDP Parametric
Model <https://gitlab.com/ska-telescope/sdp/ska-sdp-par-model>`_ to estimate the node hours required
by each pipeline and the data storage required for each pipeline and scheduling block instance.

Ensure you have followed the :doc:`installation instructions <../installation>` and successfully run
the simulation with default settings before continuing.

Read the following sections to ensure you are familiar with the simulation's inputs:

- :doc:`Hardware configuration file <../usage/inputs/configuration/hardware_configuration>`
- :doc:`Scheduling block types configuration file
  <../usage/inputs/configuration/scheduling_block_types_configuration>`
- :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>`
- :doc:`Observing schedule file <../usage/inputs/configuration/observing_schedule>`

Also read the following tutorials to understand how to run the simulation and how to optimise the
hardware configuration:

- :doc:`Running the simulation <../usage/run_simulation>`
- :doc:`Running a hardware optimisation <../tutorials/optimise_hardware_configuration>`
- :doc:`Running a Monte Carlo simulation <../tutorials/run_monte_carlo_simulation>`

Requirements for running an end-to-end experiment
-------------------------------------------------

To run an experiment you will need the following inputs:

1. **The scientific parameters of the observations you want to simulate (.xls).**

   See `SDP Pipeline resource requirements estimation
   <https://confluence.skatelescope.org/display/SE/SDP+Pipeline+resource+requirements+estimation>`_
   (link to Confluence) for further details.

   You will need the path to this file to run the experiment.

2. **Your hardware budget (euros).**

   The optimisation process will optimise the split of your budget between compute nodes, capacity
   storage and performance storage.

   .. note::

       The costs of a compute node, capacity storage and performance storage are currently
       hardcoded.

3. **The parallelism percentage you want to use for each pipeline.**

   These values can be changed in the notebook.

4. **The target total integration time for the observing schedule.**

   This can be changed in the notebook and will be used to generate an observing schedule by cycling
   through scheduling block types.

5. **The number of optimisation trials you want to run.**

   This is a key parameter in the optimisation process that determines how many times parameters are
   sampled and a simulation is run.

6. **The number of Monte Carlo iterations you want to run.**

   Set this to 0 to skip the Monte Carlo runs.

7. **Whether or not to shuffle the order of observations in the observing schedule.**

   This is an optional part of the Monte Carlo simulation. If set, the order of observations is
   shuffled at the start of each Monte Carlo iteration.

8. **The number of groups into which pipelines are grouped during the optimisation step.**

   This can be changed in the notebook. It is used to group pipelines together to reduce the number
   of parameters to optimise. If the number provided is greater than the number of pipelines, then
   the number of pipelines will be used instead.

9. **The storage name for the database where the optimisation results will be stored.**

   If a database with this name already exists, the results are appended to it; otherwise a new
   database is created. The notebook ensures a new database is created each time it runs.

   .. note::

       The database will be created in the current working directory.

   .. note::

       The historical values of the optimisation results will be used by the optimisation algorithm
       to guide the sampling of parameter values. If you don't want this behaviour, you can delete
       the database and a new one will be created.
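
The schedule generation described in item 4 and the optional shuffling described in item 7 can be
pictured with the following minimal Python sketch. It is not the notebook's implementation: the
scheduling block type names, their durations, and the functions ``build_schedule`` and
``monte_carlo_schedules`` are hypothetical. It also assumes that the schedule may overshoot the
target integration time by up to one scheduling block.

.. code-block:: python

    import itertools
    import random

    # Hypothetical scheduling block types (name -> integration time).
    # Real values come from the scheduling block types configuration file.
    BLOCK_TYPES = {"sb_a": 4.0, "sb_b": 6.0, "sb_c": 2.0}


    def build_schedule(block_types, target_time):
        """Cycle through the block types until the target time is reached."""
        if not any(duration > 0 for duration in block_types.values()):
            raise ValueError("at least one block type needs a positive duration")
        schedule, total = [], 0.0
        for name in itertools.cycle(block_types):
            if total >= target_time:
                break
            schedule.append(name)
            total += block_types[name]
        return schedule, total


    def monte_carlo_schedules(schedule, iterations, shuffle, seed=0):
        """Yield one schedule per Monte Carlo iteration, optionally shuffled."""
        rng = random.Random(seed)  # seeded so that runs are repeatable
        for _ in range(iterations):
            current = list(schedule)  # copy, so the input order is preserved
            if shuffle:
                rng.shuffle(current)
            yield current


    schedule, total_time = build_schedule(BLOCK_TYPES, target_time=20.0)
    runs = list(monte_carlo_schedules(schedule, iterations=3, shuffle=True))

In this sketch, passing ``iterations=0`` yields no schedules at all, mirroring the way a value of 0
skips the Monte Carlo runs.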

Running the experiment
----------------------

Once you have the required inputs above and have edited the notebook as needed, run the notebook
to run the experiment.
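
Before running, it can help to collect the nine inputs in one place and check them. The sketch
below is one possible way to do that, not part of the notebook: the ``ExperimentSettings`` class,
its field names and the example values are all hypothetical, and it assumes there is one
parallelism entry per pipeline.

.. code-block:: python

    from dataclasses import dataclass
    from pathlib import Path


    @dataclass
    class ExperimentSettings:
        """Hypothetical container for the nine inputs listed above."""

        science_parameters_path: Path  # 1. scientific parameters (.xls)
        budget_euros: float            # 2. hardware budget
        parallelism_percent: dict      # 3. pipeline name -> percentage
        target_integration_time: float # 4. units as expected by the notebook
        optimisation_trials: int       # 5. number of optimisation trials
        monte_carlo_iterations: int    # 6. 0 skips the Monte Carlo runs
        shuffle_observations: bool     # 7. shuffle per Monte Carlo iteration
        pipeline_groups: int           # 8. requested number of groups
        storage_name: str              # 9. optimisation results database

        def validate(self):
            """Return a list of problems; an empty list means no issues found."""
            problems = []
            if self.science_parameters_path.suffix.lower() != ".xls":
                problems.append("scientific parameters file should be .xls")
            if self.budget_euros <= 0:
                problems.append("budget must be positive")
            for name, percent in self.parallelism_percent.items():
                if not 0 <= percent <= 100:
                    problems.append(f"parallelism for {name} must be 0-100")
            if self.optimisation_trials < 1:
                problems.append("at least one optimisation trial is needed")
            if self.monte_carlo_iterations < 0:
                problems.append("Monte Carlo iterations cannot be negative")
            if self.pipeline_groups < 1:
                problems.append("at least one pipeline group is needed")
            return problems

        def effective_groups(self):
            """Cap the group count at the number of pipelines (see item 8)."""
            return min(self.pipeline_groups, len(self.parallelism_percent))


    settings = ExperimentSettings(
        science_parameters_path=Path("science_parameters.xls"),  # placeholder
        budget_euros=1_000_000.0,
        parallelism_percent={"pipeline_a": 80, "pipeline_b": 50, "pipeline_c": 95},
        target_integration_time=100.0,
        optimisation_trials=50,
        monte_carlo_iterations=0,
        shuffle_observations=False,
        pipeline_groups=10,
        storage_name="experiment_results",
    )
    problems = settings.validate()
    groups = settings.effective_groups()

Here ``pipeline_groups`` is larger than the number of pipelines, so ``effective_groups`` falls back
to the number of pipelines, as described in item 8.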