How to run a Monte Carlo simulation
===================================

This tutorial takes you through the steps required to run a simulation with Monte Carlo iterations
using the SDP resource model.

Ensure you have followed the :doc:`installation instructions <../installation>` and successfully run
the simulation with default settings before continuing.

Read the following sections to ensure you are familiar with the simulation's inputs:

- :doc:`Hardware configuration file <../usage/inputs/configuration/hardware_configuration>`
- :doc:`Scheduling block types configuration file
  <../usage/inputs/configuration/scheduling_block_types_configuration>`
- :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>`
- :doc:`Observing schedule file <../usage/inputs/configuration/observing_schedule>`

.. note::

    An observing schedule can optionally be generated by sampling scheduling block types randomly or
    cycling through them. This is useful for testing purposes or if you want to simulate a schedule
    with a specific duration.

Also read the following page to understand how to customise the inputs and run the simulation:

- :doc:`Running the simulation <../usage/run_simulation>`

Requirements for the Monte Carlo simulation
-------------------------------------------

To run the Monte Carlo simulation you will need the following inputs:

1. **An observing schedule you want to simulate.**

   There are two options:

   - Provide the path to a JSON file containing the observing schedule you want to simulate (see
     :doc:`observing schedule </usage/inputs/configuration/observing_schedule>`).
   - Decide the duration in hours of the observing schedule you want to simulate and use the
     ``--generate_observing_schedule_hrs`` option to generate an observing schedule by sampling
     scheduling block types (see below).

2. **Configuration of scheduling block types that make up your observing schedule (JSON).**

   See :doc:`Scheduling block types configuration file
   <../usage/inputs/configuration/scheduling_block_types_configuration>`.

   Every scheduling block type that appears in the observing schedule file must be defined in this
   JSON file with all of its key-value pairs.

   You will need the path to this file to run the simulation.

3. **Configuration for each pipeline (JSON).**

   See :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>` for
   details on the pipeline configuration.

   Every pipeline that appears in the scheduling block types configuration file must be defined in
   this JSON file with all of its key-value pairs.

   You will need the path to this file to run the simulation.

   .. note::

       You can run a hardware optimisation to find the best ``num_nodes`` parameters to use for each
       pipeline (see :doc:`Running a hardware optimisation
       <../tutorials/optimise_hardware_configuration>`).

   .. note::

       The ``pct_parallelism`` parameter will be sampled from a uniform distribution between
       ``pct_parallelism_min`` and ``pct_parallelism_max`` and the ``node_hours`` parameter will be
       sampled from a zero-truncated normal distribution with a mean of ``node_hours_mean`` and a
       standard deviation of ``node_hours_mean * node_hours_uncertainty``.

   .. note::

       To avoid sampling parameters that are not needed, the pipeline configuration file should only
       contain the pipelines that are used in the scheduling block types configuration file.

4. **The hardware configuration you want to simulate (JSON).**

   See :doc:`Hardware configuration file <../usage/inputs/configuration/hardware_configuration>`.

   You will need the path to this file and the key to your chosen hardware configuration to run the
   simulation.

   .. note::

       You can run a hardware optimisation to find the best configuration for a given budget (see
       :doc:`Running a hardware optimisation <../tutorials/optimise_hardware_configuration>`).

5. **The number of Monte Carlo iterations you want to run.**

   You must set this to a value greater than 0 (the default is 0, which skips the Monte Carlo runs).

6. **Whether or not to shuffle the order of observations in the observing schedule.**

   This is an optional part of the Monte Carlo simulation. If set, the order of observations is
   shuffled at the start of each Monte Carlo iteration.

   The default is ``False``.
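
The per-iteration sampling described in the notes under item 3 can be sketched as follows. This is
a minimal illustration, not the package's implementation: the function name, the example values and
the use of rejection sampling for the zero-truncated normal distribution are all assumptions. Only
the configuration keys are taken from the notes above.

.. code-block:: python

    import random


    def sample_pipeline_parameters(config, rng):
        """Draw one Monte Carlo sample of pct_parallelism and node_hours (hypothetical helper)."""
        mean = config["node_hours_mean"]
        if mean <= 0:
            raise ValueError("node_hours_mean must be positive")
        std = mean * config["node_hours_uncertainty"]

        # Uniform draw between the configured bounds.
        pct_parallelism = rng.uniform(
            config["pct_parallelism_min"], config["pct_parallelism_max"]
        )

        # Zero-truncated normal via rejection sampling: redraw until the value is positive.
        node_hours = rng.gauss(mean, std)
        while node_hours <= 0:
            node_hours = rng.gauss(mean, std)

        return {"pct_parallelism": pct_parallelism, "node_hours": node_hours}


    rng = random.Random(42)  # fixed seed so the sketch is repeatable
    example_config = {  # illustrative values only; check the pipeline configuration docs for units
        "pct_parallelism_min": 0.8,
        "pct_parallelism_max": 0.95,
        "node_hours_mean": 100.0,
        "node_hours_uncertainty": 0.2,
    }
    samples = [sample_pipeline_parameters(example_config, rng) for _ in range(1000)]

Each Monte Carlo iteration would draw a fresh set of values like these for every pipeline in the
pipeline configuration file.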

Running the Monte Carlo simulation
----------------------------------

Once you have the required inputs above, you can run the Monte Carlo simulation.

You can run it programmatically from a Python script or Jupyter notebook, from the CLI, or from the
dashboard interface.

Using the CLI
~~~~~~~~~~~~~

To use the CLI, run the Monte Carlo simulation locally using Poetry:

.. code-block:: bash

    # --num_workers: number of worker processes used to run simulations in parallel.
    # --shuffle_observations: optional, shuffles the order of observations in the schedule.
    # --generate_observing_schedule_hrs: optional, generates an observing schedule of this duration.
    poetry run resource_usage \
        --observing_schedule_path data/schedules/observing_schedule.json \
        --scheduling_block_types_config_path data/config/scheduling_block_types.json \
        --hardware_config_path data/config/hardware.json \
        --hardware_config "control" \
        --pipelines_config_path data/config/pipelines.json \
        --num_monte_carlo_iterations 10000 \
        --num_workers 16 \
        --shuffle_observations \
        --generate_observing_schedule_hrs 24

Get more details on each of the CLI arguments by running:

.. code-block:: bash

    poetry run resource_usage --help

The simulation will output the mean run time with 95% confidence intervals to logs and the console.
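
This page does not state how the confidence interval is calculated. As a rough illustration only,
the sketch below computes a mean and a 95% confidence interval from per-iteration run times using
the normal approximation (mean plus or minus 1.96 standard errors). The function name and the run
times are hypothetical, and the package may use a different method.

.. code-block:: python

    import math
    import statistics


    def mean_with_confidence_interval(run_times, z=1.96):
        """Return (mean, lower, upper) using a normal-approximation confidence interval."""
        mean = statistics.fmean(run_times)
        # Standard error of the mean, from the sample standard deviation.
        sem = statistics.stdev(run_times) / math.sqrt(len(run_times))
        half_width = z * sem
        return mean, mean - half_width, mean + half_width


    # Made-up run times in hours, one per Monte Carlo iteration.
    run_times_hrs = [24.1, 25.3, 23.8, 26.0, 24.7]
    mean, lower, upper = mean_with_confidence_interval(run_times_hrs)
    print(f"Mean run time: {mean:.2f} h (95% CI {lower:.2f} to {upper:.2f} h)")

The interval narrows as the number of iterations grows, which is one reason to run many Monte Carlo
iterations.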

Using the API
~~~~~~~~~~~~~

The following code snippet demonstrates how to run the Monte Carlo simulation in a Python script.
The output is returned as a dictionary containing two pandas DataFrames: one for resource usage
over time and one for simulation events. Summary statistics are also written to a log file. To
write debug information to the log file, set the ``debug`` flag to ``True``.

.. code-block:: python

    from ska_sdp_resource_model.simulate.main import run_simulation
    from ska_sdp_resource_model.simulate.logger import setup_logger

    logger = setup_logger(debug=False)  # Set to True to print debug information to log file

    output = run_simulation(
        hardware_config="control",
        observing_schedule_path="data/schedules/observing_schedule.json",  # Ignored if generate_observing_schedule_hrs > 0
        scheduling_block_types_config_path="data/config/scheduling_block_types.json",
        hardware_config_path="data/config/hardware.json",
        pipelines_config_path="data/config/pipelines.json",
        num_monte_carlo_iterations=100,
        shuffle=True,  # Optional if you want to shuffle an observing schedule
        generate_observing_schedule_hrs=400,  # Optional if you want to generate an observing schedule
    )

Using a Jupyter notebook
~~~~~~~~~~~~~~~~~~~~~~~~

We have included example Jupyter notebooks in the `notebooks directory
<https://gitlab.com/ska-telescope/sdp/ska-sdp-resource-model/-/tree/main/notebooks>`_ of the
repository.

- ``03-optimise-hardware-plus-monte-carlo.ipynb`` demonstrates how to run a Monte Carlo simulation
  (with hardware optimisation) and visualise the results.

Using the web interface
~~~~~~~~~~~~~~~~~~~~~~~

To use the web interface, run the application locally using Poetry:

.. code-block:: bash

    poetry run resource_model

The app will run on localhost at port 8050 by default: http://127.0.0.1:8050/. At the top of the
dashboard is a box to input the number of Monte Carlo iterations to run. Click the run button to
start the simulation. Results of each trial will be displayed in the plots and the mean run time
with 95% confidence intervals will be printed.

Use ``Ctrl + C`` in the terminal to stop the server.