How to optimise a hardware configuration
========================================

This tutorial takes you through the steps required to optimise a hardware configuration using the
SDP resource model. An overview of this functionality is provided in the flowchart below:

.. image:: ../images/optimisation-flow-chart.jpg
    :alt: Optimise hardware configuration flowchart
    :width: 600
    :align: center
    :class: margin-bottom

|

The optimisation process will determine the best hardware configuration for a specific hardware
budget given an observing schedule and its component scheduling block types and pipelines. This
includes:

- Total number of compute nodes
- Total capacity storage in PB
- Total performance storage in TB
- Number of compute nodes allocated to each pipeline "group"

The optimisation step will split the provided hardware budget between compute nodes, capacity
storage and performance storage to find the optimum hardware configuration. The cost of each of
these components is currently hardcoded in the simulation.
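
As a rough illustration of what splitting a budget means, the sketch below converts a three-way
split into hardware quantities. The unit costs and the ``split_budget`` helper are hypothetical;
they are not the values or the code used by the simulation.

.. code-block:: python

    # Hypothetical unit costs in euros, for illustration only. The real values
    # are hardcoded in the simulation and may differ.
    COST_PER_COMPUTE_NODE = 20_000.0
    COST_PER_PB_CAPACITY = 100_000.0
    COST_PER_TB_PERFORMANCE = 500.0


    def split_budget(budget, compute_fraction, capacity_fraction):
        """Convert a budget split into hardware quantities (hypothetical helper).

        Whatever is left after compute nodes and capacity storage is spent on
        performance storage.
        """
        if min(compute_fraction, capacity_fraction) < 0.0:
            raise ValueError("fractions must not be negative")
        if compute_fraction + capacity_fraction > 1.0:
            raise ValueError("fractions must not sum to more than 1")
        performance_fraction = 1.0 - compute_fraction - capacity_fraction
        return {
            # Only whole compute nodes can be bought, so round down.
            "num_compute_nodes": int(budget * compute_fraction // COST_PER_COMPUTE_NODE),
            "capacity_storage_pb": budget * capacity_fraction / COST_PER_PB_CAPACITY,
            "performance_storage_tb": budget * performance_fraction / COST_PER_TB_PERFORMANCE,
        }


    # Half of an 8 million euro budget on compute, a quarter on capacity storage
    example_split = split_budget(8_000_000, 0.5, 0.25)

Each optimisation trial can be thought of as evaluating one such split, together with a node
allocation per pipeline group.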

Pipelines are grouped automatically by the simulation (using K-means clustering) based on their
``node_hours_mean``, ``pct_parallelism_min`` and ``pct_parallelism_max`` parameters. The number of
groups is the number of pipelines or ``num_pipeline_groups`` (default 6), whichever is smaller.
Compute nodes are then allocated per group rather than per pipeline, which reduces the number of
parameters to optimise and allows a better search of the parameter space.
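
The rule for the number of groups can be written as a one-line function. This is a minimal sketch;
``effective_num_groups`` is a hypothetical name and not part of the package, and the clustering
step itself is not shown.

.. code-block:: python

    def effective_num_groups(num_pipelines, num_pipeline_groups=6):
        """Return the number of pipeline groups: the smaller of the two values."""
        return min(num_pipelines, num_pipeline_groups)


    # With 4 pipelines and the default of 6, each pipeline ends up in its own group
    groups_for_few_pipelines = effective_num_groups(4)

    # With 20 pipelines, the pipelines are clustered into 6 groups
    groups_for_many_pipelines = effective_num_groups(20)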

The Optuna library is used to perform the optimisation with its default sampler (``TPESampler``).
See the `Optuna documentation
<https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/003_efficient_optimization_algorithms.html>`_
for more information.

Read the following sections to ensure you are familiar with the simulation's inputs:

- :doc:`Hardware configuration file <../usage/inputs/configuration/hardware_configuration>`
- :doc:`Scheduling block types configuration file
  <../usage/inputs/configuration/scheduling_block_types_configuration>`
- :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>`
- :doc:`Observing schedule file <../usage/inputs/configuration/observing_schedule>`

.. note::

    A hardware configuration file is not required for the optimisation.

Ensure you have followed the :doc:`installation instructions <../installation>` and successfully run
the simulation before continuing.

Requirements for the optimisation
---------------------------------

To run the hardware optimisation, you will need the following inputs:

1. **An observing schedule you want to simulate (JSON).**

   Its contents should look something like this (read more about this in the :doc:`Defining an
   observing schedule file <../usage/inputs/configuration/observing_schedule>` section):

   .. code-block:: json

       {
           "scheduling_blocks": [
               "Calibrated Visibilities",
               "Continuum Shallow",
               "Beam Monitoring",
               "Continuum Deep",
               "Continuum Shallow",
               "Continuum Deep",
               "Maintenance",
               "Spectral Deep"
           ]
       }

   You will need the path to this file to run the optimisation.

2. **Configuration of scheduling block types that make up your observing schedule (JSON).**

   This should look something like this (read more about this in the :doc:`Scheduling block types
   configuration file <../usage/inputs/configuration/scheduling_block_types_configuration>`
   section):

   .. code-block:: python

       ...
       {
           "my_scheduling_block_type": {
               "description": "My scheduling block type",
               "short_name": "MST",
               "scheduling_block_instance_time_hrs": 1.0,
               "integration_time_hrs": 10.0,
               "pipeline_steps": ["pipeline_A", "pipeline_B"],
               "raw_vis_gb": 10.0,
               "processed_vis_gb": 20.0,
               "data_retention_hrs": 24.0
           }
       }
       ...

   Any scheduling block type that is in the observing schedule file must be in this JSON file with
   all key-value pairs.

   You will need the path to this file to run the optimisation.

3. **Configuration for each pipeline (JSON).**

   This should look something like this for the scheduling block example above (read more about this
   in the :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>`
   section):

   .. code-block:: python

       ...
       {
           "name": "pipeline_A",
           "steps": [
               {
                   "description": "A data processing pipeline",
                   "node_hours": 1000.0,
                   "pct_parallelism": 80.0,
                   "data_product_storage_gb": 1000.0,
                   "num_nodes": 25
               }
           ]
       },
       {
           "name": "pipeline_B",
           "steps": [
               {
                   "description": "Another data processing pipeline",
                   "node_hours": 500.0,
                   "pct_parallelism": 50.0,
                   "data_product_storage_gb": 3000.0,
                   "num_nodes": 50
               }
           ]
       }
       ...

   Any pipeline that is in the scheduling block types configuration file must be in this JSON file
   with all key-value pairs.

   You will need the path to this file to run the optimisation.

   .. note::

       The ``num_nodes`` parameter for each pipeline will be ignored when running the hardware
       optimisation, since the node allocations are among the parameters being optimised.

   .. note::

       If running Monte Carlo simulations, the ``pct_parallelism`` parameter will be sampled from a
       uniform distribution between ``pct_parallelism_min`` and ``pct_parallelism_max`` and the
       ``node_hours`` parameter will be sampled from a normal distribution with a mean of
       ``node_hours_mean`` and a standard deviation of ``node_hours_mean * node_hours_uncertainty``.

   .. note::

       To avoid sampling parameters that are not needed, the pipeline configuration file should only
       contain the pipelines that are used in the scheduling block types configuration file.

4. **Your hardware budget (euros).**

   The optimisation process will optimise the split of your budget between compute nodes, capacity
   storage and performance storage. If you don't supply your own budget, a default value of
   8 million euros will be used.

   .. note::

       The costs per compute node and per unit of capacity and performance storage are currently
       hardcoded.

5. **The number of groups you want to split the pipelines into.**

   The number of groups will be the smaller of the number of pipelines in the pipeline
   configuration file and the ``num_pipeline_groups`` parameter. The default value of
   ``num_pipeline_groups`` is 6.

6. **The number of trials you want to run.**

   This is a key parameter in the optimisation process that determines how many times parameters are
   sampled and a simulation is run.

   Default is 10.

7. **The number of Monte Carlo iterations you want to run.**

   This is an optional part of the optimisation process. If you set this to a value greater than 0
   then the percentage parallelism parameter will be sampled from a uniform distribution between
   ``pct_parallelism_min`` and ``pct_parallelism_max`` for each pipeline and the ``node_hours``
   parameter will be sampled from a normal distribution with a mean of ``node_hours_mean`` and a
   standard deviation of ``node_hours_mean * node_hours_uncertainty``. The mean simulation time is
   then used for each optimisation trial.

   Default is 0.

8. **Whether or not to shuffle the order of observations in the observing schedule.**

   This is an optional part of the Monte Carlo simulation.

   Default is False.

9. **The storage name for the database where the results will be stored.**

   Default is "sqlite:///hardware-optimisation.db".

   If this database already exists then the results will be appended to it. If it does not exist, a
   new database will be created.

10. **A study name for the experiment you are running.**

    This is optional and will default to ``budget-{budget}`` where ``budget`` is the value you
    provided.

    .. note::

        If you do not provide paths to the configuration files, the default paths will be used.
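
The Monte Carlo sampling described in item 7 can be sketched with the standard library as follows.
This is an illustration under stated assumptions rather than the package's implementation:
``sample_pipeline_parameters`` is a hypothetical helper, ``node_hours_uncertainty`` is assumed to be
a fraction of the mean, and the sketch does not clip negative ``node_hours`` samples.

.. code-block:: python

    import random


    def sample_pipeline_parameters(
        node_hours_mean,
        node_hours_uncertainty,
        pct_parallelism_min,
        pct_parallelism_max,
        rng,
    ):
        """Draw one Monte Carlo sample of a pipeline's parameters (hypothetical helper)."""
        # Uniform distribution between the minimum and maximum parallelism
        pct_parallelism = rng.uniform(pct_parallelism_min, pct_parallelism_max)
        # Normal distribution with standard deviation = mean * uncertainty
        node_hours = rng.gauss(node_hours_mean, node_hours_mean * node_hours_uncertainty)
        return {"node_hours": node_hours, "pct_parallelism": pct_parallelism}


    rng = random.Random(42)  # seeded so that repeated runs give the same samples
    samples = [
        sample_pipeline_parameters(1000.0, 0.1, 70.0, 90.0, rng) for _ in range(5)
    ]

As described above, the mean simulation time over such samples is what each optimisation trial uses.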

Running the optimisation
------------------------

Once you have the required inputs above, you can run the hardware optimisation.
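
Before running, it can help to check that the three configuration inputs are consistent with each
other. The sketch below is not part of the package: ``find_missing_definitions`` is a hypothetical
helper, and it assumes that the observing schedule lists scheduling block type names under
``scheduling_blocks``, that the scheduling block types configuration is a mapping keyed by those
names, and that the pipeline configuration is a list of objects with a ``name`` key. It works on
data that has already been parsed, for example with ``json.load``.

.. code-block:: python

    def find_missing_definitions(observing_schedule, scheduling_block_types, pipelines):
        """Return (missing scheduling block types, missing pipelines) as sorted lists."""
        # Every scheduling block in the schedule needs a matching type definition
        scheduled_types = set(observing_schedule["scheduling_blocks"])
        missing_types = scheduled_types - set(scheduling_block_types)

        # Every pipeline step used by a scheduling block type needs a pipeline definition
        defined_pipelines = {pipeline["name"] for pipeline in pipelines}
        used_pipelines = {
            step
            for block_type in scheduling_block_types.values()
            for step in block_type["pipeline_steps"]
        }
        missing_pipelines = used_pipelines - defined_pipelines

        return sorted(missing_types), sorted(missing_pipelines)


    # Small in-memory example with one undefined type and one undefined pipeline
    schedule = {"scheduling_blocks": ["Continuum Deep", "Maintenance"]}
    block_types = {"Continuum Deep": {"pipeline_steps": ["pipeline_A", "pipeline_B"]}}
    pipeline_config = [{"name": "pipeline_A", "steps": []}]

    missing_types, missing_pipelines = find_missing_definitions(
        schedule, block_types, pipeline_config
    )

Two empty lists mean that every scheduling block type and pipeline referenced by the inputs is
defined.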

The optimisation can be run from the CLI, or programmatically from a Python script or Jupyter
notebook.

Using the CLI
~~~~~~~~~~~~~

To use the CLI, run the optimisation locally using Poetry:

.. code-block:: bash

    poetry run optimise_hardware \
    --budget 8000000 \
    --n_trials 100 \
    --observing_schedule_path data/schedules/observation-schedule.json \
    --scheduling_block_types_config_path data/config/scheduling_block_types.json \
    --pipelines_config_path data/config/pipelines.json \
    --storage sqlite:///hardware-optimisation.db \
    --num_monte_carlo_iterations 10 \
    --num_pipeline_groups 6 \
    --shuffle

This will write logs to the console and store the results in a database. Each time a new
optimisation is run, its results are appended to the database.

Get more details on each of the CLI arguments by running:

.. code-block:: bash

    poetry run optimise_hardware --help

This will output the following help message:

.. include:: help_text_optimise.txt

For usage information on the separate ``resource_usage`` command, which is also run locally using
Poetry, run:

.. code-block:: bash

    poetry run resource_usage --help

For the detailed output from the above command, see :ref:`using-the-cli`.

Using the API
~~~~~~~~~~~~~

The following code snippet demonstrates how to run the optimisation in a Python script.

.. code-block:: python

    import json  # needed for json.dumps when printing the best configuration

    from ska_sdp_resource_model.simulate.optimise import optimise_hardware

    # Specify optimisation settings
    n_trials = 1000  # Number of trials to run
    num_monte_carlo_iterations = 10  # Number of Monte Carlo iterations to run
    storage = "sqlite:///hardware-optimisation.db"  # Name of the database where the results will be stored

    # Specify the optimisation inputs
    budget = 8e6  # 8 million Euros budget
    path_to_observing_schedule = "data/schedules/observing_schedule.json"
    path_to_scheduling_block_types = "data/config/scheduling_block_types.json"
    path_to_pipeline_config = "data/config/pipelines.json"

    # Run the optimisation
    study = optimise_hardware(
        budget=budget,
        n_trials=n_trials,
        observing_schedule_path=path_to_observing_schedule,
        scheduling_block_types_config_path=path_to_scheduling_block_types,
        pipelines_config_path=path_to_pipeline_config,
        storage=storage,
        num_monte_carlo_iterations=num_monte_carlo_iterations,
        shuffle=True,
    )

    # Print the best hardware configuration
    print(
        f"Best hardware config:\n {json.dumps(study.best_trial.user_attrs['hardware_config'], indent=4)}"
    )
    print("Best compute node allocation:")
    for pipeline, config in study.best_trial.user_attrs["pipelines_config"].items():
        print(f"   {pipeline}: {config['num_nodes']}")

    # Get the results as a pandas DataFrame for further analysis
    results = study.trials_dataframe()

Using a Jupyter notebook
~~~~~~~~~~~~~~~~~~~~~~~~

We have included example Jupyter notebooks in the `notebooks directory
<https://gitlab.com/ska-telescope/sdp/ska-sdp-resource-model/-/tree/main/notebooks>`_ of the
repository.

- ``02-optimise-hardware.ipynb`` demonstrates how to run the optimisation and visualise the results.

Visualising the results
-----------------------

To visualise the results of each optimisation trial, during or after a run, run the following on
the CLI (replacing the database URL as appropriate). Then click on the URL that is printed to launch
the Optuna dashboard, where you can interact with a variety of plots and compare different studies:

.. code-block:: bash

    poetry run optuna-dashboard sqlite:///hardware-optimisation.db