How to optimise a hardware configuration
========================================

This tutorial takes you through the steps required to optimise a hardware configuration using the SDP resource model. An overview of this functionality is provided in the flowchart below:

.. image:: ../images/optimisation-flow-chart.jpg
   :alt: Optimise hardware configuration flowchart
   :width: 600
   :align: center
   :class: margin-bottom

|

The optimisation process will determine the best hardware configuration for a specific hardware budget given an observing schedule and its component scheduling block types and pipelines. This includes:

- Total number of compute nodes
- Total capacity storage in PB
- Total performance storage in TB
- Number of compute nodes allocated to each pipeline "group"

The optimisation step will split the provided hardware budget between compute nodes, capacity storage and performance storage to find the optimum hardware configuration. The cost of each of these components is currently hardcoded in the simulation.

Pipelines are grouped automatically by the simulation (using K-means clustering) based on their ``node_hours_mean``, ``pct_parallelism_min`` and ``pct_parallelism_max`` parameters. The number of groups will be set to the number of pipelines or ``num_pipeline_groups`` (default 6), whichever is smaller. This allows a better search of the parameter space.

The optuna library is used to perform the optimisation with the default (TPESampler) optimisation algorithm. See the `optuna documentation `_ for more information.

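
The group-count rule and the budget split described above can be illustrated with a minimal sketch. It is not the simulation's own code: the function names, the ``UnitCosts`` class, the unit costs and the way the split is expressed as budget fractions are all assumptions made for illustration. The real costs are hardcoded in the simulation and are not reproduced here.

.. code-block:: python

   from dataclasses import dataclass


   def effective_num_groups(num_pipelines: int, num_pipeline_groups: int = 6) -> int:
       """Return the number of pipeline groups: the smaller of the two values."""
       return min(num_pipelines, num_pipeline_groups)


   @dataclass(frozen=True)
   class UnitCosts:
       """HYPOTHETICAL unit costs in euros (not the simulation's hardcoded values)."""

       compute_node: float = 20_000.0
       capacity_storage_pb: float = 100_000.0
       performance_storage_tb: float = 500.0


   def split_budget(
       budget: float,
       compute_fraction: float,
       capacity_fraction: float,
       costs: UnitCosts = UnitCosts(),
   ) -> dict:
       """Convert budget fractions into hardware quantities.

       Whatever is not spent on compute nodes or capacity storage is
       assumed to go to performance storage.
       """
       if compute_fraction < 0 or capacity_fraction < 0:
           raise ValueError("Fractions must be non-negative")
       if compute_fraction + capacity_fraction > 1:
           raise ValueError("Fractions must not sum to more than 1")
       performance_fraction = 1.0 - compute_fraction - capacity_fraction
       return {
           # Only whole compute nodes can be bought
           "num_compute_nodes": int(budget * compute_fraction // costs.compute_node),
           "capacity_storage_pb": budget * capacity_fraction / costs.capacity_storage_pb,
           "performance_storage_tb": budget * performance_fraction / costs.performance_storage_tb,
       }


   # Four pipelines with the default of six groups gives four groups
   print(effective_num_groups(num_pipelines=4))

   # One candidate split of the default 8 million euro budget
   print(split_budget(budget=8e6, compute_fraction=0.5, capacity_fraction=0.25))

In the optimisation itself, each trial proposes one such candidate configuration, runs the simulation against it and records the result so that the sampler can propose a better candidate next.
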
Read the following sections to ensure you are familiar with the simulation's inputs:

- :doc:`Hardware configuration file <../usage/inputs/configuration/hardware_configuration>`
- :doc:`Scheduling block types configuration file <../usage/inputs/configuration/scheduling_block_types_configuration>`
- :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>`
- :doc:`Observing schedule file <../usage/inputs/configuration/observing_schedule>`

.. note::

   A hardware configuration file is not required for the optimisation.

Ensure you have followed the :doc:`installation instructions <../installation>` and successfully run the simulation before continuing.

Requirements for the optimisation
---------------------------------

To run the hardware optimisation, you will need the following inputs:

1. **An observing schedule you want to simulate (JSON).** Its contents should look something like this (read more about this in the :doc:`Defining an observing schedule file <../usage/inputs/configuration/observing_schedule>` section):

   .. code-block:: json

      {
          "scheduling_blocks": [
              "Calibrated Visibilities",
              "Continuum Shallow",
              "Beam Monitoring",
              "Continuum Deep",
              "Continuum Shallow",
              "Continuum Deep",
              "Maintenance",
              "Spectral Deep"
          ]
      }

   You will need the path to this file to run the optimisation.

2. **Configuration of scheduling block types that make up your observing schedule (JSON).** This should look something like this (read more about this in the :doc:`Scheduling block types configuration file <../usage/inputs/configuration/scheduling_block_types_configuration>` section):

   .. code-block:: python

      ...
      {
          "my_scheduling_block_type": {
              "description": "My scheduling block type",
              "short_name": "MST",
              "scheduling_block_instance_time_hrs": 1.0,
              "integration_time_hrs": 10.0,
              "pipeline_steps": ["pipeline_A", "pipeline_B"],
              "raw_vis_gb": 10.0,
              "processed_vis_gb": 20.0,
              "data_retention_hrs": 24.0,
          }
      }
      ...

   Any scheduling block type that appears in the observing schedule file must be defined in this JSON file with all of its key-value pairs. You will need the path to this file to run the optimisation.

3. **Configuration for each pipeline (JSON).** This should look something like this for the scheduling block example above (read more about this in the :doc:`Pipeline configuration file <../usage/inputs/configuration/pipeline_configuration>` section):

   .. code-block:: python

      ...
      {
          "name": "pipeline_A",
          "steps": [
              {
                  "description": "A data processing pipeline",
                  "node_hours": 1000.0,
                  "pct_parallelism": 80.0,
                  "data_product_storage_gb": 1000.0,
                  "num_nodes": 25,
              }
          ],
      },
      {
          "name": "pipeline_B",
          "steps": [
              {
                  "description": "Another data processing pipeline",
                  "node_hours": 500.0,
                  "pct_parallelism": 50.0,
                  "data_product_storage_gb": 3000.0,
                  "num_nodes": 50,
              }
          ],
      }
      ...

   Any pipeline that appears in the scheduling block types configuration file must be defined in this JSON file with all of its key-value pairs. You will need the path to this file to run the optimisation.

   .. note::

      The ``num_nodes`` parameter for each pipeline is ignored when running the hardware optimisation, since the node allocation is one of the parameters being optimised.

   .. note::

      If running Monte Carlo simulations, the ``pct_parallelism`` parameter will be sampled from a uniform distribution between ``pct_parallelism_min`` and ``pct_parallelism_max`` and the ``node_hours`` parameter will be sampled from a normal distribution with a mean of ``node_hours_mean`` and a standard deviation of ``node_hours_mean * node_hours_uncertainty``.

   .. note::

      To avoid sampling parameters that are not needed, the pipeline configuration file should only contain the pipelines that are used in the scheduling block types configuration file.

4. **Your hardware budget (euros).** The optimisation process will optimise the split of your budget between compute nodes, capacity storage and performance storage.
   If you don't supply your own budget, a default value of 8 million euros will be used.

   .. note::

      The costs per compute node and per unit of capacity storage and performance storage are currently hardcoded.

5. **The number of groups you want to split the pipelines into.** The number of groups will be the smaller of the number of pipelines in the pipeline configuration file and the ``num_pipeline_groups`` parameter. The default value is 6.

6. **The number of trials you want to run.** This is a key parameter in the optimisation process that determines how many times parameters are sampled and a simulation is run. Default is 10.

7. **The number of Monte Carlo iterations you want to run.** This is an optional part of the optimisation process. If you set this to a value greater than 0, then for each pipeline the percentage parallelism parameter will be sampled from a uniform distribution between ``pct_parallelism_min`` and ``pct_parallelism_max`` and the ``node_hours`` parameter will be sampled from a normal distribution with a mean of ``node_hours_mean`` and a standard deviation of ``node_hours_mean * node_hours_uncertainty``. The mean simulation time across the iterations is then used for each optimisation trial. Default is 0.

8. **Whether or not to shuffle the order of observations in the observing schedule.** This is an optional part of the Monte Carlo simulation. Default is False.

9. **The storage name for the database where the results will be stored.** Default is "sqlite:///hardware-optimisation.db". If this database already exists, the results will be appended to it. If it does not exist, a new database will be created.

10. **A study name for the experiment you are running.** This is optional and will default to ``budget-{budget}`` where ``budget`` is the value you provided.

.. note::

   If you do not provide paths to the configuration files, the default paths will be used.

Running the optimisation
------------------------

Once you have the required inputs above, you can run the hardware optimisation.
The optimisation can be run programmatically as part of a Python script or Jupyter notebook, or on the CLI.

Using the CLI
~~~~~~~~~~~~~

To use the CLI, run the optimisation locally using Poetry:

.. code-block:: bash

   poetry run optimise_hardware \
       --budget 8000000 \
       --n_trials 100 \
       --observing_schedule_path data/schedules/observation-schedule.json \
       --scheduling_block_types_config_path data/config/scheduling_block_types.json \
       --pipelines_config_path data/config/pipelines.json \
       --storage sqlite:///hardware-optimisation.db \
       --num_monte_carlo_iterations 10 \
       --num_pipeline_groups 6 \
       --shuffle

This will write logs to the console and store the results in a database. Each time a new optimisation is run, the results will be appended to the database.

Get more details on each of the CLI arguments by running:

.. code-block:: bash

   poetry run optimise_hardware --help

This will output the following help message:

.. include:: help_text_optimise.txt

To use the CLI, run the application locally using Poetry. You can find more detailed usage information by running:

.. code-block:: bash

   poetry run resource_usage --help

For the detailed output from the above command, see :ref:`using-the-cli`.

Using the API
~~~~~~~~~~~~~

The following code snippet demonstrates how to run the optimisation in a Python script.

.. code-block:: python

   import json

   from ska_sdp_resource_model.simulate.optimise import optimise_hardware

   # Specify optimisation settings
   n_trials = 1000  # Number of trials to run
   num_monte_carlo_iterations = 10  # Number of Monte Carlo iterations to run
   storage = "sqlite:///hardware-optimisation.db"  # Name of the database where the results will be stored

   # Specify the optimisation inputs
   budget = 8e6  # 8 million Euros budget
   path_to_observing_schedule = "data/schedules/observing_schedule.json"
   path_to_scheduling_block_types = "data/config/scheduling_block_types.json"
   path_to_pipeline_config = "data/config/pipelines.json"

   # Run the optimisation
   study = optimise_hardware(
       budget=budget,
       n_trials=n_trials,
       observing_schedule_path=path_to_observing_schedule,
       scheduling_block_types_config_path=path_to_scheduling_block_types,
       pipelines_config_path=path_to_pipeline_config,
       storage=storage,
       num_monte_carlo_iterations=num_monte_carlo_iterations,
       shuffle=True,
   )

   # Print the best hardware configuration
   print(
       f"Best hardware config:\n {json.dumps(study.best_trial.user_attrs['hardware_config'], indent=4)}"
   )
   print("Best compute node allocation:")
   for pipeline, config in study.best_trial.user_attrs["pipelines_config"].items():
       print(f"  {pipeline}: {config['num_nodes']}")

   # Get the results as a pandas DataFrame for further analysis
   results = study.trials_dataframe()

Using a Jupyter notebook
~~~~~~~~~~~~~~~~~~~~~~~~

We have included example Jupyter notebooks in the `notebooks directory `_ of the repository.

- ``02-optimise-hardware.ipynb`` demonstrates how to run the optimisation and visualise the results.

Visualising the results
-----------------------

To visualise the results of each optimisation trial, during or after a run, run the following command on the CLI (replacing the database URL as appropriate) and click on the URL it prints to launch the optuna dashboard, where you can interact with a variety of plots and compare different studies:

.. code-block:: bash

   poetry run optuna-dashboard sqlite:///hardware-optimisation.db
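
Besides the dashboard, stored results can be read back in Python with optuna's ``load_study`` function. The sketch below is a self-contained illustration rather than the resource model's own workflow: it builds a small stand-in study in a temporary SQLite database using a toy objective, and the study name ``budget-demo``, the helper ``load_best_trial`` and the minimisation direction are assumptions. For a real run you would pass your own storage URL and study name instead.

.. code-block:: python

   import tempfile
   from pathlib import Path

   import optuna


   def load_best_trial(storage: str, study_name: str) -> optuna.trial.FrozenTrial:
       """Load a stored study from the database and return its best trial."""
       study = optuna.load_study(study_name=study_name, storage=storage)
       return study.best_trial


   def toy_objective(trial: optuna.trial.Trial) -> float:
       """Toy stand-in for a simulation run; NOT the resource model."""
       num_nodes = trial.suggest_int("num_nodes", 1, 100)
       # Store extra information on the trial, as the optimisation does
       trial.set_user_attr("hardware_config", {"num_nodes": num_nodes})
       return abs(num_nodes - 42)  # pretend that 42 nodes is optimal


   # Create a throwaway database so that the example runs on its own
   db_path = Path(tempfile.mkdtemp()) / "demo.db"
   storage = f"sqlite:///{db_path}"

   study = optuna.create_study(
       study_name="budget-demo", storage=storage, direction="minimize"
   )
   study.optimize(toy_objective, n_trials=20)

   # Read the results back from the database, as you would after a real run
   best = load_best_trial(storage, "budget-demo")
   best_nodes = best.user_attrs["hardware_config"]["num_nodes"]
   print(f"Best value: {best.value}, nodes: {best_nodes}")

Reading results back this way is useful when the optimisation was started from the CLI, since the ``study`` object returned by the API is not available in that case.
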