How to run an experiment

This tutorial walks you through running an experiment. An experiment optimises the hardware configuration and then uses a Monte Carlo simulation on a given set of inputs to estimate the mean run time, together with its 95% confidence interval.

Ensure you have followed the installation instructions and successfully run the simulation with default settings before continuing.

Read the following sections to ensure you are familiar with the simulation’s inputs:

Also read the following tutorials to understand how to run the simulation and how to optimise the hardware configuration:

Requirements for running an experiment

To run an experiment you will need the following inputs:

  1. An observing schedule you want to simulate (JSON).

    See observing schedule.

    You will need the path to this file to run the experiment.

  2. Configuration of scheduling block types that make up your observing schedule (JSON).

    See Scheduling block types configuration file.

    Every scheduling block type that appears in the observing schedule file must be defined in this JSON file with all of its key-value pairs.

    You will need the path to this file to run the experiment.

  3. Configuration for each pipeline (JSON).

    See Pipeline configuration file for details on the pipeline configuration.

    Every pipeline that appears in the scheduling block types configuration file must be defined in this JSON file with all of its key-value pairs.

    You will need the path to this file to run the experiment.

    Note

    The hardware optimisation will search for the best num_nodes parameters to use for each pipeline (see Running a hardware optimisation).

    Note

    The pct_parallelism parameter is sampled from a uniform distribution between pct_parallelism_min and pct_parallelism_max. The node_hours parameter is sampled from a zero-truncated normal distribution with a mean of node_hours_mean and a standard deviation of node_hours_mean * node_hours_uncertainty.

    Note

    To avoid sampling parameters that are not needed, the pipeline configuration file should only contain the pipelines that are used in the scheduling block types configuration file.

  4. Your hardware budget (euros).

    The optimisation process will optimise the split of your budget between compute nodes, capacity storage and performance storage. If you don’t supply your own budget, a default value of 8 million euros is used.

    Note

    The costs of compute nodes, capacity storage and performance storage are currently hardcoded.

  5. The number of optimisation trials you want to run.

    This is a key parameter in the optimisation process that determines how many times parameters are sampled and a simulation is run.

    Default is 10.

  6. The number of Monte Carlo iterations you want to run.

    You must set this to a value greater than 0 (the default is 0, which skips the Monte Carlo runs). A Monte Carlo simulation is run for each optimisation trial plus one additional run for the best configuration found.

  7. Whether or not to shuffle the order of observations in the observing schedule.

    This is an optional part of the Monte Carlo simulation. If set, the order of observations is shuffled at the start of each Monte Carlo iteration.

    Default is False.

  8. The storage name for the database where the optimisation results will be stored.

    Default is “sqlite:///hardware-optimisation.db”.

    If this database already exists then the results will be appended to it. If it does not exist, a new database will be created.

    Note

    The database will be created in the current working directory.

    Note

    The historical values of the optimisation results will be used by the optimisation algorithm to guide the sampling of parameter values. If you don’t want this behaviour, you can delete the database and a new one will be created.

  9. The output directory where the results will be saved.

    The results will be saved in this directory as a JSON file named “results-YYYYMMDD-hhmmss.json”.
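
The sampling rules described in the notes on the pipeline configuration can be illustrated with a short Python sketch. It assumes that a single pipeline entry is a flat dictionary holding the four keys named in that note. The helper sample_pipeline_parameters and the example values are hypothetical, and the zero truncation is done here by rejection sampling, which may differ from how the simulation implements it.

```python
import random


def sample_pipeline_parameters(pipeline, rng):
    """Draw one (pct_parallelism, node_hours) pair for a single pipeline.

    `pipeline` is assumed to be a dict with the keys used below; the real
    configuration file may nest or name things differently.
    """
    # Uniform draw between the configured bounds.
    pct_parallelism = rng.uniform(
        pipeline["pct_parallelism_min"], pipeline["pct_parallelism_max"]
    )

    mean = pipeline["node_hours_mean"]
    if mean <= 0:
        raise ValueError("node_hours_mean must be positive")
    std = mean * pipeline["node_hours_uncertainty"]

    # Zero-truncated normal via rejection sampling: redraw until positive.
    node_hours = rng.gauss(mean, std)
    while node_hours <= 0:
        node_hours = rng.gauss(mean, std)

    return pct_parallelism, node_hours


# Illustrative values only, not taken from a real configuration file.
example_pipeline = {
    "pct_parallelism_min": 0.8,
    "pct_parallelism_max": 0.95,
    "node_hours_mean": 100.0,
    "node_hours_uncertainty": 0.2,
}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
pct_parallelism, node_hours = sample_pipeline_parameters(example_pipeline, rng)
print(f"pct_parallelism={pct_parallelism:.3f}, node_hours={node_hours:.1f}")
```

A larger node_hours_uncertainty widens the spread of node_hours around node_hours_mean, so more Monte Carlo iterations may be needed before the mean run time settles.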

Running the experiment

Once you have the required inputs above, you can run the experiment using the CLI:

poetry run run_experiment \
   --observing_schedule_path "$OBSERVING_SCHEDULE_PATH" \
   --scheduling_block_types_path "$SCHEDULING_BLOCK_TYPES_PATH" \
   --pipelines_path "$PIPELINES_PATH" \
   --storage "$STORAGE_PATH" \
   --output_dir "$OUTPUT_DIR" \
   --n_trials "$N_TRIALS" \
   --n_iter "$N_ITER" \
   --shuffle_observation_order

Get more details on each of the CLI arguments by running:

poetry run run_experiment --help
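
If you want to launch several experiments from a script, for example to compare different numbers of trials, the same command can be assembled in Python. The sketch below uses only the flags shown in the CLI example above. The helper build_command and the example paths are hypothetical and are not part of the package.

```python
import shlex


def build_command(observing_schedule_path, scheduling_block_types_path,
                  pipelines_path, storage, output_dir, n_trials, n_iter,
                  shuffle_observation_order=False):
    """Return the run_experiment command as a list of arguments."""
    command = [
        "poetry", "run", "run_experiment",
        "--observing_schedule_path", str(observing_schedule_path),
        "--scheduling_block_types_path", str(scheduling_block_types_path),
        "--pipelines_path", str(pipelines_path),
        "--storage", str(storage),
        "--output_dir", str(output_dir),
        "--n_trials", str(n_trials),
        "--n_iter", str(n_iter),
    ]
    # The shuffle option is a bare flag, so it is only added when requested.
    if shuffle_observation_order:
        command.append("--shuffle_observation_order")
    return command


# Placeholder paths and values, for illustration only.
command = build_command(
    observing_schedule_path="inputs/observing_schedule.json",
    scheduling_block_types_path="inputs/scheduling_block_types.json",
    pipelines_path="inputs/pipelines.json",
    storage="sqlite:///hardware-optimisation.db",
    output_dir="output",
    n_trials=10,
    n_iter=100,
    shuffle_observation_order=True,
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the directory that contains the Poetry project.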

Interpreting the results

The results file in the output directory will contain the following:

  • Summary statistics from the final Monte Carlo simulation using the optimised hardware configuration:
    • runtime_mean_days: The mean run time.

    • runtime_min_days: The minimum run time.

    • runtime_max_days: The maximum run time.

    • runtime_95_ci_days: The 95% confidence interval for the mean run time.

  • Configuration found by the optimisation step:
    • hardware_config: The optimised hardware configuration found by the optimisation algorithm used to generate the results.

    • pipelines_config: The pipelines configuration with the optimised num_nodes for each pipeline. Note that the Monte Carlo simulation will have sampled different values for the pct_parallelism and node_hours parameters.

  • Configuration of the experiment:
    • args: The command-line arguments used to run the experiment.
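
The results file can be read with a few lines of Python. The sketch below assumes that the keys listed above sit at the top level of the JSON file, which this tutorial does not confirm, and that file names follow the results-YYYYMMDD-hhmmss.json pattern. The helpers load_latest_results and summarise are hypothetical and are not part of the package.

```python
import json
from pathlib import Path


def load_latest_results(output_dir):
    """Load the most recent results-YYYYMMDD-hhmmss.json file in output_dir."""
    # The timestamp in the file name sorts chronologically when sorted as text.
    results_files = sorted(Path(output_dir).glob("results-*.json"))
    if not results_files:
        raise FileNotFoundError(f"No results files found in {output_dir}")
    with results_files[-1].open() as f:
        return json.load(f)


def summarise(results):
    """Pick out the run time statistics, assuming they are top-level keys."""
    keys = (
        "runtime_mean_days",
        "runtime_min_days",
        "runtime_max_days",
        "runtime_95_ci_days",
    )
    # A key that is absent is reported as None instead of raising an error.
    return {key: results.get(key) for key in keys}
```

Calling summarise(load_latest_results(output_dir)) with your own output directory returns a dictionary of the four run time statistics for the most recent experiment.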

Running an experiment using SLURM

You can also run the experiment using SLURM, which is useful if you want to run it on a cluster. An example SLURM script is included in the scripts directory of the repository. To use it, clone the repository, set up a Poetry environment and modify the variables used in the script.

Running an experiment using Jupyter notebooks

We have included example experiments that use Jupyter notebooks in the notebooks directory of the repository. These are useful for inspecting the detailed outputs of each simulation and the results of the optimisation.