How to run an end-to-end experiment
This tutorial takes you through the steps required to run an end-to-end experiment using the
end_to_end_simulation.ipynb notebook in the notebooks directory of the repo here.
In addition to optimising the hardware configuration and running a Monte Carlo simulation (see Running an experiment), the notebook generates the input configuration files from scientific parameters and benchmarking data. It uses the SDP Parametric Model to estimate the node hours required by each pipeline and the data storage required by each pipeline and scheduling block instance.
Ensure you have followed the installation instructions and successfully run the simulation with default settings before continuing.
Read the following sections to ensure you are familiar with the simulation’s inputs:
Also read the following tutorials to understand how to run the simulation and how to optimise the hardware configuration:
Requirements for running an end-to-end experiment
To run an experiment you will need the following inputs:
The scientific parameters of the observations you want to simulate (.xls).
See SDP Pipeline resource requirements estimation (link to Confluence) for further details.
You will need the path to this file to run the experiment.
Your hardware budget (euros).
The optimisation process will optimise the split of your budget between compute nodes, capacity storage and performance storage.
Note
The costs of a compute node, capacity storage and performance storage are currently hardcoded.
The parallelism percentage you want to use for each pipeline.
These values can be changed in the notebook.
The target total integration time for the observing schedule.
This can be changed in the notebook and will be used to generate an observing schedule by cycling through scheduling block types.
The number of optimisation trials you want to run.
This is a key parameter in the optimisation process that determines how many times parameters are sampled and a simulation is run.
The number of Monte Carlo iterations you want to run.
Set this to 0 to skip the Monte Carlo runs.
Whether or not to shuffle the order of observations in the observing schedule.
This is an optional part of the Monte Carlo simulation. If set, the order of observations is shuffled at the start of each Monte Carlo iteration.
The number of groups into which pipelines are combined during the optimisation step.
This can be changed in the notebook. It is used to group pipelines together to reduce the number of parameters to optimise. If the number provided is greater than the number of pipelines, then the number of pipelines will be used instead.
The storage name for the database where the optimisation results will be stored.
If this database already exists then the results will be appended to it. If it does not exist, a new database will be created. The notebook ensures a new database is created each time it runs.
Note
The database will be created in the current working directory.
Note
The historical values of the optimisation results will be used by the optimisation algorithm to guide the sampling of parameter values. If you don’t want this behaviour, you can delete the database and a new one will be created.
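The inputs listed above can be pictured as a single configuration object. The sketch below is illustrative only: ExperimentInputs, effective_group_count and build_schedule are hypothetical names, not the notebook's actual variables or functions, and the stopping rule in build_schedule (stop once the target time is reached or exceeded) is an assumption. It shows the two behaviours described in the list: cycling through scheduling block types to fill the target integration time, and falling back to the number of pipelines when more groups are requested than pipelines exist.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class ExperimentInputs:
    """Hypothetical container for the inputs listed above (all names are illustrative)."""
    parameters_path: str                  # path to the .xls file of scientific parameters
    budget_euros: float                   # total hardware budget
    parallelism_percent: dict[str, float] # parallelism percentage per pipeline
    target_integration_hours: float       # target total integration time
    n_trials: int                         # number of optimisation trials
    n_monte_carlo: int                    # 0 skips the Monte Carlo runs
    shuffle_observations: bool            # shuffle the schedule per Monte Carlo iteration
    n_pipeline_groups: int                # requested number of pipeline groups
    storage_name: str                     # name of the optimisation results database


def effective_group_count(requested: int, n_pipelines: int) -> int:
    """Use the number of pipelines when more groups are requested than pipelines exist."""
    return min(requested, n_pipelines)


def build_schedule(block_hours: dict[str, float], target_hours: float) -> list[str]:
    """Cycle through scheduling block types until the target time is reached.

    Assumption: the schedule stops as soon as the accumulated time is at or
    above the target, so the last block may overshoot it.
    """
    if not block_hours or any(hours <= 0 for hours in block_hours.values()):
        raise ValueError("each scheduling block type needs a positive duration")
    schedule: list[str] = []
    total = 0.0
    for name in cycle(block_hours):  # iterates block types in insertion order, repeatedly
        if total >= target_hours:
            break
        schedule.append(name)
        total += block_hours[name]
    return schedule
```

For example, with two hypothetical block types lasting 2 and 3 hours and a 6 hour target, build_schedule returns the first type, the second type, then the first type again.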
Running the experiment
Once you have the required inputs above and have edited the notebook accordingly, you can run the experiment by running the notebook.
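To make the Monte Carlo options concrete, here is a minimal sketch of how the iteration count and the shuffle flag interact, as described in the inputs above. The function monte_carlo_schedules and its seed parameter are hypothetical and are not taken from the notebook; the sketch only illustrates that 0 iterations produces no runs and that shuffling, when enabled, happens at the start of each iteration.

```python
import random


def monte_carlo_schedules(schedule, n_iterations, shuffle, seed=None):
    """Yield the observation order used by each Monte Carlo iteration.

    Hypothetical helper: with n_iterations == 0 nothing is yielded, which
    corresponds to skipping the Monte Carlo runs.
    """
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    for _ in range(n_iterations):
        current = list(schedule)  # copy so the original order is left untouched
        if shuffle:
            rng.shuffle(current)  # reorder at the start of this iteration
        yield current
```

Each yielded list would then be passed to one simulation run; keeping the original schedule unmodified means every iteration starts from the same baseline order.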