How to run a Monte Carlo simulation

This tutorial takes you through the steps required to run a simulation with Monte Carlo iterations using the SDP resource model.

Ensure you have followed the installation instructions and successfully run the simulation with default settings before continuing.

Read the following sections to ensure you are familiar with the simulation’s inputs:

Note

An observing schedule can optionally be generated by sampling scheduling block types randomly or cycling through them. This is useful for testing purposes or if you want to simulate a schedule with a specific duration.
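As a rough illustration of the two generation modes mentioned in the note, the sketch below fills a schedule of a requested duration either by random sampling or by cycling through the scheduling block types. It is not the resource model's own implementation: the function name `generate_schedule`, the example type names, the per-type durations, and the choice to stop once the requested duration is reached or exceeded are all assumptions made for this example.

```python
import random
from itertools import cycle


def generate_schedule(block_durations_hrs, total_hrs, mode="random", seed=None):
    """Return a list of scheduling block type names covering at least total_hrs.

    block_durations_hrs maps a (hypothetical) type name to its duration in hours.
    mode is "random" (sample types uniformly) or "cycle" (round-robin).
    """
    if not block_durations_hrs:
        raise ValueError("at least one scheduling block type is required")
    if any(duration <= 0 for duration in block_durations_hrs.values()):
        raise ValueError("durations must be positive")
    if mode not in ("random", "cycle"):
        raise ValueError("mode must be 'random' or 'cycle'")

    rng = random.Random(seed)
    names = list(block_durations_hrs)
    round_robin = cycle(names)

    schedule, elapsed_hrs = [], 0.0
    while elapsed_hrs < total_hrs:
        name = rng.choice(names) if mode == "random" else next(round_robin)
        schedule.append(name)
        elapsed_hrs += block_durations_hrs[name]
    return schedule


# Hypothetical scheduling block types and durations, for illustration only.
example_types = {"continuum": 4.0, "spectral_line": 6.0, "pulsar_search": 2.0}
cycled = generate_schedule(example_types, total_hrs=24, mode="cycle")
sampled = generate_schedule(example_types, total_hrs=24, mode="random", seed=1)
```

Cycling gives a reproducible schedule that exercises every type in turn, while random sampling with a fixed seed gives varied schedules that can still be reproduced in tests.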

And the following page to understand how to customise the inputs and run the simulation:

Running the simulation

Requirements for the Monte Carlo simulation

To run the Monte Carlo simulation you will need the following inputs:

An observing schedule you want to simulate.

There are two options:
- Provide the path to a JSON file containing the observing schedule you want to simulate (see observing schedule).
- Decide the duration in hours of the observing schedule you want to simulate and use the --generate_observing_schedule_hrs option to generate an observing schedule by sampling scheduling block types (see below).
Configuration of scheduling block types that make up your observing schedule (JSON).

See Scheduling block types configuration file.

Every scheduling block type that appears in the observing schedule file must be defined in this JSON file with all of its key-value pairs.

You will need the path to this file to run the simulation.
Configuration for each pipeline (JSON).

See Pipeline configuration file for details on the pipeline configuration.

Every pipeline that appears in the scheduling block types configuration file must be defined in this JSON file with all of its key-value pairs.

You will need the path to this file to run the simulation.

Note

You can run a hardware optimisation to find the best num_nodes parameters to use for each pipeline (see Running a hardware optimisation).

Note

The pct_parallelism parameter will be sampled from a uniform distribution between pct_parallelism_min and pct_parallelism_max and the node_hours parameter will be sampled from a zero-truncated normal distribution with a mean of node_hours_mean and a standard deviation of node_hours_mean * node_hours_uncertainty.

Note

To avoid sampling parameters that are not needed, the pipeline configuration file should only contain the pipelines that are used in the scheduling block types configuration file.
The hardware configuration you want to simulate (JSON).

See Hardware configuration file.

You will need the path to this file and the key to your chosen hardware configuration to run the simulation.

Note

You can run a hardware optimisation to find the best configuration for a given budget (see Running a hardware optimisation).
The number of Monte Carlo iterations you want to run.

You must set this to a value greater than 0 (the default is 0, which skips the Monte Carlo runs).
Whether or not to shuffle the order of observations in the observing schedule.

This is an optional part of the Monte Carlo simulation. If set, the order of observations is shuffled at the start of each Monte Carlo iteration.

Default is False.
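The random inputs drawn for each Monte Carlo iteration, as described in the notes above, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the resource model's code: the helper names are hypothetical, the zero truncation is implemented here by rejection sampling (the package may use a different method), and the example parameter values are invented for illustration.

```python
import random


def sample_pct_parallelism(rng, pct_min, pct_max):
    """Draw pct_parallelism uniformly between pct_parallelism_min and pct_parallelism_max."""
    return rng.uniform(pct_min, pct_max)


def sample_node_hours(rng, mean, uncertainty):
    """Draw node_hours from a normal distribution truncated at zero.

    The standard deviation is node_hours_mean * node_hours_uncertainty.
    Truncation is done by redrawing until the value is positive (an assumption).
    """
    if mean <= 0 or uncertainty < 0:
        raise ValueError("mean must be positive and uncertainty non-negative")
    std = mean * uncertainty
    if std == 0:
        return mean
    while True:
        value = rng.gauss(mean, std)
        if value > 0:
            return value


def shuffle_observations(rng, observations):
    """Return a shuffled copy of the schedule, leaving the input untouched."""
    return rng.sample(observations, k=len(observations))


rng = random.Random(42)

# Illustrative values only; units and ranges depend on your pipeline configuration.
pipeline = {
    "pct_parallelism_min": 0.7,
    "pct_parallelism_max": 0.95,
    "node_hours_mean": 120.0,
    "node_hours_uncertainty": 0.2,
}

draws = [
    (
        sample_pct_parallelism(
            rng, pipeline["pct_parallelism_min"], pipeline["pct_parallelism_max"]
        ),
        sample_node_hours(
            rng, pipeline["node_hours_mean"], pipeline["node_hours_uncertainty"]
        ),
    )
    for _ in range(1000)
]

schedule = ["obs_a", "obs_b", "obs_c", "obs_d"]  # hypothetical observation names
shuffled = shuffle_observations(rng, schedule)
```

Passing an explicit `random.Random` instance keeps the draws reproducible for a given seed, which helps when comparing runs.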

Running the Monte Carlo simulation

Once you have the required inputs above, you can run the Monte Carlo simulation.

You can run it from the CLI, programmatically from a Python script or Jupyter notebook, or from the dashboard interface.

Using the CLI

To use the CLI, run the Monte Carlo simulation locally using Poetry:

# --num_workers: number of worker processes used to run simulations in parallel
# --shuffle_observations: optional, shuffles the order of observations in the observing schedule
# --generate_observing_schedule_hrs: optional, generates an observing schedule of this duration in hours
poetry run resource_usage \
--observing_schedule_path data/schedules/observing_schedule.json \
--scheduling_block_types_config_path data/config/scheduling_block_types.json \
--hardware_config_path data/config/hardware.json \
--hardware_config "control" \
--pipelines_config_path data/config/pipelines.json \
--num_monte_carlo_iterations 10000 \
--num_workers 16 \
--shuffle_observations \
--generate_observing_schedule_hrs 24

Get more details on each of the CLI arguments by running:

poetry run resource_usage --help

The simulation will output the mean run time with 95% confidence intervals to logs and the console.
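To make the reported statistic concrete, the sketch below computes a mean and a 95% confidence interval from a list of per-iteration run times. The source does not state how the tool derives its interval, so the normal approximation used here (mean ± 1.96 standard errors) is an assumption, and both the function name `mean_with_ci95` and the example run times are made up for illustration.

```python
import math
import statistics


def mean_with_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to estimate the spread")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * standard_error
    return mean, mean - half_width, mean + half_width


# Hypothetical run times in hours from five Monte Carlo iterations.
run_times_hrs = [100.0, 104.0, 98.0, 102.0, 96.0]
mean, lower, upper = mean_with_ci95(run_times_hrs)
print(f"mean run time: {mean:.1f} h (95% CI {lower:.1f} to {upper:.1f} h)")
```

The interval narrows roughly with the square root of the number of iterations, which is one reason to prefer thousands of iterations over a handful when the spread matters.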

Using the API

The following code snippet demonstrates how to run the Monte Carlo simulation in a Python script. The output is returned as a dictionary containing two pandas DataFrames: one for resource usage over time and one for simulation events. Summary statistics are also written to a log file. To write debug information to the log file, set the debug flag to True.

from ska_sdp_resource_model.simulate.main import run_simulation
from ska_sdp_resource_model.simulate.logger import setup_logger

logger = setup_logger(debug=False)  # Set to True to print debug information to log file

output = run_simulation(
    hardware_config="control",
    observing_schedule_path="data/schedules/observing_schedule.json",  # Ignored if generate_observing_schedule_hrs > 0
    scheduling_block_types_config_path="data/config/scheduling_block_types.json",
    hardware_config_path="data/config/hardware.json",
    pipelines_config_path="data/config/pipelines.json",
    num_monte_carlo_iterations=100,
    shuffle=True,  # Optional if you want to shuffle an observing schedule
    generate_observing_schedule_hrs=400,  # Optional if you want to generate an observing schedule
)

Using a Jupyter notebook

We have included example Jupyter notebooks in the notebooks directory of the repository.

03-optimise-hardware-plus-monte-carlo.ipynb demonstrates how to run a Monte Carlo simulation (with hardware optimisation) and visualise the results.

Using the web interface

To use the web interface, run the application locally using Poetry:

poetry run resource_model

The app will run on localhost at port 8050 by default: http://127.0.0.1:8050/. At the top of the dashboard is a box to input the number of Monte Carlo iterations to run. Click the run button to start the simulation. Results of each trial will be displayed in the plots and the mean run time with 95% confidence intervals will be printed.

Use Ctrl + C in the terminal to stop the server.