monte_carlo
Module hosting the MonteCarloSimulation class.
- class ska_sdp_resource_model.simulate.monte_carlo.MonteCarloSimulation(observations_list, pipelines_config, hardware_config, shuffle=False)[source]
Bases: object
Sets up and executes a Monte Carlo simulation run of the ResourceUsageSimulator.
This class handles parameterising the pipeline configurations and running multiple simulations, possibly concurrently, with different random seeds.
- observations
List of observation dictionaries. Each dictionary contains configurations for scheduling block types and pipeline steps.
- Type:
list
- pipelines_config
Configuration for each pipeline, including ‘node_hours_mean’, ‘node_hours_uncertainty’, ‘pct_parallelism_min’, and ‘pct_parallelism_max’.
- Type:
dict
- hardware_config
Hardware configuration for the simulation.
- Type:
dict
- shuffle
Whether to shuffle the observations list before each simulation.
- Type:
bool
- samplers
Samplers for each pipeline’s ‘node_hours’ (truncated normal distribution) and ‘pct_parallelism’ (uniform distribution).
- Type:
dict
- get_samplers()[source]
Initialise distributions for sampling pipeline parameters.
- run()
Execute the Monte Carlo simulation.
- run_simulation()[source]
Run a single iteration of the simulation.
- parameterise_pipelines()[source]
Sample pipeline parameters for a single iteration.
- sample_parameter()[source]
Sample a specific parameter for all pipelines.
- get_node_hours_distribution(config)[source]
Initialise the distribution for sampling pipeline node hours.
- Parameters:
config (dict) – Pipeline configuration dictionary.
- Returns:
dist (scipy.stats._distn_infrastructure.rv_continuous_frozen) – Truncated normal distribution object for sampling node hours.
- get_pct_parallelism_distribution(config)[source]
Initialise the distribution for sampling pipeline percentage parallelism.
- Parameters:
config (dict) – Pipeline configuration dictionary.
- Returns:
dist (scipy.stats._distn_infrastructure.rv_continuous_frozen) – Uniform distribution object for sampling percentage parallelism.
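As a rough illustration of the two distribution types named above, the sketch below builds frozen SciPy distributions from a pipeline configuration dictionary. It assumes that node hours are truncated at zero from below and unbounded above, and that 'node_hours_uncertainty' is an absolute standard deviation. Neither assumption is confirmed by this page. The function names and example values are hypothetical.

```python
import numpy as np
from scipy import stats


def node_hours_distribution_sketch(config):
    """Frozen truncated normal for node hours (assumed support: [0, inf))."""
    mean = config["node_hours_mean"]
    sd = config["node_hours_uncertainty"]  # assumption: absolute standard deviation
    # truncnorm expects its bounds in units of standard deviations from loc.
    lower = (0.0 - mean) / sd
    upper = np.inf
    return stats.truncnorm(lower, upper, loc=mean, scale=sd)


def pct_parallelism_distribution_sketch(config):
    """Frozen uniform distribution between the configured min and max."""
    low = config["pct_parallelism_min"]
    high = config["pct_parallelism_max"]
    # uniform(loc, scale) covers the interval [loc, loc + scale].
    return stats.uniform(loc=low, scale=high - low)


# Hypothetical configuration for a single pipeline.
example_config = {
    "node_hours_mean": 10.0,
    "node_hours_uncertainty": 8.0,
    "pct_parallelism_min": 50.0,
    "pct_parallelism_max": 90.0,
}

rng = np.random.default_rng(42)
node_hours_draws = node_hours_distribution_sketch(example_config).rvs(
    size=1000, random_state=rng
)
pct_draws = pct_parallelism_distribution_sketch(example_config).rvs(
    size=1000, random_state=rng
)
```

Both helpers return frozen distribution objects, so the same object can be sampled repeatedly with different generators.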
- get_samplers()[source]
Initialise the distributions for sampling pipeline node hours and percentage parallelism.
- Returns:
samplers (dict) – Dictionary containing samplers for each pipeline. Each pipeline has samplers for “node_hours” and “pct_parallelism”. Samplers are scipy.stats._distn_infrastructure.rv_continuous_frozen objects.
- parameterise_pipelines(rng=None)[source]
Parameterise the pipelines with sampled values for pct_parallelism and node_hours.
This function samples the parameter space for pct_parallelism and node_hours of each pipeline using the distributions defined in the samplers attribute.
- Parameters:
rng (np.random.Generator) – A random number generator instance.
- Returns:
(dict) – A dictionary containing the parameterised pipeline configurations.
- run(n_iter=100, num_workers=1, seed=None, client=None)
Run a Monte Carlo simulation of the SDP Resource Model.
- Parameters:
n_iter (int) – The number of Monte Carlo trials to run. Default is 100.
num_workers (int) – Number of worker processes to use for running parallel simulations. If None or -1, a suitable default number of workers will be chosen. For a small number of iterations (n_iter < 100), the simulations will be run sequentially on a single worker. For larger simulations, the default value is set to four times the cpu count of the system at import time.
seed (int) – The random seed to use for the Monte Carlo simulation for reproducible results. Default is None.
client (dask.Client) – Optional Dask client to utilise for parallelism. If provided, the Monte Carlo iterations will be run using this scheduler. If None, any previously initialised scheduler will automatically be used by Dask. If None and no existing Dask scheduler is running, the Monte Carlo iterations will be run sequentially in a single process.
- Returns:
run_times_days (list) – A list of the total simulation times for each Monte Carlo trial in days.
resource_usages (list) – A list of the resource usage dataframes for each Monte Carlo trial.
event_logs (list) – A list of the event logs dataframes for each Monte Carlo trial.
num_successful_runs (int) – The number of successful runs in the Monte Carlo simulation.
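The reproducibility offered by the seed parameter can be pictured with the following minimal sketch. It assumes that one independent NumPy generator is derived per iteration from a single seed using SeedSequence.spawn. Whether the library derives its per-iteration generators this way is an assumption, and toy_iteration is a hypothetical stand-in for one simulation run.

```python
import numpy as np


def spawn_generators(seed, n_iter):
    """Derive n_iter independent generators from a single seed."""
    children = np.random.SeedSequence(seed).spawn(n_iter)
    return [np.random.default_rng(child) for child in children]


def toy_iteration(rng):
    """Hypothetical stand-in for one run: returns a run time in days."""
    return float(rng.normal(loc=30.0, scale=2.0))


# Two runs with the same seed produce the same sequence of results.
first = [toy_iteration(rng) for rng in spawn_generators(seed=123, n_iter=5)]
second = [toy_iteration(rng) for rng in spawn_generators(seed=123, n_iter=5)]
```

Giving each iteration its own generator keeps the results independent of the order in which parallel workers finish.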
- run_simulation(rng)[source]
Run a single Monte Carlo iteration of the SDP Resource Model.
This function samples the parameter space for pct_parallelism and node_hours of each pipeline, updates the observations, runs the simulation and returns the output.
- Parameters:
rng (np.random.Generator) – A random number generator instance.
- Returns:
(dict) – A dictionary containing the simulation results.
- sample_parameter(parameter_name, input_config=None, rng=None)[source]
Generate a randomly sampled value of a parameter for each pipeline.
Uses the distributions defined in the samplers attribute to draw values for the pct_parallelism or node_hours pipeline parameters.
- Parameters:
parameter_name (str) – Name of the parameter to sample. One of “pct_parallelism” or “node_hours”.
input_config (dict) – A dictionary containing the configuration of each pipeline. This will be updated with the parameter value drawn from the pipeline parameter samplers.
rng (np.random.Generator) – A random number generator instance.
- Returns:
pipelines_config_sampled (dict) – A dictionary containing the configuration for each pipeline with a randomly sampled value for parameter_name.
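The per-pipeline sampling described for sample_parameter can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the function name, the pipeline names "imaging" and "calibration", and the distribution bounds are hypothetical, and the sketch returns a deep copy rather than modifying input_config in place.

```python
import copy

import numpy as np
from scipy import stats


def sample_parameter_sketch(parameter_name, samplers, input_config, rng):
    """Return a copy of input_config with parameter_name drawn per pipeline."""
    sampled = copy.deepcopy(input_config)
    for pipeline_name, pipeline_samplers in samplers.items():
        dist = pipeline_samplers[parameter_name]
        # Passing the generator as random_state makes the draw reproducible.
        value = float(dist.rvs(random_state=rng))
        sampled[pipeline_name][parameter_name] = value
    return sampled


# Hypothetical samplers: one frozen distribution per pipeline and parameter.
samplers = {
    "imaging": {"pct_parallelism": stats.uniform(loc=60.0, scale=30.0)},
    "calibration": {"pct_parallelism": stats.uniform(loc=40.0, scale=20.0)},
}
base_config = {"imaging": {}, "calibration": {}}

sampled_config = sample_parameter_sketch(
    "pct_parallelism", samplers, base_config, np.random.default_rng(7)
)
```

Copying the input keeps the base configuration reusable across iterations, at the cost of one deep copy per draw.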
- ska_sdp_resource_model.simulate.monte_carlo.resolve_num_workers(num_workers, n_iter)[source]
Resolve the number of workers to use for parallel processing.
If num_workers is None or -1, use the default number of workers, which is four workers per cpu. Otherwise, convert num_workers to an integer. Also ensure that the number of workers does not exceed the number of Monte Carlo iterations.
- Parameters:
num_workers (int, optional) – The number of workers to use.
n_iter (int) – The number of iterations.
- Returns:
int – The resolved number of workers.
- Raises:
ValueError – If num_workers is neither a positive integer nor one of the default placeholder values (-1 or None).
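The resolution rules described for resolve_num_workers can be sketched with the standard library alone. The function name below is hypothetical and the code is a reading of the description, not the library's source. In particular, this sketch reads the CPU count at call time, whereas the run() documentation says the default is fixed at import time.

```python
import os


def resolve_num_workers_sketch(num_workers, n_iter):
    """Resolve a worker count following the documented rules (illustrative)."""
    if num_workers is None or num_workers == -1:
        # Default: four workers per CPU; fall back to 1 if the count is unknown.
        workers = 4 * (os.cpu_count() or 1)
    else:
        workers = int(num_workers)
        if workers < 1:
            raise ValueError(
                "num_workers must be a positive integer, -1 or None; "
                f"got {num_workers!r}"
            )
    # Never use more workers than there are Monte Carlo iterations.
    return min(workers, n_iter)


capped = resolve_num_workers_sketch(8, 3)        # limited by n_iter
explicit = resolve_num_workers_sketch(2, 100)    # explicit value kept
default = resolve_num_workers_sketch(None, 10**9)
```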
- ska_sdp_resource_model.simulate.monte_carlo.shuffle_observations(observations_list, rng)[source]
Shuffle the observations list.
- Parameters:
observations_list (list) – A list of dictionaries containing the observations to be simulated.
rng (np.random.Generator) – A random number generator instance.
- Returns:
None
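Because the function returns None, the shuffle presumably happens in place. A minimal sketch of that behaviour follows, assuming NumPy's Generator.shuffle is applied directly to the list. The function name and the "id" key are hypothetical.

```python
import numpy as np


def shuffle_observations_sketch(observations_list, rng):
    """Shuffle the list in place and return None."""
    rng.shuffle(observations_list)


observations = [{"id": i} for i in range(6)]
result = shuffle_observations_sketch(observations, np.random.default_rng(0))

# Repeating the shuffle with the same seed gives the same order.
repeat = [{"id": i} for i in range(6)]
shuffle_observations_sketch(repeat, np.random.default_rng(0))
```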