optimise

Module to optimise the hardware configuration of a simulation.

ska_sdp_resource_model.optimise.optimise.configure_hardware(budgets)[source]

Suggest hardware configuration parameters based on normalised costs.

The suggestion is derived from a fixed budget and the costs of compute, capacity storage and performance storage.

Parameters:

budgets (dict) – The budget for each component. Must include the following: compute, capacity, data_product and performance.

Returns:

hardware_configuration (dict) – The hardware configuration, including compute_nodes, capacity_storage_pb and performance_storage_tb.
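
As a rough illustration of how per-component budgets could map to hardware quantities, the sketch below divides each budget by a unit cost. The helper name sketch_configure_hardware and all unit costs are hypothetical. How the real function treats the data_product budget is not stated here, so the sketch assumes it buys additional capacity storage.

```python
# Hypothetical unit costs in EUR; the real model's values are not given here.
COST_PER_COMPUTE_NODE = 20_000.0
COST_PER_CAPACITY_PB = 100_000.0
COST_PER_PERFORMANCE_TB = 500.0


def sketch_configure_hardware(budgets):
    """Turn per-component budgets into hardware quantities (illustrative only)."""
    required = ("compute", "capacity", "data_product", "performance")
    missing = [key for key in required if key not in budgets]
    if missing:
        raise KeyError(f"budgets is missing: {missing}")

    # Assumption: the data_product budget buys extra capacity storage.
    capacity_budget = budgets["capacity"] + budgets["data_product"]
    return {
        # Whole nodes only, so round down.
        "compute_nodes": int(budgets["compute"] // COST_PER_COMPUTE_NODE),
        "capacity_storage_pb": capacity_budget / COST_PER_CAPACITY_PB,
        "performance_storage_tb": budgets["performance"] / COST_PER_PERFORMANCE_TB,
    }


example = sketch_configure_hardware(
    {
        "compute": 4_000_000,
        "capacity": 1_500_000,
        "data_product": 500_000,
        "performance": 2_000_000,
    }
)
```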

ska_sdp_resource_model.optimise.optimise.get_budget_split(trial, budget)[source]

Get the budget split for each component.

Parameters:

  • trial – The optuna trial object.

  • budget (int) – Total budget in EUR.

Returns:

budget_allocation (dict) – A dictionary with the budget allocated for each component.
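
One plausible way to obtain such a split is to sample a weight per component from the trial and normalise the weights so the allocation sums to the total budget. The sketch below assumes this approach and is not taken from the source. The component names are borrowed from the budgets argument of configure_hardware, the weight range is arbitrary, and StubTrial is a hypothetical stand-in that mimics only the suggest_float(name, low, high) call of an optuna trial so the example runs without optuna.

```python
class StubTrial:
    """Stand-in for an optuna trial; returns the midpoint of each range."""

    def suggest_float(self, name, low, high):
        return (low + high) / 2.0


def sketch_budget_split(trial, budget):
    """Split a total budget across components using sampled weights."""
    components = ("compute", "capacity", "data_product", "performance")
    # Sample one positive weight per component, then normalise so the
    # shares sum to 1 and the allocation sums to the total budget.
    weights = {c: trial.suggest_float(f"{c}_weight", 0.05, 1.0) for c in components}
    total = sum(weights.values())
    return {c: budget * w / total for c, w in weights.items()}


allocation = sketch_budget_split(StubTrial(), 8_000_000)
```

Because the stub always returns the same value, every component receives an equal share here; a real trial would propose different weights on each call.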

ska_sdp_resource_model.optimise.optimise.get_hardware_cost(hardware_config)[source]

Compute the total cost of a hardware configuration in EUR.

Parameters:

hardware_config (dict) – Hardware configuration. Must include the following: capacity_storage_pb, performance_storage_tb and compute_nodes.

Returns:

cost (dict) – Dictionary with the cost of each component and the total cost.
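
A minimal sketch of such a cost calculation is shown below. The unit costs, the helper name sketch_hardware_cost and the "total" key are assumptions made for illustration; only the three input keys come from the description above.

```python
# Hypothetical unit costs in EUR per node, per PB and per TB.
UNIT_COST_EUR = {
    "compute_nodes": 20_000.0,
    "capacity_storage_pb": 100_000.0,
    "performance_storage_tb": 500.0,
}


def sketch_hardware_cost(hardware_config):
    """Return the cost of each component plus the overall total."""
    cost = {key: hardware_config[key] * unit for key, unit in UNIT_COST_EUR.items()}
    cost["total"] = sum(cost.values())
    return cost


cost = sketch_hardware_cost(
    {"compute_nodes": 200, "capacity_storage_pb": 20, "performance_storage_tb": 4000}
)
```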

ska_sdp_resource_model.optimise.optimise.group_pipelines(pipelines_config, n_groups, seed=42)[source]

Group pipelines into n_groups.

Uses K-means clustering to group pipelines into n_groups based on node_hours_mean, pct_parallelism_min and pct_parallelism_max.

Parameters:
  • pipelines_config (dict) – The pipelines configuration.

  • n_groups (int) – The number of groups to create.

  • seed (int) – Random seed for the clustering. Defaults to 42.

Returns:

grouped_pipelines (dict) – Dictionary of pipelines with assigned group number.
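
To make the clustering step concrete, the sketch below groups pipelines with a small pure-Python version of Lloyd's K-means algorithm. It is not the module's implementation, which may well rely on an existing K-means library. The layout of pipelines_config (pipeline name mapped to a dict of features), the min-max scaling, the "group" output key and the pipeline names are all assumptions.

```python
import random

FEATURES = ("node_hours_mean", "pct_parallelism_min", "pct_parallelism_max")


def _scaled_points(pipelines_config):
    """Min-max scale each feature to [0, 1] so no single feature dominates."""
    names = list(pipelines_config)
    columns = []
    for feature in FEATURES:
        values = [pipelines_config[name][feature] for name in names]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid dividing by zero for constant features
        columns.append([(v - low) / span for v in values])
    return names, [tuple(col[i] for col in columns) for i in range(len(names))]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sketch_group_pipelines(pipelines_config, n_groups, seed=42, max_iter=100):
    """Assign each pipeline a group number using Lloyd's algorithm."""
    names, points = _scaled_points(pipelines_config)
    rng = random.Random(seed)
    centroids = rng.sample(points, n_groups)  # initialise from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(n_groups), key=lambda k: _sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_groups):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return {
        name: {**pipelines_config[name], "group": label}
        for name, label in zip(names, labels)
    }


pipelines = {
    "pipeline_a": {"node_hours_mean": 100.0, "pct_parallelism_min": 80, "pct_parallelism_max": 95},
    "pipeline_b": {"node_hours_mean": 110.0, "pct_parallelism_min": 82, "pct_parallelism_max": 96},
    "pipeline_c": {"node_hours_mean": 5.0, "pct_parallelism_min": 20, "pct_parallelism_max": 40},
    "pipeline_d": {"node_hours_mean": 6.0, "pct_parallelism_min": 22, "pct_parallelism_max": 42},
}
grouped = sketch_group_pipelines(pipelines, n_groups=2)
```

Fixing the seed makes the initial centroids, and therefore the group numbers, reproducible between runs.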

ska_sdp_resource_model.optimise.optimise.main()[source]

Optimise the hardware configuration.

Runs optuna on the hardware configuration to minimise the total time taken.

Returns:

None

ska_sdp_resource_model.optimise.optimise.objective(trial, budget, observations_list, pipelines_config, num_monte_carlo_iterations=0, num_workers=1, shuffle=False, n_groups=6)[source]

Objective function for optimisation.

Parameters:
  • trial – The optuna trial object.

  • observations_list (list) – List of observations to simulate.

  • budget (float) – The budget for the hardware.

  • pipelines_config (dict) – Pipelines configuration data.

  • num_workers (int) – Number of workers. Defaults to 1.

  • num_monte_carlo_iterations (int) – Number of Monte Carlo iterations to run. Defaults to 0.

  • shuffle (bool) – Whether to shuffle the observations list before running the simulation. Defaults to False.

  • n_groups (int) – The number of groups for clustering pipelines. Defaults to 6.

Returns:

float – Total simulation time in days.

ska_sdp_resource_model.optimise.optimise.optimise_hardware(budget=8000000.0, n_trials=10, verbose=False, observing_schedule_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/schedules/observing_schedule.json'), pipelines_config_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/config/pipelines.json'), scheduling_block_types_config_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/config/scheduling_block_types.json'), storage='sqlite:///hardware-optimisation.db', study_name=None, num_monte_carlo_iterations=0, shuffle=False, n_groups=6, client=None)[source]

Optimise the hardware configuration.

If num_monte_carlo_iterations is greater than 0, the simulation will be run multiple times with different values for pct_parallelism for each pipeline and the average time taken will be used to optimise hardware.

Parameters:
  • budget (float) – The budget for the hardware.

  • n_trials (int) – The number of trials to run.

  • verbose (bool) – Whether to print verbose output.

  • observing_schedule_path (Path) – Path to the observing schedule file.

  • pipelines_config_path (Path) – Path to the pipelines configuration file.

  • scheduling_block_types_config_path (Path) – Path to the scheduling block types file.

  • storage (str) – The database URL used to store the optuna study. Defaults to ‘sqlite:///hardware-optimisation.db’.

  • study_name (str) – The name of the optuna study. If None, defaults to ‘budget-{budget}’.

  • num_monte_carlo_iterations (int) – Number of Monte Carlo iterations to run. Defaults to 0.

  • shuffle (bool) – Whether to shuffle the observations list before running the simulation. Defaults to False.

  • n_groups (int) – The number of groups for clustering pipelines. Defaults to 6.

  • client (dask.Client) – Optional Dask client to use for parallelism. Defaults to None.

Returns:

best_params (dict) – The best parameters.
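
The Monte Carlo behaviour described above can be sketched as follows. This covers only the case where num_monte_carlo_iterations is greater than 0. The uniform sampling between pct_parallelism_min and pct_parallelism_max, the toy simulate_days function and the helper names are assumptions, not the module's actual simulation.

```python
import random
import statistics


def simulate_days(pct_parallelism_by_pipeline):
    """Placeholder simulation: less parallelism means a longer run (toy model)."""
    return sum(100.0 / pct for pct in pct_parallelism_by_pipeline.values())


def sketch_monte_carlo_time(pipelines_config, num_monte_carlo_iterations, seed=0):
    """Average the simulated time over randomly drawn pct_parallelism values."""
    if num_monte_carlo_iterations <= 0:
        raise ValueError("this sketch only covers num_monte_carlo_iterations > 0")
    rng = random.Random(seed)
    times = []
    for _ in range(num_monte_carlo_iterations):
        # Assumption: draw each pipeline's value uniformly within its bounds.
        sampled = {
            name: rng.uniform(cfg["pct_parallelism_min"], cfg["pct_parallelism_max"])
            for name, cfg in pipelines_config.items()
        }
        times.append(simulate_days(sampled))
    return statistics.mean(times)


mc_pipelines = {
    "pipeline_a": {"pct_parallelism_min": 50, "pct_parallelism_max": 100},
    "pipeline_b": {"pct_parallelism_min": 50, "pct_parallelism_max": 100},
}
average_days = sketch_monte_carlo_time(mc_pipelines, num_monte_carlo_iterations=20)
```

Averaging over several draws makes the value handed to the optimiser less sensitive to any single unlucky pct_parallelism sample.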

ska_sdp_resource_model.optimise.optimise.parameterise_pipelines_num_nodes(trial, max_compute_nodes, pipelines_config, n_groups=6)[source]

Parameterise the pipelines configuration.

Takes the pipelines configuration and uses optuna to sample the number of compute nodes for each pipeline, replacing the “num_nodes” parameter.

Parameters:
  • trial – The optuna trial object.

  • max_compute_nodes (int) – The maximum number of compute nodes.

  • pipelines_config (dict) – The pipelines configuration.

  • n_groups (int) – The number of groups for clustering pipelines. Defaults to 6.

Returns:
  • pipelines_config (dict) – The parameterised pipelines configuration.

  • trial – The updated optuna trial object.
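
The sampling step might look like the sketch below, which draws one node count per pipeline group and writes it into every pipeline of that group. It assumes the pipelines already carry a "group" key, whereas the real function may do the grouping itself. StubTrial is a hypothetical stand-in that mimics only the suggest_int(name, low, high) call of an optuna trial, and the parameter names passed to it are invented.

```python
import copy


class StubTrial:
    """Stand-in for an optuna trial; records and returns the upper bound."""

    def __init__(self):
        self.params = {}

    def suggest_int(self, name, low, high):
        self.params[name] = high
        return high


def sketch_parameterise_num_nodes(trial, max_compute_nodes, grouped_pipelines, n_groups=6):
    """Sample num_nodes once per group and apply it to each pipeline."""
    # One sampled node count per group keeps the search space at n_groups
    # dimensions instead of one dimension per pipeline.
    nodes_per_group = {
        group: trial.suggest_int(f"num_nodes_group_{group}", 1, max_compute_nodes)
        for group in range(n_groups)
    }
    updated = copy.deepcopy(grouped_pipelines)  # leave the input untouched
    for cfg in updated.values():
        cfg["num_nodes"] = nodes_per_group[cfg["group"]]
    return updated, trial


grouped_input = {
    "pipeline_a": {"group": 0, "num_nodes": 1},
    "pipeline_b": {"group": 1, "num_nodes": 1},
}
parameterised, used_trial = sketch_parameterise_num_nodes(
    StubTrial(), max_compute_nodes=64, grouped_pipelines=grouped_input, n_groups=2
)
```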

ska_sdp_resource_model.optimise.optimise.update_scheduling_blocks(observations_list, pipelines_config)[source]

Update the number of nodes for each pipeline in the scheduling blocks.

This function iterates through the observation list and updates the num_nodes parameter for each pipeline in the pipeline_steps dictionary with the value from the pipelines_config dictionary.

Parameters:
  • observations_list (list) – List of observations.

  • pipelines_config (dict) – Dictionary containing configuration for each pipeline.

Returns:

None
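
The in-place update described for this function can be sketched as below. The shape of each observation (a dict with a pipeline_steps dictionary keyed by pipeline name) and the pipeline names are assumptions inferred from the description; the real data structures may differ.

```python
def sketch_update_scheduling_blocks(observations_list, pipelines_config):
    """Copy num_nodes from pipelines_config into every observation, in place."""
    for observation in observations_list:
        for name, step in observation["pipeline_steps"].items():
            # Assumption: pipelines missing from the config are left unchanged.
            if name in pipelines_config:
                step["num_nodes"] = pipelines_config[name]["num_nodes"]


observations = [
    {"pipeline_steps": {"pipeline_a": {"num_nodes": 1}, "pipeline_b": {"num_nodes": 1}}},
    {"pipeline_steps": {"pipeline_a": {"num_nodes": 1}}},
]
result = sketch_update_scheduling_blocks(
    observations,
    {"pipeline_a": {"num_nodes": 16}, "pipeline_b": {"num_nodes": 4}},
)
```

Because the function mutates observations_list and returns None, callers that need the original values should copy the list first.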