process_inputs

Module for reading and processing input files.

ska_sdp_resource_model.simulate.process_inputs.add_pipeline_config_to_scheduling_blocks(scheduling_block_types_config, pipelines_config)[source]

Add pipeline configurations to scheduling block type configurations.

Replaces the list of pipeline steps with a dictionary of pipeline configurations.

Parameters:

scheduling_block_types_config (dict) – Scheduling block types configuration.
pipelines_config (dict) – Pipelines configuration.

Returns:

scheduling_block_types_config (dict) – Scheduling block types configuration with pipeline configurations.
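
The replacement described above can be pictured with a minimal sketch. It is not the package's implementation: the function name `attach_pipeline_configs`, the `"pipelines"` key, and the example pipeline names and fields are all assumptions made for illustration.

```python
def attach_pipeline_configs(scheduling_block_types_config, pipelines_config):
    """Hypothetical stand-in: swap each list of pipeline names for a dict of configs."""
    result = {}
    for sb_type, sb_config in scheduling_block_types_config.items():
        updated = dict(sb_config)  # shallow copy so the input is left untouched
        # Assumed layout: "pipelines" holds a list of pipeline names.
        updated["pipelines"] = {
            name: pipelines_config[name] for name in sb_config["pipelines"]
        }
        result[sb_type] = updated
    return result


# Illustrative data only; real configuration files may use different keys.
sb_types = {"continuum": {"duration_hrs": 4, "pipelines": ["ical", "dprepa"]}}
pipelines = {"ical": {"node_hours": 10}, "dprepa": {"node_hours": 20}}
combined = attach_pipeline_configs(sb_types, pipelines)
```

Here `combined["continuum"]["pipelines"]` becomes a dictionary keyed by pipeline name, while `sb_types` keeps its original list.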

ska_sdp_resource_model.simulate.process_inputs.get_hardware_config_options(hardware_config_path)[source]

Get hardware configuration options for plotting.

Parameters: hardware_config_path (str or Path) – Relative path to JSON file containing hardware configuration data.

ska_sdp_resource_model.simulate.process_inputs.get_observations(pipelines_config, observing_schedule_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/schedules/observing_schedule.json'), scheduling_block_types_config_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/config/scheduling_block_types.json'), generate_observing_schedule_hrs=0)[source]

Get the observations configuration.

Reads in the observing schedule file and the scheduling block types file, and adds the pipeline configurations to the scheduling blocks.

Parameters:

pipelines_config (dict) – The pipelines configuration.
observing_schedule_path (str or Path) – Path to the observing schedule file.
scheduling_block_types_config_path (str or Path) – Path to the scheduling block types file.

Returns:

observations (list) – The observations schedule with pipeline configurations.

ska_sdp_resource_model.simulate.process_inputs.get_observations_config(observing_schedule, scheduling_block_types_config)[source]

Get list of configurations for all observations in the observing schedule.

GB values are rounded to the nearest whole number.

Parameters:

observing_schedule (dict) – Observing schedule.
scheduling_block_types_config (dict) – Scheduling block types configuration with pipeline configurations.

Returns:

observations_config (list) – List of configurations for all observations in the observing schedule.

ska_sdp_resource_model.simulate.process_inputs.get_processing_blocks(scheduling_block_types_config, pipelines_config)[source]

Get processing blocks from configuration.

Takes the configurations of scheduling block types and pipelines and returns a DataFrame containing the storage and compute requirements. Columns are included for the serial and parallel node hours required. This is used for plotting resource requirements independently of the simulation. Actual runtimes in the simulation will depend on the number of nodes available to run each pipeline (the parallel node hours will be divided by the number of nodes allocated to a pipeline).

Parameters:

scheduling_block_types_config (dict) – Scheduling block configuration.
pipelines_config (dict) – Pipelines configuration.

Returns:

processing_blocks (pd.DataFrame) – DataFrame of processing blocks.
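
The relationship between serial hours, parallel hours, and node count can be illustrated with a short worked example. This is a sketch under stated assumptions, not the package's code: it assumes the total node hours are split by a parallelism percentage, and that the serial part adds to the runtime unchanged. Both helper names are hypothetical.

```python
def split_node_hours(node_hours, pct_parallelism):
    """Assumed split: pct_parallelism percent of the work can run in parallel."""
    parallel = node_hours * pct_parallelism / 100
    serial = node_hours - parallel
    return serial, parallel


def estimated_runtime_hours(node_hours, pct_parallelism, n_nodes):
    """Parallel node hours are divided by the allocated nodes; serial hours are not."""
    serial, parallel = split_node_hours(node_hours, pct_parallelism)
    return serial + parallel / n_nodes


serial, parallel = split_node_hours(100, 80)      # 20 serial, 80 parallel
runtime = estimated_runtime_hours(100, 80, 8)     # 20 + 80 / 8 = 30 hours
```

Under these assumptions, doubling the allocation from 8 to 16 nodes only reduces the runtime from 30 to 25 hours, because the serial part is unaffected.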

ska_sdp_resource_model.simulate.process_inputs.get_scheduling_blocks_data(observation_list)[source]

Get the list of scheduling blocks from an observation list.

This function uses Python built-ins to sort, group and aggregate the scheduling block instances. This implementation is roughly 500-800 times faster than the previous implementation using pandas DataFrames.

Parameters: observation_list (list) – Sequence of scheduling blocks with pipeline configurations.
Returns: scheduling_blocks_grouped (dict) – Dictionary containing scheduling block data.
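
A sort, group and aggregate pass using only built-ins and the standard library might look like the sketch below. The field names (`scheduling_block_type`, `duration_hrs`), the aggregated quantities, and the function name are assumptions chosen for illustration; the actual function may group on different keys and return a different structure.

```python
from itertools import groupby
from operator import itemgetter


def group_scheduling_blocks(observation_list):
    """Hypothetical sketch: count and total duration per scheduling block type."""
    key = itemgetter("scheduling_block_type")
    grouped = {}
    # groupby only merges adjacent items, so the list is sorted on the same key first.
    for sb_type, items in groupby(sorted(observation_list, key=key), key=key):
        items = list(items)
        grouped[sb_type] = {
            "count": len(items),
            "total_duration_hrs": sum(item["duration_hrs"] for item in items),
        }
    return grouped


observations = [
    {"scheduling_block_type": "continuum", "duration_hrs": 4},
    {"scheduling_block_type": "spectral", "duration_hrs": 2},
    {"scheduling_block_type": "continuum", "duration_hrs": 6},
]
summary = group_scheduling_blocks(observations)
```

Working on plain dictionaries avoids the overhead of constructing a DataFrame for each call, which is one plausible source of a speed-up of the kind described above.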

ska_sdp_resource_model.simulate.process_inputs.load_config(file_path, expected_keys=None, config_model=None)[source]

Load configuration from a file.

Parameters:

file_path (str or Path) – Path to the configuration file.
expected_keys (list) – Optional list of expected keys for each item in the configuration file.
config_model (BaseModel) – Optional Pydantic model to validate the configuration.

Returns:

config (dict) – Configuration data loaded from file_path.

ska_sdp_resource_model.simulate.process_inputs.load_hardware_configs(hardware_config_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/config/hardware.json'))[source]

Load hardware configuration data.

Parameters: hardware_config_path (str or Path) – Relative path to JSON file containing hardware configuration data.
Returns: hardware_config (dict) – Hardware configuration data.

ska_sdp_resource_model.simulate.process_inputs.load_pipelines_config(path)[source]

Load configuration for simulated pipelines.

Parameters: path (Path) – Path to JSON file containing pipeline configuration data.
Returns: pipelines_config (dict) – Pipelines configuration data.

ska_sdp_resource_model.simulate.process_inputs.process_inputs(observing_schedule_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/schedules/observing_schedule.json'), scheduling_block_types_config_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/config/scheduling_block_types.json'), pipelines_config_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ska-telescope-ska-sdp-resource-model/checkouts/latest/src/ska_sdp_resource_model/data/config/pipelines.json'), generate_observing_schedule_hrs=0)[source]

Process inputs to get observations for simulation.

Parameters:

observing_schedule_path (str or Path) – Relative path to JSON file containing observing schedule data.
scheduling_block_types_config_path (str or Path) – Relative path to JSON file containing scheduling block types configuration data.
pipelines_config_path (str or Path) – Relative path to JSON file containing pipelines configuration data.

Returns:

observations (list) – A sequence of scheduling blocks with pipeline configurations to use as input to simulation.
pipelines_config (dict) – Pipeline configuration data.

ska_sdp_resource_model.simulate.process_inputs.read_base64_contents(file_path_or_contents)[source]

Read a Base64-encoded JSON file and return the contents as a dictionary.

Parameters: file_path_or_contents (str) – A Base64-encoded string of the JSON file.
Returns: dict – Dictionary containing the decoded JSON data.
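
Decoding a Base64-encoded JSON string can be sketched with the standard library alone. The helper name `decode_base64_json` is hypothetical, and the sketch handles only a plain Base64 string; the parameter name `file_path_or_contents` suggests the real function may accept other input forms that are not covered here.

```python
import base64
import json


def decode_base64_json(encoded):
    """Hypothetical helper: Base64 text -> bytes -> UTF-8 string -> dict."""
    decoded_bytes = base64.b64decode(encoded)
    return json.loads(decoded_bytes.decode("utf-8"))


# Round trip with illustrative data: encode a small dict, then decode it again.
payload = {"pipelines": ["ical"]}
encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
decoded = decode_base64_json(encoded)
```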

ska_sdp_resource_model.simulate.process_inputs.sanitise_pipeline_config(**config)[source]

Sanitise configuration for a single pipeline.

Ensures that the required keys “node_hours” and “pct_parallelism” exist, defaulting their values to those of “node_hours_mean” and “pct_parallelism_min” if necessary. Removes superfluous keys: {“node_hours_mean”, “node_hours_uncertainty”, “pct_parallelism_min”, “pct_parallelism_max”}

Parameters: config (dict) – Pipeline configuration.
Returns: dict – Sanitised pipeline configuration.
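
The defaulting and key removal described above can be sketched as follows. The key names come from the description; the function name `sanitise_single_pipeline`, the extra `memory_gb` key, and the behaviour when neither a required key nor its fallback is present (here a `KeyError`) are assumptions.

```python
SUPERFLUOUS_KEYS = {
    "node_hours_mean",
    "node_hours_uncertainty",
    "pct_parallelism_min",
    "pct_parallelism_max",
}


def sanitise_single_pipeline(**config):
    """Hypothetical sketch of the defaulting and clean-up for one pipeline."""
    cleaned = dict(config)
    # Fall back to the mean / minimum values only when the required key is absent.
    if "node_hours" not in cleaned:
        cleaned["node_hours"] = cleaned["node_hours_mean"]
    if "pct_parallelism" not in cleaned:
        cleaned["pct_parallelism"] = cleaned["pct_parallelism_min"]
    # Drop the keys that are no longer needed after defaulting.
    return {k: v for k, v in cleaned.items() if k not in SUPERFLUOUS_KEYS}


sanitised = sanitise_single_pipeline(
    node_hours_mean=12.0,
    node_hours_uncertainty=3.0,
    pct_parallelism_min=60,
    pct_parallelism_max=90,
    memory_gb=512,  # illustrative extra key, passed through unchanged
)
```

An explicit `node_hours` or `pct_parallelism` value takes precedence over the fallback in this sketch.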

ska_sdp_resource_model.simulate.process_inputs.sanitise_pipelines_config(**config)[source]

Sanitise pipeline configuration for resource usage simulation before scheduling.

For all pipelines, ensure that the required keys “node_hours” and “pct_parallelism” exist, defaulting their values to those of “node_hours_mean” and “pct_parallelism_min” if necessary. Removes superfluous keys: {“node_hours_mean”, “node_hours_uncertainty”, “pct_parallelism_min”, “pct_parallelism_max”}

Parameters: config (dict) – Pipeline configuration.
Returns: dict – Sanitised pipeline configuration.

ska_sdp_resource_model.simulate.process_inputs.update_observations(observations, pipelines_config)[source]

Update the observations list with new pipeline step configurations.

Parameters:

observations (list) – A list of observations to update.
pipelines_config (dict) – A dictionary containing the configuration of each pipeline.

Returns:

updated_observations (list) – A list of updated observations.

ska_sdp_resource_model.simulate.process_inputs.validate_config_schema(config, expected_keys, file_path)[source]

Validate configuration against expected keys.

Parameters:

config (dict) – Configuration data to validate.
expected_keys (list) – List of expected keys for each item in the configuration data.
file_path (str or Path) – Path to the configuration file.

Returns:

bool – True if configuration is valid, False otherwise.
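
A key check of this kind could be sketched as below. It assumes the configuration is a dictionary whose values are themselves dictionaries (one per item) and that problems are reported by printing; the helper name `has_expected_keys` and the example file name are hypothetical, and the real function may report errors differently.

```python
def has_expected_keys(config, expected_keys, file_path):
    """Hypothetical sketch: True only if every item contains all expected keys."""
    valid = True
    for name, item in config.items():
        missing = [key for key in expected_keys if key not in item]
        if missing:
            # file_path is used only to make the message easier to trace.
            print(f"{file_path}: item '{name}' is missing keys {missing}")
            valid = False
    return valid


good = {"ical": {"node_hours": 10, "pct_parallelism": 80}}
bad = {"ical": {"node_hours": 10}}
good_ok = has_expected_keys(good, ["node_hours", "pct_parallelism"], "pipelines.json")
bad_ok = has_expected_keys(bad, ["node_hours", "pct_parallelism"], "pipelines.json")
```

In this sketch `good_ok` is `True` and `bad_ok` is `False`, with one message printed for the missing `pct_parallelism` key.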