Running a simulation

Once you have configured the inputs for the simulator (see Configuring the inputs for a simulation), you can run the simulation.

The simulation can be run interactively via a web interface, from the command line, or programmatically from a Python script or Jupyter notebook.

See also the Interpreting simulation outputs section for details of the logs and DataFrames produced by the simulator.

This page covers running a basic simulation. See the Tutorials section for ways to customise the simulation, optimise the hardware configuration, run Monte Carlo simulations and run end-to-end experiments.

Using the web interface

To use the web interface, run the application locally using Poetry:

poetry run resource_model

By default, the app is served on localhost at port 8050: http://127.0.0.1:8050/. Press Ctrl + C in the terminal to stop the server.

Plots

The web interface provides several interactive example plots on the front-end:

  • A tree map of scheduling block metrics read from data/config/scheduling_block_types.json.

  • A Gantt-style chart of scheduled block instances read from data/schedules/observing_schedule.json.

  • A summary of the total run time.

  • A line plot of the compute node usage over time.

  • A line plot of the capacity storage usage (raw visibilities and data products) over time.

  • A line plot of the performance storage usage (pre-processed visibilities) over time.

  • A Gantt-style chart showing the start and end of batch processing for each scheduling block instance that triggers it. The start is the time at which performance storage is allocated; the end is the time at which all processing has completed and performance storage has been released.

  • A strip plot showing individual wait times for each scheduling block instance. These cover time spent waiting for capacity storage (for either raw visibilities or data products), performance storage and compute nodes.

Using the CLI

To use the CLI, run the application locally using Poetry. You can find more detailed usage information by running:

poetry run resource_usage --help

This will output the following help message:

usage: __main__.py [-h] [--observing_schedule_path OBSERVING_SCHEDULE_PATH]
                   [--generate_observing_schedule_hrs GENERATE_OBSERVING_SCHEDULE_HRS]
                   [--hardware_path HARDWARE_PATH] [--hardware HARDWARE]
                   [--scheduling_block_types_path SCHEDULING_BLOCK_TYPES_PATH]
                   [--pipelines_path PIPELINES_PATH]
                   [--num_monte_carlo_iterations NUM_MONTE_CARLO_ITERATIONS]
                   [--num_workers NUM_WORKERS] [--shuffle_observations]
                   [--output_path OUTPUT_PATH] [--verbose] [--debug]

Run the resource usage simulation.

options:
  -h, --help            show this help message and exit
  --observing_schedule_path OBSERVING_SCHEDULE_PATH
                        Path to CSV file containing the observing schedule.
  --generate_observing_schedule_hrs GENERATE_OBSERVING_SCHEDULE_HRS
                        Generate a new observing schedule with the specified
                        number of hours of observations. Samples from the
                        scheduling block types config file and generate
                        datetimes and IDs. Replaces the observing schedule
                        file specified by --observing_schedule_path.
  --hardware_path HARDWARE_PATH
                        Path to JSON file containing hardware configuration.
  --hardware HARDWARE   Name of the hardware configuration to use.
  --scheduling_block_types_path SCHEDULING_BLOCK_TYPES_PATH
                        Path to JSON file containing scheduling block types
                        configuration.
  --pipelines_path PIPELINES_PATH
                        Path to JSON file containing pipelines configuration.
  --num_monte_carlo_iterations NUM_MONTE_CARLO_ITERATIONS
                        Number of Monte Carlo iterations to run.
  --num_workers NUM_WORKERS
                        Number of worker processes to use for running parallel
                        simulations. If `None` or `-1`, a suitable default
                        number of workers will be chosen. For a small number
                        of iterations (`n_iter < 500`), the simulations will
                        be run sequentially on a single worker. For larger
                        simulations, the default value is set to four times
                        the cpu count of the system at import time.
  --shuffle_observations
                        Shuffle the observations list before running the
                        simulation.
  --output_path OUTPUT_PATH
                        Path to output directory.
  --verbose             Print logging to console.
  --debug               Set logging level to DEBUG for detailed logging.
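
For scripted runs, the flags above can be assembled in Python and passed to subprocess. The following is a minimal sketch and not part of the package: build_cli_command and run_cli are hypothetical helpers, and the example values (the schedule path, the 50_50 hardware name and the output directory) are assumptions borrowed from elsewhere on this page. Only flags listed in the help message are used.

```python
import subprocess


def build_cli_command(
    observing_schedule_path=None,
    hardware=None,
    num_monte_carlo_iterations=None,
    output_path=None,
    shuffle_observations=False,
    verbose=False,
):
    """Assemble the argument list for `poetry run resource_usage` (hypothetical helper)."""
    command = ["poetry", "run", "resource_usage"]

    # Flags that take a value; any left as None are omitted so the CLI default applies.
    valued_flags = {
        "--observing_schedule_path": observing_schedule_path,
        "--hardware": hardware,
        "--num_monte_carlo_iterations": num_monte_carlo_iterations,
        "--output_path": output_path,
    }
    for flag, value in valued_flags.items():
        if value is not None:
            command += [flag, str(value)]

    # Boolean switches are added only when requested.
    if shuffle_observations:
        command.append("--shuffle_observations")
    if verbose:
        command.append("--verbose")
    return command


def run_cli(command):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)


# Example values are assumptions for illustration.
example_command = build_cli_command(
    observing_schedule_path="data/schedules/observing_schedule.json",
    hardware="50_50",
    output_path="output",
    verbose=True,
)
print(" ".join(example_command))
```

Calling run_cli(example_command) from the repository root would then launch the simulation. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.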

Using the API

The following code snippet demonstrates how to run the simulator in a Python script. This will output two pandas DataFrames containing resource usage and event logs. You can then use these DataFrames to generate plots or perform further analysis.

from ska_sdp_resource_model.simulate.resource_usage import ResourceUsageSimulator
from ska_sdp_resource_model.simulate.process_inputs import process_inputs

# Define a path to your observing schedule JSON file
path_to_observing_schedule = "data/schedules/observing_schedule.json"

# Process the inputs
observations_list, hardware_config_data = process_inputs(
    observing_schedule_path=path_to_observing_schedule
)

# Create an instance of the simulator
simulator = ResourceUsageSimulator()

# Run the simulation
output = simulator.run_simulation(observations_list, hardware_config_data["50_50"])
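
Because hardware_config_data is indexed by configuration name in the snippet above, the same inputs can be run against several configurations in a loop. The sketch below assumes that hardware_config_data behaves like a dictionary mapping names to configurations, which the snippet suggests but this page does not confirm. run_for_each_hardware is a hypothetical helper, and the stand-in simulator and its inputs exist only to make the example self-contained.

```python
def run_for_each_hardware(simulator_factory, observations_list, hardware_config_data):
    """Run one simulation per named hardware configuration (hypothetical helper).

    Returns a dict mapping each configuration name to whatever
    `run_simulation` returned for it.
    """
    results = {}
    for name, hardware_config in hardware_config_data.items():
        # A fresh simulator per run avoids assuming that state resets between runs.
        simulator = simulator_factory()
        results[name] = simulator.run_simulation(observations_list, hardware_config)
    return results


class StandInSimulator:
    """Placeholder exposing the same method name as ResourceUsageSimulator (illustration only)."""

    def run_simulation(self, observations_list, hardware_config):
        return {"n_observations": len(observations_list), "hardware": hardware_config}


# Made-up inputs; real ones would come from process_inputs.
demo_results = run_for_each_hardware(
    StandInSimulator,
    observations_list=["sbi-1", "sbi-2"],
    hardware_config_data={"config_a": {"nodes": 4}, "config_b": {"nodes": 8}},
)
print(sorted(demo_results))
```

With the real package, the equivalent call would be run_for_each_hardware(ResourceUsageSimulator, observations_list, hardware_config_data), giving one output per configuration to compare.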

Using a Jupyter notebook

We have included example Jupyter notebooks in the notebooks directory of the repository.

  • 01-basic-simulation.ipynb demonstrates how to run the simulation for multiple hardware configurations and generate plots to compare the results.