Interpreting simulation outputs
===============================

The SDP resource model writes logs to standard output and produces two pandas DataFrames. The logs
let you monitor the progress of the simulation, and the DataFrames are used to analyse its
results.

.. contents::
    :depth: 2
    :local:

Logs
----

Logs are written to standard output while the simulation is running and can be redirected to a file.
Each log message is prefixed with the simulation time (in seconds), the SBI_ID from the observing
schedule file (see :doc:`observing schedule <../inputs/configuration/observing_schedule>`) and, when
running a pipeline step, the pipeline step ID from the pipeline config (see :doc:`pipeline
configuration <../inputs/configuration/pipeline_configuration>`). Messages report simulation
progress: the start and completion of each scheduling block instance, observation, pipeline step and
batch processing run, as well as each request for, and allocation of, storage and compute nodes.
You will see messages like the following:

.. code-block:: none

    0: T001_001: Starting scheduling block instance...
    0: T001: Requesting 100 GB of capacity storage for raw visibilities...
    0: T001: 100 GB capacity storage allocated for raw visibilities.
    0: T001_001: Waiting for telescope...
    0: T001_001: Starting observation...
    3600: T001_001: Observation complete!
    3600: T001_001: Starting batch processing...
    3600: T001_001: Requesting 50 GB of performance storage for pre-processed visibilities...
    3600: T001_001: 50 GB performance storage allocated for pre-processed visibilities.
    3600: T001_001 - Step 1: Requesting 5 compute nodes...
    3600: T001_001 - Step 1: 5 compute nodes allocated.
    4320: T001_001 - Step 1: 5 compute nodes released.
    4320: T001_001 - Step 1: Pipeline completed!
    4320: T001_001: Batch processing complete!
    4320: T001: Retaining 106 GB of data for 24.0 hours.
    91440: T001: Deleted 106 GB of data from capacity storage.
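
If you redirect the logs to a file, the prefix described above can be split into fields for
filtering or searching. The following is a minimal sketch, not part of the SDP resource model: the
regular expression is inferred from the sample messages above, and the names ``LOG_LINE`` and
``parse_log_line`` are hypothetical.

.. code-block:: python

    import re

    # Assumed line layout, inferred from the sample messages above:
    #   <time>: <SBI_ID>[ - <pipeline step ID>]: <message>
    LOG_LINE = re.compile(
        r"^(?P<time>\d+): (?P<sbi_id>[^:]+?)(?: - (?P<step>[^:]+))?: (?P<message>.*)$"
    )


    def parse_log_line(line):
        """Return the fields of one log line as a dict, or None if it does not match."""
        match = LOG_LINE.match(line.strip())
        if match is None:
            return None
        fields = match.groupdict()
        fields["time"] = int(fields["time"])  # simulation time as an integer
        return fields


    sample = [
        "3600: T001_001: Observation complete!",
        "3600: T001_001 - Step 1: Requesting 5 compute nodes...",
    ]
    parsed = [parse_log_line(line) for line in sample]

Lines without a pipeline step ID yield ``None`` for the ``step`` field, so messages from a given
step can be selected by comparing that field.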

Dataframes
----------

Two pandas DataFrames are produced by the SDP resource model. These are used to generate plots in
the web interface and can be used for further analysis using the API (see :doc:`API documentation
<../../api>`). They are also output to CSV files when running the simulation via the CLI.

event_log
~~~~~~~~~

This dataframe contains information about events that occurred during the simulation. Each row
represents a single event. Columns are as follows:

- ``batch_name (string)``: Scheduling block instance ID from the observing schedule.
- ``step (string)``: The name of the event.
- ``start (int)``: The simulation time (s) at which the event started.
- ``end (int)``: The simulation time (s) at which the event ended.

Events include:

- ``observing``: The observation time for the telescope.
- ``capacity_storage_wait_data_products``: The time spent waiting for capacity storage for data
  products.
- ``capacity_storage_wait_raw_visibilities``: The time spent waiting for capacity storage for raw
  visibilities.
- ``performance_storage_wait``: The time spent waiting for performance storage.
- ``{pipeline_name}_compute_wait``: The time spent waiting for compute nodes.
- ``{pipeline_name}_execution``: The time spent executing a pipeline step.
- ``{pipeline_name}_total``: The time spent on a pipeline step, including wait times.
- ``data_retention``: The time data is retained in capacity storage after batch processing.
- ``batch``: The total time spent on batch processing. This is used for the Gantt-style chart in
  the dashboard.

The following shows example rows you might see in the event log dataframe:

========== ======================== ===== =====
batch_name step                     start end
========== ======================== ===== =====
T001_001   observing                0     3600
T001_001   capacity_storage_wait    3600  4320
T001_001   performance_storage_wait 4320  91440
T001_001   Step 1                   4320  91440
========== ======================== ===== =====
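
Because every event has a ``start`` and an ``end``, durations and wait times can be derived with
ordinary pandas operations. The following is a minimal sketch that assumes only the four columns
documented above. The rows are made up for illustration and are not real simulation output.

.. code-block:: python

    import pandas as pd

    # Illustrative rows only; the values are invented for this sketch.
    event_log = pd.DataFrame(
        {
            "batch_name": ["T001_001", "T001_001", "T001_001", "T001_002"],
            "step": ["observing", "performance_storage_wait", "batch", "observing"],
            "start": [0, 3600, 3600, 3600],
            "end": [3600, 3660, 4320, 7200],
        }
    )

    # Duration of every event in seconds.
    event_log["duration_s"] = event_log["end"] - event_log["start"]

    # Total waiting time per batch: all documented wait events contain "_wait".
    is_wait = event_log["step"].str.contains("_wait", regex=False)
    wait_per_batch = event_log[is_wait].groupby("batch_name")["duration_s"].sum()

    # Total observing time across all batches.
    is_observing = event_log["step"] == "observing"
    observing_s = event_log.loc[is_observing, "duration_s"].sum()

Batches with no wait events do not appear in ``wait_per_batch``. Reindex the result against the
full list of batch names if you need explicit zeros.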

resource_usage
~~~~~~~~~~~~~~

This dataframe contains information about the resources used during the simulation. Resource usage
is logged before and after every change (i.e. whenever a resource is requested or released).

Each row represents a time point in the simulation.

Columns are as follows:

- ``time_s (int)``: The simulation time in seconds (s).
- ``compute_nodes_in_use (int)``: The number of compute nodes in use.
- ``capacity_storage_in_use_gb (float)``: The amount of capacity storage in use (GB).
- ``performance_storage_in_use_gb (float)``: The amount of performance storage in use (GB).
- ``time_h (float)``: The simulation time in hours (h).
- ``time_d (float)``: The simulation time in days (d).

The following shows example rows you might see in the resource usage dataframe:

====== ==================== ========================== ============================= ====== ======
time_s compute_nodes_in_use capacity_storage_in_use_gb performance_storage_in_use_gb time_h time_d
====== ==================== ========================== ============================= ====== ======
0      0                    0.0                        0.0                           0.0    0.0
3600   5                    0.1                        0.05                          1.0    0.04
4320   0                    0.1                        0.05                          1.2    0.05
====== ==================== ========================== ============================= ====== ======
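
Since usage is logged before and after every change, the rows can be read as a step function: each
value holds until the next logged time point. Under that assumption, peak and time-weighted mean
usage can be computed as in the sketch below. The rows are made up for illustration, and only the
``time_s`` and ``compute_nodes_in_use`` columns are used.

.. code-block:: python

    import pandas as pd

    # Illustrative rows only; the values are invented for this sketch.
    resource_usage = pd.DataFrame(
        {
            "time_s": [0, 3600, 3600, 4320, 4320],
            "compute_nodes_in_use": [0, 0, 5, 5, 0],
        }
    )

    # Peak number of compute nodes in use at any logged time point.
    peak_nodes = resource_usage["compute_nodes_in_use"].max()

    # Length of the interval that starts at each row (0 for the last row).
    # Rows that share a timestamp contribute zero-length intervals.
    interval_s = resource_usage["time_s"].diff().shift(-1).fillna(0)

    # Time-weighted mean over the whole logged period.
    total_s = resource_usage["time_s"].iloc[-1] - resource_usage["time_s"].iloc[0]
    mean_nodes = (resource_usage["compute_nodes_in_use"] * interval_s).sum() / total_s

The same calculation applies to ``capacity_storage_in_use_gb`` and
``performance_storage_in_use_gb`` by swapping the column name.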