# Installation

benchmon is mainly written in Python and Bash, and can be installed on Linux systems with Python 3.6 or higher. To install benchmon, clone the repository and install it with pip. Using a Python virtual environment is recommended.

git clone https://gitlab.com/ska-telescope/sdp/ska-sdp-benchmark-monitor.git
cd ska-sdp-benchmark-monitor

# (Optional but recommended) Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

pip install .

This installs ska-sdp-benchmark-monitor and adds the benchmon executables to the PATH.
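
A quick way to confirm that the executables are reachable after installation is to look them up on the PATH. The following is a minimal sketch, assuming the command names mentioned in this document; the helper name `benchmon_missing_tools` is hypothetical and not part of benchmon.

```shell
# Hypothetical helper (not part of benchmon): print every benchmon executable
# from the list below that cannot be found on the PATH.
benchmon_missing_tools() {
    for tool in benchmon-start benchmon-stop benchmon-visu benchmon-check; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(benchmon_missing_tools)
if [ -z "$missing" ]; then
    echo "all benchmon executables found"
else
    # Left unquoted on purpose so the names are joined on one line.
    echo "missing from PATH:" $missing
fi
```

If executables are reported as missing inside a virtual environment, the environment is probably not activated in the current shell.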

Permissions for `perf`

benchmon uses perf for energy profiling and call-stack tracing. To use perf without root privileges, set the following kernel parameters (for example with sysctl):

kernel.perf_event_paranoid = -1
kernel.kptr_restrict = 0
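
The current values of these parameters can be read from `/proc/sys/kernel`. The sketch below compares them with the values listed above; the function name `check_perf_params` and its optional directory argument are hypothetical conveniences, not something benchmon provides.

```shell
# Report whether the two kernel parameters have the values listed above.
# The optional argument (a directory standing in for /proc/sys/kernel) is a
# hypothetical convenience for testing; benchmon does not define it.
check_perf_params() {
    dir=${1:-/proc/sys/kernel}
    rc=0
    for entry in perf_event_paranoid:-1 kptr_restrict:0; do
        name=${entry%%:*}   # text before the first colon
        want=${entry#*:}    # text after the first colon
        have=$(cat "$dir/$name" 2>/dev/null)
        if [ "$have" = "$want" ]; then
            echo "ok: kernel.$name = $have"
        else
            echo "needs change: kernel.$name = ${have:-unreadable} (expected $want)"
            rc=1
        fi
    done
    return "$rc"
}

# To apply the values to the running system (requires root):
#   sudo sysctl -w kernel.perf_event_paranoid=-1
#   sudo sysctl -w kernel.kptr_restrict=0
# To persist them across reboots, the same two "key = value" lines can be
# placed in a file under /etc/sysctl.d/ and loaded with: sudo sysctl --system
```

Calling `check_perf_params` with no argument inspects the live system and returns a non-zero status if either value differs.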

Installing `hpctoolkit`

To generate software performance metrics or collect execution traces, HPCToolkit is required. It can be installed by following the HPCToolkit installation instructions.

Check System availability

benchmon-check checks whether the system and the environment are compatible with benchmon. We recommend running this command before using benchmon for the first time.

benchmon-check

Basic procedure

To monitor an application with benchmon, follow these four main steps:

1. Start benchmon using benchmon-start (or benchmon-multinode-start for multi-node setups). benchmon runs in the background and does not block subsequent executions. The type and frequency of monitoring can be configured by passing various arguments. A subdirectory is created to store all trace files.
2. Run the target applications as usual. benchmon performs all enabled types of monitoring in the background while the applications are running.
3. Stop benchmon with benchmon-stop (or benchmon-multinode-stop). This finalizes and terminates benchmon background processes, and post-processes the trace files. The traces are saved in the directory specified in step 1.
4. Visualize the trace files using benchmon-visu. This generates plots and figures for all collected metrics. Visualization can be customized by passing different arguments to select which metrics to display.

Example workflow with benchmon:

#!/usr/bin/bash

# Start benchmon (step 1)
benchmon-start --sys --pow --call --save-dir ./traces

# Target app (step 2)
./app_0
./app_1

# Stop benchmon (step 3)
benchmon-stop

# Visualize traces (step 4)
benchmon-visu --cpu --mem --recursive ./traces

The number of steps can be reduced to 3 by using the --level option. This option specifies a pre-defined set of monitoring options, which is useful for common benchmarking scenarios. See Pre-defined Benchmarking Levels for more details.
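
Because benchmon keeps running in the background until benchmon-stop is called, a script that aborts halfway may leave the monitoring processes behind. One possible way to guard against this is a small wrapper that always issues the stop command. This is only a sketch: the function name `run_monitored` is hypothetical, and the flags are the ones used in the example workflow above.

```shell
# Hypothetical wrapper (not part of benchmon): start monitoring, run one
# command, and always stop monitoring, even when the command fails.
run_monitored() {
    save_dir=$1
    shift
    benchmon-start --sys --save-dir "$save_dir" || return 1
    rc=0
    "$@" || rc=$?      # remember the exit status of the target application
    benchmon-stop      # runs whether the application succeeded or not
    return "$rc"
}
```

For example, `run_monitored ./traces ./app_0` would monitor a single run of `./app_0` and return that application's exit status.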

Monitoring options and flags

benchmon offers a set of options for customizing monitoring. Specific types of monitoring can be enabled or disabled, sampling frequencies adjusted, and system resources or metrics selected for tracking. This flexibility allows the monitoring process to be tailored to the application’s requirements and the desired level of detail for analysis.

`benchmon-start` and `benchmon-multinode-start`

General Options

-d, --save-dir: Directory to save traces (default: ./save_dir_<JobId>/).
-v, --verbose: Enable verbose output.
-b, --backend: Backend for multi-node monitoring (mpi, ssh; default: mpi).

Resource Usage

--system, --sys: Enable system monitoring.
--sys-freq: Monitoring frequency in Hz (default: 10).

Energy Consumption

--pow, --power: Enable power monitoring.
--pow-sampl-intv, --power-sampling-interval: Power sampling interval in ms (default: 250).
--pow-g5k, --power-g5k: Enable Grid5000 power monitoring.
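
Note that system monitoring is configured with a frequency in Hz, whereas power monitoring takes an interval in milliseconds. The two are related by interval_ms = 1000 / frequency_Hz. The helpers below are a hypothetical illustration of that conversion using integer arithmetic; they are not part of benchmon.

```shell
# Hypothetical helpers: convert between a sampling frequency in Hz and a
# sampling interval in milliseconds (integer division, rounds down).
hz_to_ms() { echo $((1000 / $1)); }
ms_to_hz() { echo $((1000 / $1)); }

echo "--sys-freq 10 samples every $(hz_to_ms 10) ms"
echo "--pow-sampl-intv 250 samples at $(ms_to_hz 250) Hz"
```

With the defaults, system metrics are therefore sampled every 100 ms and power every 250 ms (4 Hz). To sample both at the same rate, one could pass `--pow-sampl-intv 100` alongside the default `--sys-freq 10`.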

Callgraph Tracing using `perf`

--call: Enable callstack tracing.
--call-mode: Callgraph collection mode (dwarf, lbr, fp; default: dwarf,32).
--call-prof-freq, --call-profiling-frequency: Profiling frequency in Hz (default: 10, min: 1).

Trace collection and generation of performance metrics using `hpctoolkit`

-e, --hpc-exe: Executables to be traced.
-f, --hpc-flags: Flags passed to hpcrun for configuring data collection.

`benchmon-stop` and `benchmon-multinode-stop`

These commands stop the monitoring process and post-process the trace files. For benchmon-multinode-stop, the -b, --backend flag selects the backend used to propagate the stop command to all nodes. Possible values are mpi and ssh (default: mpi).

`benchmon-visu`

The benchmon-visu tool provides detailed visualization of collected metrics, supporting both partial and comprehensive displays of resource usage, power consumption, and call tracing data. It enables selection of specific metrics for plotting, supports multi-node synchronized visualization, and offers both an interactive mode and figure generation. Output formats and figure quality can be customized for reporting or analysis.

For multi-node runs, benchmon-visu can generate synchronized graphs across nodes using the --recursive option. This feature aligns metrics from different nodes on a common timeline, making it easier to analyze distributed workloads and correlate events across the system.

benchmon-visu <traces-directory> [options]

If no directory is specified, the current directory (./) is used.

benchmon-visu also supports reading system metrics directly from InfluxDB v3 instead of local CSV/BIN traces. In this mode, the positional argument is still required, but it is used as the output directory for figures and logs rather than as an input trace directory.

InfluxDB-backed visualization

--influxdb: Enable direct visualization from InfluxDB.
--influxdb-url: InfluxDB v3 HTTP endpoint, for example http://localhost:8181.
--influxdb-token: InfluxDB token. Leave empty when the server is configured without authentication.
--influxdb-org: InfluxDB organization. Optional for the local benchmon stack.
--influxdb-database: Database/bucket name. Default: metrics.
--influxdb-hostname: Optional hostname filter. If omitted, benchmon-visu discovers all hostnames that have data in the selected time window.
--resolution: Query resolution. Supported values are auto, raw, 1s, 5s, 10s, 30s, 1m, 5m, 15m, and 1h.

If --start-time and --end-time are omitted, InfluxDB mode queries all data in the selected database.

When you run against the local Benchmon stack and expect large full-database scans, start the stack with benchmon-start-grafana --influxdb-query-file-limit 2000 (or another positive value) to increase the backend scan budget. This flag is passed to influxdb3 serve and sets the maximum number of parquet files that one query may scan, so changing it requires restarting the stack.

In InfluxDB mode, the supported system plots are --cpu, --cpu-all, --cpu-freq, --mem, --net, --disk, --ib, and --sys. --sys expands to --cpu --cpu-freq --mem --net --disk --ib.

The output style remains consistent with CSV/BIN visualization:

One figure set is generated per discovered host.
If the output root is ./benchmon_influx_figures, per-host figures are written under ./benchmon_influx_figures/benchmon_traces_<hostname>/.
--recursive additionally generates ./benchmon_influx_figures/multi-node_sync.<fmt> when more than one host is found, matching the legacy synchronized multi-node view.
Very large figure layouts are automatically split into __partNN pages.

Time filtering keeps the same interface as the CSV/BIN visualizer: --start-time and --end-time use the YYYY-MM-DDTHH:MM:SS format and are used directly as local wall-clock time on the machine running benchmon-visu. No manual UTC conversion is required.
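
Since the timestamps are interpreted as local wall-clock time, they can be produced directly with `date` on the machine running benchmon-visu. The sketch below generates the current time in the expected format and checks a string against it; the helper name `is_benchmon_time` is hypothetical and only validates the layout of the digits, not calendar validity.

```shell
# Hypothetical helper: succeed only if the argument has the layout
# YYYY-MM-DDTHH:MM:SS (digits are checked, calendar validity is not).
is_benchmon_time() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Current local wall-clock time in the same format.
now=$(date +%Y-%m-%dT%H:%M:%S)
is_benchmon_time "$now" && echo "end time candidate: $now"
```

A value produced this way could then be passed as `--end-time "$now"`.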

If a full-database query is rejected by the InfluxDB backend because the request is too large, either restart the stack with a larger --influxdb-query-file-limit or rerun benchmon-visu with both --start-time and --end-time to narrow the query. Increasing --influxdb-query-file-limit affects query/read behavior only; it does not change CSV importer batching or write limits.

If some requested InfluxDB measurements are missing from the selected database, benchmon renders the plots backed by the tables that do exist and skips the unavailable plot types.

InfluxDB mode does not support --pow, --pow-g5k, --call, --inline-call, or --binary.
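
When a script reuses option lists written for CSV/BIN traces, it can be useful to detect these options before switching to InfluxDB mode. The following is a hypothetical pre-flight check based only on the list above; how benchmon-visu itself reacts to these options is not described here.

```shell
# Hypothetical pre-flight check (not part of benchmon): print every option in
# the argument list that the text above says InfluxDB mode does not support.
influx_unsupported_flags() {
    for arg in "$@"; do
        case $arg in
            --pow|--pow-g5k|--call|--inline-call|--binary)
                printf '%s\n' "$arg" ;;
        esac
    done
}

# Only --call is reported for this option list.
influx_unsupported_flags --sys --recursive --call
```

An empty output means that none of the listed options is present.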

Example:

benchmon-visu ./benchmon_influx_figures \
  --influxdb \
  --influxdb-url http://localhost:8181 \
  --influxdb-database metrics \
  --start-time 2026-02-04T21:48:20 \
  --end-time 2026-02-04T22:03:20 \
  --sys \
  --recursive \
  --resolution auto \
  --fig-fmt png \
  --fig-name benchmon_influx_overview

Available options:

Resource usage

--mem: Visualize memory usage.
--cpu: Show average CPU usage (user, system, wait, idle, virt).
--cpu-all: Show all CPU cores usage.
--cpu-freq: Show CPU core frequencies.
--cpu-cores-full: Show core usage per space (comma-separated).
--cpu-cores-in: Include specific CPU cores (comma-separated).
--cpu-cores-out: Exclude specific CPU cores (comma-separated).
--net: Show network activity.
--net-all: Show all network interfaces.
--net-rx-only: Show only RX activity.
--net-tx-only: Show only TX activity.
--net-data: Label network plot with total data.
--disk: Show disk bandwidth.
--disk-iops: Show disk IOPS.
--disk-data: Label disk plots with total data.
--disk-rd-only: Show disk reads only.
--disk-wr-only: Show disk writes only.
--ib: Show Infiniband activity.

Energy Consumption

--pow: Visualize perf power profiles.
--pow-g5k: Visualize G5K power profiles.

Callgraph Tracing using `perf`

--call: Visualize call stack.
--call-depth: Set call stack depth (integer).
--call-cmd: (Optional) Set command to show in call stack plots.

Plot annotations

--inline-call: Annotate plots with running commands (if --call is enabled).
--inline-call-cmd: Comma-separated list of commands for inline call annotations.
--ical-log: Annotate with ICAL stages; the log file (wflow-selfcal.*.log) must be in the same directory as the traces.

Global visualization

--recursive: Generate synchronized plots for multi-node runs.
--interactive: Enable interactive mode (matplotlib).
--start-time: Set start time (YYYY-MM-DDTHH:MM:SS).
--end-time: Set end time (YYYY-MM-DDTHH:MM:SS).
--fig-path: Directory to save figures (default: traces directory).
--fig-fmt: Figure format (default: svg).
--fig-name: Figure name.
--fig-dpi: Figure quality (low, medium, high; default: medium).
--fig-width: Figure width in inches (default: 25.6).
--fig-height-unit: Figure subplot height in inches (default: 3).
--fig-xrange: Number of ticks on the x-axis (default: 25).
--fig-yrange: Number of ticks on the y-axis (default: 11).

SW/HW contexts

benchmon can automatically generate detailed files describing both the software and hardware contexts of the current benchmark. The benchmon-hardware command produces a JSON file containing information such as CPU details, memory configuration, disk data, network interfaces, accelerator information, system topology, and operating system data. Similarly, the benchmon-software command generates a JSON file that includes environment variables, Spack dependencies, Python environment details, and loaded modules. Both commands accept the --save-dir option to specify the path where the JSON files will be generated. These JSON files can be visualized in graph mode using tools like JSON Crack for easier inspection and analysis.

Pre-defined benchmarking levels

benchmon provides pre-defined levels to simplify common benchmarking scenarios. Each level enables a specific set of monitoring, tracing, and visualization options. When the --level flag is used, benchmon automatically configures metric collection, and at stop time (benchmon-stop) it calls benchmon-visu with the visualization options associated with that level. For level 1 and above, software execution is traced using either perf or hpctoolkit. perf traces all software activity at a frequency that increases with the level. In contrast, hpctoolkit only records information for the executables passed as arguments: performance metrics are produced at level 1 and execution traces at level 2. The save directory can also be specified with --save-dir to control where traces and figures are stored.

For each benchmarking level, benchmon-visu automatically generates two figures: an overview figure and a detailed figure, and both figures are produced in svg and png formats.

In addition, when using pre-defined benchmarking levels, benchmon always runs benchmon-software and benchmon-hardware to capture the software and hardware contexts of the benchmark.

| Level | Monitoring options enabled | Visualization options enabled |
| --- | --- | --- |
| `--level 0` | `--sys --sys-freq 1` | overview: `--cpu --mem --net --disk --fig-name benchmon_figure_overview` |
| | | detailed: `--cpu --cpu-all --cpu-freq --mem --net --net-all --net-data --disk --disk-data --disk-iops --fig-name benchmon_figure_detailed` |
| `--level 1` | `--sys --sys-freq 5 --call --call-prof-freq 1` using `perf` | overview: `<level 0>` + `--inline-call` |
| | | detailed: `<level 0>` + `--inline-call` |
| `--level 1` | `--sys --sys-freq 5` requires `--hpc-exe` to use `hpctoolkit` | overview: `<level 0>` |
| | | detailed: `<level 0>` |
| `--level 2` | `--sys --sys-freq 100 --call --call-prof-freq 50` using `perf` | overview: `<level 1>` + `--call --call-depth 4` |
| | | detailed: `<level 1>` + `--call --call-depth 4` |
| `--level 2` | `--sys --sys-freq 100` requires `--hpc-exe` to use `hpctoolkit` | overview: `<level 1>` |
| | | detailed: `<level 1>` |

#!/usr/bin/bash
benchmon-start --level <level> --save-dir <dir>

<applications>

benchmon-stop --level <level> --save-dir <dir>
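
To make the mapping in the table above concrete, the sketch below prints the perf-based monitoring options listed for each level. The function name `level_monitoring_flags` is hypothetical; it only restates the table and does not reflect how benchmon resolves levels internally.

```shell
# Hypothetical helper: print the perf-based monitoring options that the
# table above lists for each level.
level_monitoring_flags() {
    case $1 in
        0) printf '%s\n' "--sys --sys-freq 1" ;;
        1) printf '%s\n' "--sys --sys-freq 5 --call --call-prof-freq 1" ;;
        2) printf '%s\n' "--sys --sys-freq 100 --call --call-prof-freq 50" ;;
        *) echo "unknown level: $1" >&2; return 1 ;;
    esac
}

for level in 0 1 2; do
    echo "--level $level -> $(level_monitoring_flags "$level")"
done
```

According to the table, `benchmon-start --level 1` therefore collects the same metrics as `benchmon-start --sys --sys-freq 5 --call --call-prof-freq 1`, with the context capture and automatic visualization described above added on top.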

benchmon-report

The benchmon-report tool automatically extracts and summarizes parts of the data from the raw output files generated by benchmon. It fills a Markdown file based on a pre-written template. The template can be adapted to the user's needs.

This tool is able to extract and summarize data from the following benchmon outputs:

software report
hardware report
process timings report
monitoring graphs

The data is formatted to make it easily readable. To integrate these pieces of data in a Markdown file, add angle-bracket-enclosed labels wherever the data should be written, as in this example:

# Hardware description
- Partition name: <partition_name>
- Compute nodes:
    - CPUs:
        - model name: <CPU_Model>
        - number of cores: <Cores_per_socket>
        - threads per core: <Threads_per_core>
        - sockets and NUMA organisation: <Sockets> socket(s), <NUMA_nodes> NUMA nodes
        - min frequency: <CPU_Min_Speed_MHz> MHz
        - max frequency: <CPU_Max_Speed_MHz> MHz
        - L1d cache: <L1d_cache> per socket
        - L1i cache: <L1i_cache> per socket
        - L2 cache: <L2_cache> per socket
        - L3 cache: <L3_cache> per socket
    - Memory:
        - RAM: <ram_gib> GiB (<ram_per_core_gib> GiB per core)
        - Swap: <swap_gib> GiB

# Software description
## Environment variables
<environment_variables>

## Spack environment
<spack_dependencies>

## Python environment
<python_environment>

# Pipeline performance
## Process timings
<ps_data>

## Resource usage
<benchmon_plot>

You can also use the default template provided in benchmon/report/template.md. Then, run the following command:

benchmon-report --software-report <path/to/swmon/file> --hardware-report <path/to/hwmon/file> --ps-report <path/to/ps/report/file> --figure-path <path/to/graph/file.png>
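
The idea behind the template labels can be illustrated with a plain text substitution. The sketch below is not benchmon-report's implementation: `fill_label` is a hypothetical helper, and the values `Example CPU` and `256` are placeholders rather than real measurements.

```shell
# Hypothetical illustration of label substitution (not benchmon-report's
# implementation): replace every <label> in a template read from stdin.
# Assumes the value contains no "|", "&" or backslash characters.
fill_label() {
    sed "s|<$1>|$2|g"
}

printf '%s\n' '- model name: <CPU_Model>' '- RAM: <ram_gib> GiB' \
    | fill_label CPU_Model 'Example CPU' \
    | fill_label ram_gib 256
```

Labels that have no value supplied are left untouched by this sketch, which makes it easy to spot placeholders that still need filling.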
