Usage

The main script to run the benchmark suite is sdpbenchrun.py, which can be found in ska-sdp-benchmark-suite/scripts. The suite can be launched with the default configuration using the following command:

python3 sdpbenchrun.py

This runs the suite with the default configuration file.

To list all the available options, run python3 sdpbenchrun.py --help. This gives the following output:

usage: sdpbenchrun.py [-h] [-r [RUN_NAME]] [-b BENCHMARKS [BENCHMARKS ...]]
                    [-c [CONFIG]] [-m RUN_MODE [RUN_MODE ...]]
                    [-n NUM_NODES [NUM_NODES ...]] [-d [WORK_DIR]]
                    [-t [SCRATCH_DIR]] [-s] [-p [REPETITIONS]] [-o] [-l]
                    [-e] [-a] [-v] [--version]

SKA SDP Pipeline Benchmark Tests

optional arguments:
  -h, --help            show this help message and exit
  -r [RUN_NAME], --run_name [RUN_NAME]
                        Name of the run
  -b BENCHMARKS [BENCHMARKS ...], --benchmarks BENCHMARKS [BENCHMARKS ...]
                        List of benchmarks to run.
  -c [CONFIG], --config [CONFIG]
                        Configuration file to use. If not provided, load
                        default config file.
  -m RUN_MODE [RUN_MODE ...], --run_mode RUN_MODE [RUN_MODE ...]
                        Run benchmarks in container and/or bare metal.
  -n NUM_NODES [NUM_NODES ...], --num_nodes NUM_NODES [NUM_NODES ...]
                        Number of nodes to run the benchmarks.
  -d [WORK_DIR], --work_dir [WORK_DIR]
                        Directory where benchmarks will be run.
  -t [SCRATCH_DIR], --scratch_dir [SCRATCH_DIR]
                        Directory where benchmark output files will be stored.
  -s, --submit_job      Run the benchmark suite in job submission mode.
  -p [REPETITIONS], --repetitions [REPETITIONS]
                        Number of repetitions of benchmark runs.
  -o, --randomise       Randomise the order of runs.
  -l, --ignore_rem      In case of unfinished jobs, skip remaining jobs on
                        relaunch.
  -e, --export          Export all stdout, json and log files from run_dir and
                        compresses them.
  -a, --show            Show running config and exit.
  -v, --verbose         Enable verbose mode. Display debug messages.
  --version             show program's version number and exit

All of the CLI arguments can also be provided in the configuration file. However, options passed on the command line override those given in the configuration file.
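
For example, if the configuration file sets a number of repetitions, passing -p on the command line takes precedence (the value used here is only illustrative):

python3 sdpbenchrun.py -c path/to/my_config.yml -p 5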

Arguments

If no --run_name argument is provided, the benchmark suite creates a name from the hostname and the current timestamp. It is advised to give a run name to better organise different benchmark runs. User-defined configuration files can be provided with the --config option. Note that system-specific user configuration should be placed in the home directory of the user, as suggested in Configuration Files. The benchmark suite supports running in batch submission mode, which is enabled with the --submit_job option.

All benchmark-related files are placed in the directory specified by the --work_dir option. This includes source code, compilation files, stdout files, etc. Currently only the imaging IO test is included in the benchmark suite and, depending on the configuration used, it can generate a huge amount of data. The --scratch_dir option specifies where this data is written. Users must have read and write permissions for both --work_dir and --scratch_dir. The fastest file system available to the machine should be used as --scratch_dir, since high I/O bandwidth is of interest.

--repetitions specifies the number of times each experiment is repeated. Similarly, the --randomise option randomises the order of experiments, which helps to minimise cache effects.
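
For example, a run that keeps build and log files under the home directory, writes benchmark output to a faster scratch file system, and repeats each experiment three times in randomised order could look like this (the run name and directory paths are purely illustrative):

python3 sdpbenchrun.py -r my_run -d $HOME/sdp-benchmarks -t /scratch/$USER/sdp-benchmarks -p 3 -o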

Example use cases

To run with the user-defined configuration file path/to/my_config.yml and the run name trial_run:

python3 sdpbenchrun.py -r trial_run -c path/to/my_config.yml

To override the run mode in the default configuration file and run in both bare-metal and container modes:

python3 sdpbenchrun.py -m bare-metal singularity

Multiple inputs to an argument must be separated by spaces.

To run the benchmarks on 2, 4 and 8 nodes:

python3 sdpbenchrun.py -n 2 4 8

Note that in such a case it is better to use batch submission mode with the -s option. Otherwise, a reservation for 8 nodes would be needed, and for the runs that use only 2 or 4 nodes the remaining nodes would be idle. When the -s option is added, the benchmark suite submits jobs to the scheduler based on the configuration provided in the system config file. Currently, only the SLURM and OAR batch schedulers are supported.
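
For example, the command above can be run in job submission mode simply by adding -s:

python3 sdpbenchrun.py -n 2 4 8 -s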

To export and compress the stdout, JSON and log files from the run directory, use

python3 sdpbenchrun.py -e

In a similar way, multiple values can be given for the benchmark configuration parameters. More details on how to provide them are documented in the default configuration file.

Based on the provided configuration files and CLI arguments, a parameter space is created, and a benchmark run is performed for each combination in that parameter space.
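
For example, combining two node counts with two run modes gives four combinations in the parameter space; with -p 3, each of these runs would be repeated three times (a sketch using only the options documented above):

python3 sdpbenchrun.py -n 2 4 -m bare-metal singularity -p 3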

Job submission mode

By default, the benchmark suite runs in interactive mode, which means the benchmarks are run on the current node. It can also be run through SLURM, for example with a job script like the following:

#!/bin/bash

#SBATCH --time=01:00:00
#SBATCH -J sdp-benchmarks
#SBATCH --no-requeue
#SBATCH --exclusive

module load gnu7 openmpi3 hdf5

workdir=$PWD

# Make sure we have the right working directory
cd $workdir

echo -e "JobID: $SLURM_JOB_ID\n======"
echo "Time: `date`"
echo "Running on master node: `hostname`"
echo "Current directory: `pwd`"
echo "Output directory: $outdir"
echo -e "\nnumtasks=$SLURM_NTASKS, numnodes=$SLURM_JOB_NUM_NODES, OMP_NUM_THREADS=$OMP_NUM_THREADS"

CMD="python3 sdpbenchrun.py -n $SLURM_JOB_NUM_NODES"

echo -e "\nExecuting command:\n==================\n$CMD\n"
eval $CMD

echo -e "\n=================="
echo "Finish time: `date`"

This script runs the benchmark suite with the default configuration and the number of nodes in the SLURM reservation. Sometimes it is not convenient to run the benchmarks in this way. For example, for scalability tests, the benchmarks need to be run with different numbers of nodes. One option is to submit a separate SLURM job for each node count, but this can be done in a more streamlined fashion. Say we want to run on 2, 4, 8, 16 and 32 nodes. Using job submission mode, we can simply run

python3 sdpbenchrun.py -r scalability_test -n 2 4 8 16 32 -s

on the login node. This submits the SLURM job scripts, with the test run named scalability_test. Once all the jobs have finished, rerunning the same command makes the benchmark suite parse the output from the benchmarks and save all the data. If not all the jobs have finished, the benchmark suite collects the results from the jobs that have finished and shows the state of the unfinished jobs.

If for some reason one of the jobs fails to finish successfully, the benchmark suite marks that test as failed. When the benchmark suite is run again with the same configuration, it already has the information on which tests finished successfully and which failed, and it will rerun only the failed tests. This works in both interactive and job submission mode.
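
For example, when relaunching the scalability run above, the -l (--ignore_rem) option can be added to skip any jobs that have not yet finished rather than waiting for them (an illustration based only on the options documented above):

python3 sdpbenchrun.py -r scalability_test -n 2 4 8 16 32 -s -l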

Collection of results

The test results are saved in JSON format under the --work_dir directory. Typically, for iotest, the results can be found at $WORK_DIR/iotest/out. This directory contains two sub-directories, std-out and json-out. std-out contains the standard output from the benchmark runs, whereas json-out holds all the information about the benchmark run together with the relevant metrics parsed from the standard output files.
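
For example, the output layout of an iotest run can be inspected as follows (assuming $WORK_DIR is the directory passed to --work_dir):

ls $WORK_DIR/iotest/out
# json-out  std-out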

The typical schema of the JSON output file is:

{
  "benchmark_name": <name of the benchmark test>,
  "benchmark_args": {<arguments of the benchmark test>},
  "batch_script_info": {<info about batch scheduler>},
  "benchmark_metrics_info": {<metrics parsed from benchmark test>}
}

All the metadata about the benchmark experiment can be found in the JSON file, and the experiment can be reproduced from the data it contains.
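
As an illustration, the metrics parsed for a single run could be extracted with a one-liner like the following (the result file name is a hypothetical placeholder):

python3 -c "import json; print(json.load(open('some_run.json'))['benchmark_metrics_info'])"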