Level 0 Benchmark Tests

CPU tests

HPCG benchmark

Context

This is the HPCG micro-benchmark test with a default problem size of 104. The benchmark performs several matrix-vector operations on sparse matrices. More details about the benchmark can be found on the HPCG benchmark website.

Note

Currently, the implemented test uses the optimized version of the benchmark shipped with the Intel MKL library. On AMD processors, the GNU compiler toolchain is used to run the test, and on IBM POWER processors, the IBM XL compiler toolchain is used.

Test size

Currently, two different variables can be controlled when running the test:

  • number of nodes to run the benchmark

  • problem size

By default, the benchmark runs on a single node. To run on multiple nodes, the num_nodes variable can be set at runtime. Similarly, the default problem size is 104 and it can be changed at runtime using the problem_size variable.

Note

Even if more than one node is used in the test, the resulting performance metric, Gflop/s, is always reported per node.

Note

The problem size must be a multiple of 8. This is a requirement of the HPCG benchmark itself.

Test types

Currently there are three different types of tests implemented:

  • HpcgXlTest: HPCG with IBM XL toolchain

  • HpcgGnuTest: HPCG with GNU GCC toolchain

  • HpcgMklTest: HPCG shipped with Intel MKL package

If a system has more than one valid test, a specific test can be selected using the -n flag on the CLI, as shown in Usage.

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpcg/reframe_hpcg.py --run --performance-report

We can set the number of nodes on the CLI using:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpcg/reframe_hpcg.py --run --performance-report -S num_nodes=2

Similarly, the problem size of the benchmark can be altered at runtime using:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpcg/reframe_hpcg.py --run --performance-report -S problem_size=120

For instance, if a system has both HpcgGnuTest and HpcgMklTest as valid tests and we want to run only HpcgMklTest, we can use the -n flag as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpcg/reframe_hpcg.py --run --performance-report -n HpcgMklTest

Test class documentation

class apps.level0.cpu.hpcg.reframe_hpcg.HpcgMixin(*args, **kwargs)[source]

Common regression test attributes for HpcgGnuTest and HpcgMklTest

test_settings()[source]

Common test settings

set_git_commit_tag()[source]

Fetch git commit hash

set_executable_opts()[source]

Set executable options

export_env_vars()[source]

Export env variables using OMPI_MCA param for OpenMPI

set_tags()[source]

Add tags to the test

set_keep_files()[source]

List of files to keep in output

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

extract_gflops()[source]

Performance extraction function for Gflops

set_perf_patterns()[source]

Set performance variables

class apps.level0.cpu.hpcg.reframe_hpcg.HpcgXlTest(*args, **kwargs)[source]

Main class of HPCG test based on IBM XL

build_executable()[source]

Set build system and config options

set_num_tasks_threads()[source]

Set number of MPI and OpenMP tasks

set_executable()[source]

Set executable

set_outfile()[source]

Set name of output file

job_launcher_opts()[source]

Set job launcher options

class apps.level0.cpu.hpcg.reframe_hpcg.HpcgGnuTest(*args, **kwargs)[source]

Main class of HPCG test based on GNU

set_num_tasks()[source]

Set number of tasks for job

set_prebuild_cmds()[source]

Copy Make file into setup folder

build_executable()[source]

Set build system and options

set_executable()[source]

Set executable

set_outfile()[source]

Set name of output file

job_launcher_opts()[source]

Set job launcher options

class apps.level0.cpu.hpcg.reframe_hpcg.HpcgMklTest(*args, **kwargs)[source]

Main class of HPCG test based on MKL

set_num_tasks()[source]

Set number of tasks for job

set_env_vars()[source]

Set job specific env variables

set_executable()[source]

Set executable

set_executable_opts()[source]

Override executable options from Mixin class

set_output_file()[source]

Set output file name

set_mkl_env()[source]

Source the env vars to get all necessary libraries on PATH

HPL benchmark

Context

This is the HPL micro-benchmark test, which uses a single node as the default test parameter. It is the reference benchmark used to provide data for the Top500 list and thus rank supercomputers worldwide. HPL relies on an efficient implementation of the Basic Linear Algebra Subprograms (BLAS).

Note

Currently, the implemented test uses the optimized version of the benchmark shipped with the Intel MKL library.

Test types

Currently, two different tests are defined, namely:

  • HplGnuTest: Based on the GNU toolchain, for non-Intel processors

  • HplMklTest: Using the benchmark shipped with the Intel MKL library, for Intel processors

On Intel chips, we can use the precompiled binary that ships with the Intel MKL library, but for non-Intel systems we need to compile HPL with the GNU toolchain using a customised makefile. The makes/ directory provides a makefile for AMD chips that uses BLIS as the BLAS library. The test to run can be chosen at runtime from the CLI, as discussed in Usage.

Test configuration file

The prerequisite to run the HPL benchmark is an HPL.dat file that contains several benchmark parameters. A sample configuration file looks like:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
29 30 34 35  Ns
4            # of NBs
1 2 3 4      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
3            # of process grids (P x Q)
2 1 4        Ps
2 4 1        Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

More details on each parameter can be found in the tuning section of the benchmark documentation. This link can be used to generate an HPL.dat file for a given runtime configuration. Another useful link in this context is here.

Currently, the test supports automatic generation of the HPL.dat file based on the system configuration. The class GenerateHplConfig in modules.utils is used for this purpose. The HPL problem size depends on the available system memory, and it is generally recommended to choose a size that occupies at least 80% of the system memory. For systems with a lot of memory this can result in a very large problem size that takes a very long time to run, so we cap the system memory used in this calculation at 200 GB to avoid very long run times.
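
To make the sizing rule concrete, the following sketch (hypothetical, not the actual GenerateHplConfig implementation) estimates a problem size N from the node memory:

import math

def estimate_problem_size(mem_bytes, num_nodes=1, fraction=0.8, cap_bytes=200 * 1024**3, nb=192):
    """Estimate an HPL problem size N using ~80% of the (capped) memory; NB is an assumed block size."""
    usable = min(mem_bytes, cap_bytes) * num_nodes * fraction
    # The HPL matrix is N x N doubles (8 bytes per element).
    n = int(math.sqrt(usable / 8))
    # Round down to a multiple of the block size NB.
    return (n // nb) * nb

print(estimate_problem_size(192 * 1024**3))  # ~143000 for a single 192 GiB node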

For Intel processors, we use one MPI process per node, while for AMD chips we use the number of L3 caches as the number of MPI processes and the number of cores attached to each L3 cache as the number of OpenMP threads.
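
The following small sketch illustrates this decomposition for a hypothetical AMD node; the counts are illustrative, and the real test derives them from the node topology:

# Example: dual-socket AMD EPYC node with 128 cores and 32 L3 cache complexes (illustrative numbers)
cores_per_node = 128
num_l3_caches = 32

num_mpi_tasks = num_l3_caches                   # one MPI rank per L3 cache
omp_threads = cores_per_node // num_l3_caches   # cores sharing each L3 cache

print(num_mpi_tasks, omp_threads)               # 32 MPI ranks x 4 OpenMP threads per node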

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpl/reframe_hpl.py --run --performance-report

We can set the number of nodes on the CLI using:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpl/reframe_hpl.py --run --performance-report -S num_nodes=2

A particular test can be chosen at runtime using the -n option as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/hpl/reframe_hpl.py --run --performance-report -n HplGnuTest

Test class documentation

class apps.level0.cpu.hpl.reframe_hpl.HplMixin(*args, **kwargs)[source]

Common methods and attributes for HPL main tests

test_settings()[source]

Common test settings

set_git_commit_tag()[source]

Fetch git commit hash

export_env_vars()[source]

Export env variables using OMPI_MCA param for OpenMPI

generate_config()[source]

Generate HPL config file and place it in stagedir

set_tags()[source]

Add tags to the test

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

# Finished      1 tests with the following results:
#               1 tests completed and passed residual checks,
#               0 tests completed and failed residual checks,
#               0 tests skipped because of illegal input values
# --------------------------------------------------------------------------------
# End of Tests.
extract_gflops()[source]

Performance extraction function for Gflops. Sample stdout:

set_perf_patterns()[source]

Set performance variables

set_reference_values()[source]

Set reference perf variables

class apps.level0.cpu.hpl.reframe_hpl.HplGnuTest(*args, **kwargs)[source]

Main class of HPL test based on GNU toolchain

set_num_tasks()[source]

Set number of processes and threads based on L3 cache

set_job_env_vars()[source]

Set job specific env variables

download_hpl()[source]

Download HPL 2.3 source code

set_topdir_makefile()[source]

Set TOPDIR var in Makefile and copy to stagedir

emit_prebuild_cmds()[source]

Make clean if already exists

build_executable()[source]

Set build system and config options

set_executable()[source]

Set executable name

job_launcher_opts()[source]

Set job launcher options

class apps.level0.cpu.hpl.reframe_hpl.HplMklTest(*args, **kwargs)[source]

Main class of HPL test based on MKL

set_num_tasks()[source]

Set number of tasks for job

set_omp_threads()[source]

Set number of OpenMP threads

set_executable()[source]

Set executable name

Intel MPI Benchmarks

Context

Intel MPI Benchmarks (IMB) are used to measure application-level latency and bandwidth, particularly over a high-speed interconnect, associated with a wide variety of MPI communication patterns with respect to message size.

Note

Currently, only benchmarks from the IMB-MPI1 component are included in the test.

Included benchmarks

Currently, the test includes the following benchmarks:

  • Pingpong

  • Uniband

  • Biband

  • Sendrecv

  • Allreduce

  • Alltoall

  • Allgather

By default, all the benchmarks listed above are run by the test. However, the user can choose a subset of these benchmarks at runtime using the CLI, as discussed in Usage.

Number of MPI processes

By default, benchmarks like Uniband, Biband, etc. are run with the number of MPI processes varying from 2, 4, 8 and so on up to the number of physical cores on the node. In order to reduce the total number of benchmark runs, only two runs are chosen for each benchmark:

  • Run with 1 MPI process per node

  • Run with N MPI processes per node, where N is the number of physical cores.

Effectively, this configuration runs tests that establish the upper and lower bounds of the benchmark metrics, thus minimising the time required for the benchmarks to run.

Test configuration

The only file needed for the test to run is placed in the src/ folder and provides the list of message sizes to be tested in the benchmark. The file looks as follows:

0
4096
16384
131072
1048576
4194304

To test more message sizes, simply add new lines to the file and place it in the src/ folder.

Test variables

Different variables are available for the user to change the runtime configuration of the tests. They are listed as follows:

  • variants: Benchmark variants that can be chosen as listed in Included benchmarks (default: All benchmarks listed in Included benchmarks).

  • mem: Memory allocated per MPI process in GB (default: 1).

  • timeout: Timeout in seconds for running the benchmark for each message size (default: 2).

The variables mem and timeout are specific to IMB and more details about these variables can be found in the documentation.

Tip

For benchmarks like Alltoall, Allgather and Allreduce involving many nodes, runs with bigger message sizes might time out. In this case, increase the timeout variable. Similarly, nodes with many cores and a small memory size can pose problems when running the benchmarks. As stated in Number of MPI processes, as many MPI processes as physical cores are used for the benchmark runs. So, if a node has N physical cores and less than N GB of DRAM, benchmarks will fail due to lack of sufficient memory. In this case, reduce the mem variable, which reduces the memory allocated for each MPI process.

All these variables can be configured at runtime from the CLI using ReFrame's -S flag, as discussed in Usage.

Test parameterisation

The tests are parameterised based on the variable tot_nodes, whose default value is 2. This variable can only be configured from the CLI using the environment variable IMBTEST_NODES. Based on the tot_nodes value, the closest power of 2 is estimated, and parameterised tests are generated for node counts in powers of 2. For instance, if tot_nodes is 64, tests are run on 2, 4, 8, 16, 32 and 64 nodes. For each run, all the requested benchmarks are executed with different numbers of MPI processes as described in Number of MPI processes. A sketch of this parameterisation is shown below.
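
The sketch below (a hypothetical helper, not the actual test code) shows how the node counts could be derived from IMBTEST_NODES:

import math
import os

tot_nodes = int(os.environ.get('IMBTEST_NODES', 2))
# Closest power of two that does not exceed the requested node count.
max_exp = int(math.log2(tot_nodes))
node_counts = [2 ** i for i in range(1, max_exp + 1)]

print(node_counts)  # IMBTEST_NODES=64 -> [2, 4, 8, 16, 32, 64]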

If the user wants to restrict the runs to only a few node counts, this can be done using the -t flag on the CLI.

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/imb/reframe_imb.py --exec-policy=serial --run --performance-report

Important

It is absolutely necessary to use the --exec-policy=serial option while running these benchmarks. By default, ReFrame executes tests in asynchronous mode, where all tests are executed at the same time. As we are interested in network latency and bandwidth metrics, it is advised to run these benchmarks serially so that they do not interfere with each other.

We can choose benchmark variants from the CLI. For example, if we want to run only the Uniband and Biband benchmarks:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/imb/reframe_imb.py --exec-policy=serial --run --performance-report -S variants="Uniband","Biband"

Similarly, other variables can also be configured from the CLI. To set mem to 0.5 and timeout to 3.0:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/imb/reframe_imb.py --exec-policy=serial --run --performance-report -S mem=0.5 -S timeout=3.0

To set the total number of nodes from the CLI using the IMBTEST_NODES environment variable, use the following:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
IMBTEST_NODES=16 reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/imb/reframe_imb.py --exec-policy=serial --run --performance-report

Finally, to select only a few parameterised tests, we can use the -t flag. For example, if tot_nodes is set to 16 and we want to run only the tests with 8 and 16 nodes:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
IMBTEST_NODES=16 reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/imb/reframe_imb.py --exec-policy=serial --run --performance-report -t 8$ -t 16$

All the above mentioned CLI flags can be used together without any side effects.

Test class documentation

class apps.level0.cpu.imb.reframe_imb.ImbMixin(*args, **kwargs)[source]

Common test attributes for IMB test

set_git_commit_tag()[source]

Fetch git commit hash

get_l3_cache()[source]

Get L3 cache size in MB and line size

set_num_tasks()[source]

Set number of tasks for job

get_msg_lens()[source]

Read input message lengths

set_executable()[source]

Set executable name

export_env_vars()[source]

Export env variables using OMPI_MCA param for OpenMPI

set_executable_opts()[source]

Set executable options

add_launcher_options()[source]

Add job launcher commands

set_tags()[source]

Add tags to the test

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

parse_stdout(msg_len, var, ind)[source]

Read stdout file to parse perf variables

extract_bw(msg_len=0, var='PingPong', ind=3)[source]

Performance extraction function for bandwidth

extract_time(msg_len=0, var='PingPong', ind=2)[source]

Performance extraction function for latency

set_perf_patterns()[source]

Set performance variables. Sample stdout

# e.g.
#---------------------------------------------------
# Benchmarking Uniband
# #processes = 4
#---------------------------------------------------
       #bytes #repetitions   Mbytes/sec      Msg/sec
            0         1000         0.00      9233819
set_reference_values()[source]

Set reference perf variables

class apps.level0.cpu.imb.reframe_imb.ImbPingpongTest(*args, **kwargs)[source]

Main class of IMB Pingpong test

set_executable_opts()[source]

Set executable options

set_sanity_patterns()[source]

Set sanity patterns. We override the method in Mixin Class. Example stdout:

set_perf_patterns()[source]

Set performance variables. We override the method in Mixin Class. Sample stdout

# e.g.
# #---------------------------------------------------
# # Benchmarking PingPong
# # #processes = 2
# #---------------------------------------------------
#        #bytes #repetitions      t[usec]   Mbytes/sec
#             0         1000         3.51         0.00
# #---------------------------------------------------
class apps.level0.cpu.imb.reframe_imb.ImbOneCoreTests(*args, **kwargs)[source]

Main class of all IMB variants tests using one core per node

class apps.level0.cpu.imb.reframe_imb.ImbAllCoreTests(*args, **kwargs)[source]

Main class of IMB variants tests using all cores per node

set_num_tasks()[source]

Set number of tasks for job

IOR benchmark

Context

IOR is designed to measure parallel file system I/O performance through a variety of APIs. This parallel program performs writes and reads to/from files and reports the resulting throughput rates. The tests are configured in such a way as to minimise the page-caching effect on I/O bandwidth. See here for more details.

Note

In order to run this test, an environment variable SCRATCH_DIR must be defined in the system partition with the path to the scratch directory of the platform. Otherwise, the test will fail. A sketch of such a partition entry is shown below.
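
A minimal sketch of a partition entry in reframe_config.py defining SCRATCH_DIR is shown here; the partition name, environment name and path are placeholders, and it assumes the ReFrame 3.x 'variables' partition option:

'partitions': [
    {
        'name': 'compute',
        'scheduler': 'slurm',
        'launcher': 'mpirun',
        'environs': ['gnu-openmpi'],
        # SCRATCH_DIR must point to the platform's scratch file system
        'variables': [['SCRATCH_DIR', '/scratch/$USER']],
    },
],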

Test variables

Several variables are defined in the test which can be configured from the command line interface (CLI). They are summarised as follows:

  • num_nodes: Number of nodes to run the test (default is 4)

  • num_mpi_tasks_per_node: Number of MPI processes per node (default is 8)

  • block_size: Block size of IOR test (default is 1g)

  • transfer_size: Transfer size of IOR test (default is 1m)

  • num_segments: Number of segments of IOR test (default is 1)

The variables block_size, transfer_size and num_segments are IOR-specific. More details on these variables can be found in the IOR documentation.

Any of these variables can be overridden from the CLI using the -S option of ReFrame. Examples are presented in Usage.

Test parameterisation

The test is parameterised with respect to two parameters, namely the I/O interface and the file type. There are three different I/O interfaces available:

  • posix: POSIX I/O

  • mpiio: MPI I/O

  • hdf5: HDF5

Data can be written either to a single shared file or using a file-per-process approach, and the tests are parameterised accordingly:

  • single: Single file for all processes

  • fpp: File per process

The parameterised tests can be controlled using tags, as shown in the Usage section.

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/ior/reframe_ior.py --exec-policy=serial --run --performance-report

Note

It is extremely important to use --exec-policy=serial for this particular test. By default, ReFrame executes tests in asynchronous mode, which means multiple jobs are executed at the same time if the partition allows it. However, for this type of I/O test, we do not want all the jobs using the underlying file system at the same time, so we switch to serial execution where only one job at a time is executed on the partition.

To configure the test variables presented in the Test variables section, we can use the -S option as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/ior/reframe_ior.py --exec-policy=serial --run --performance-report -S num_nodes=2

Multiple variables can be configured simply by repeating the -S flag for each variable as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/ior/reframe_ior.py --exec-policy=serial --run --performance-report -S num_nodes=2 -S block_size=10g

By default all parameterised tests will be executed for a given partition. The list of tests can be obtained using:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/ior/reframe_ior.py -l

which will give the following output:

[ReFrame Setup]
version:           3.9.0-dev.3+adca255d
command:           'reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/ior/reframe_ior.py -l'
launched by:       mahendra@alaska-login-0.novalocal
working directory: '/home/mahendra/work/ska-sdp-benchmark-tests'
settings file:     'reframe_config.py'
check search path: (R) '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py'
stage directory:   '/home/mahendra/work/ska-sdp-benchmark-tests/stage'
output directory:  '/home/mahendra/work/ska-sdp-benchmark-tests/output'

[List of matched checks]
- IorTest_hdf5_single (found in '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py')
- IorTest_posix_single (found in '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py')
- IorTest_mpiio_single (found in '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py')
- IorTest_posix_fpp (found in '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py')
- IorTest_mpiio_fpp (found in '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py')
- IorTest_hdf5_fpp (found in '/home/mahendra/work/ska-sdp-benchmark-tests/apps/level0/cpu/ior/reframe_ior.py')
Found 6 check(s)

Log file(s) saved in '/home/mahendra/work/ska-sdp-benchmark-tests/reframe.log', '/home/mahendra/work/ska-sdp-benchmark-tests/reframe.out'

As we can see from the output, ReFrame will execute tests for all I/O interfaces and file types. In order to choose only a few parameterised tests, we can use the -t flag to restrict the tests to the given parameters. For example, to run only the POSIX I/O interface with the single-file variant, the following

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/ior/reframe_ior.py --exec-policy=serial --run --performance-report -t posix$ -t single$

can be used. As with the -S option, -t can be repeated as many times as the user wants.

Test class documentation

class apps.level0.cpu.ior.reframe_ior.IorTest(*args, **kwargs)[source]

Main class of IOR read and write tests

set_param_tags()[source]

Add parameter tags to the test

set_git_commit_tag()[source]

Fetch git commit hash

set_num_tasks_reservation()[source]

Set number of tasks for job reservation

set_num_mpi_tasks()[source]

Set number of MPI tasks

set_tags()[source]

Add tags to the test

patch_job_launcher()[source]

Monkey mock the job launcher command

set_executable()[source]

Set executable name

set_executable_opts()[source]

Set executable options

export_env_vars()[source]

Export env variables using OMPI_MCA param for OpenMPI

chdir_to_scratch()[source]

Add prerun command to change PWD to scratch dir before commencing test

job_launcher_opts()[source]

Set job launcher options

set_sanity_patterns()[source]

Set sanity patterns. Example stdout

# Max Write: 940.74 MiB/sec (986.44 MB/sec)
# Max Read:  1303.68 MiB/sec (1367.01 MB/sec)
# Finished            : Mon Oct 18 10:52:25 2021
extract_write_bw()[source]

Performance extraction function for extract write bandwidth. Sample stdout

# Max Write: 940.74 MiB/sec (986.44 MB/sec)
extract_read_bw()[source]

Performance extraction function for extract read bandwidth. Sample stdout

# Max Read:  1303.68 MiB/sec (1367.01 MB/sec)
set_perf_patterns()[source]

Set performance variables

set_reference_values()[source]

Set reference perf variables

STREAM benchmark

Context

STREAM is used to measure the sustainable memory bandwidth of high performance computers. The source code is available here.

Note

Currently, the implemented test uses only the Intel compiler, which is optimized for Intel processors. A generic GNU-compiled STREAM test will be added in the future.

Test configuration

The STREAM benchmark uses three arrays of size N to perform its kernels, the most relevant and interesting of which is the "Triad" kernel. In the test, we choose the array size such that the arrays occupy 60% of the system memory. This ensures that caching effects are avoided while running the benchmark.
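
As an illustration of this sizing rule (a sketch only, not the actual test code), the array size can be derived from the node memory as follows:

def stream_array_size(mem_bytes, fraction=0.6, num_arrays=3, elem_bytes=8):
    """Elements per array so that the three double-precision arrays fill `fraction` of memory."""
    return int(mem_bytes * fraction / (num_arrays * elem_bytes))

n = stream_array_size(192 * 1024**3)   # e.g. a node with 192 GiB of DRAM
print(n)                               # ~5.2e9 elements, typically passed to the build as -DSTREAM_ARRAY_SIZE=<N>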

The Makefile in the src/ folder contains all the optimized compiler flags used with the Intel compiler to extract maximum performance.

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/cpu/stream/reframe_stream.py --run --performance-report

Test class documentation

class apps.level0.cpu.stream.reframe_stream.StreamTest(*args, **kwargs)[source]

Main class of Stream test based on Intel compiler

set_git_commit_tag()[source]

Fetch git commit hash

set_num_tasks_reservation()[source]

Set number of tasks for job reservation

set_env_vars()[source]

Set OpenMP environment variables

get_array_size()[source]

Set array size to be 60% of main memory

set_tags()[source]

Add tags to the test

set_launcher()[source]

Set launcher to local to avoid appending mpirun or srun

build_executable()[source]

Set build system and config options

export_env_vars()[source]

Export env variables using OMPI_MCA param for OpenMPI

set_executable()[source]

Set executable

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

# -------------------------------------------------------------
# Solution Validates: avg error less than 1.000000e-13 on all three arrays
# -------------------------------------------------------------
extract_bw(kind='Copy')[source]

Performance function to extract bandwidth. Sample stdout:

# Function    Best Rate MB/s  Avg time     Min time     Max time
# Copy:           42037.6     0.003859     0.003806     0.004004
# Scale:          41047.7     0.003917     0.003898     0.003942
# Add:            45138.5     0.005347     0.005317     0.005372
# Triad:          46412.1     0.005202     0.005171     0.005238
set_perf_patterns()[source]

Set performance variables

set_reference_values()[source]

Set reference perf values

GPU tests

Babel Stream benchmark

Context

Babel Stream is inspired by the STREAM benchmark and measures memory bandwidth on GPUs. It supports several programming models for CPUs as well. More details can be found in the documentation.

Note

Although the benchmark supports various programming models, currently the test uses only OMP, TBB and CUDA models.

Test variants

Currently, three different variants of the benchmark are included in the test. They are

  • omp: Using OpenMP threading model

  • tbb: Using Intel’s TBB model

  • cuda: Using CUDA model for GPUs

The test is parameterised over these models, and a specific variant can be chosen at runtime using the -t flag on the CLI. An example is shown in Usage.

Test configuration

Like the STREAM benchmark, Babel Stream uses three arrays of size N for its kernels. The size of the arrays used in the benchmark kernels can be configured at runtime using the mem_size variable. Currently, the default value of mem_size is 0.4, which means the array size is chosen such that all three arrays together occupy 40% of the total memory available.
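
As an illustration (a sketch only; the array-size option name is an assumption about the BabelStream CLI), the default mem_size of 0.4 translates into an array size as follows:

gpu_mem_bytes = 42505076736    # e.g. the 'gpu_mem' value shown in the partition extras (A100-40GB)
mem_size = 0.4                 # fraction of GPU memory to fill with the three arrays
array_size = int(gpu_mem_bytes * mem_size / (3 * 8))   # three double-precision arrays

print(array_size)              # ~7.1e8 elements, passed to BabelStream (e.g. via its --arraysize option)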

Note

Depending on the GPU, we might sometimes get an error saying there is not enough space available to store the buffers. In that case, decrease mem_size to allocate smaller arrays.

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/babel_stream/reframe_babelstream.py --run --performance-report

To run only the omp variant and skip the rest of the models, use the -t flag as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/babel_stream/reframe_babelstream.py -t omp$ --run --performance-report

To change the default value of mem_size at runtime, use the -S flag. For example, to use 30% of the total memory:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/babel_stream/reframe_babelstream.py -S mem_size=0.3 --run --performance-report

Test class documentation

class apps.level0.gpu.babel_stream.reframe_babelstream.BabelStreamTest(*args, **kwargs)[source]

Babel stream test main class

set_num_tasks()[source]

Set number of tasks for job

set_array_size()[source]

Set array size to be a certain percentage of main memory

set_launcher()[source]

Set launcher to local to avoid appending mpirun or srun

build_executable()[source]

Set build system and config options

export_env_vars()[source]

Export env variables using OMPI_MCA param for OpenMPI

set_my_tags()[source]

Add tags to the test

set_executable()[source]

Set name of executable and runtime options

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

# BabelStream
# Version: 3.4
# Implementation: OpenMP
# Running kernels 100 times
# Precision: double
# Array size: 268.4 MB (=0.3 GB)
# Total size: 805.3 MB (=0.8 GB)
# Function    MBytes/sec  Min (sec)   Max         Average
# Copy        69666.441   0.00771     0.01172     0.00783
# Mul         67689.368   0.00793     0.01323     0.00811
# Add         75708.142   0.01064     0.01792     0.01090
# Triad       76265.085   0.01056     0.01411     0.01071
# Dot         103668.530  0.00518     0.01109     0.00547
extract_bw(kind='Copy')[source]

Extract bandwidth metric

set_perf_patterns()[source]

Set performance metrics

set_reference_values()[source]

Set reference perf metrics

GPUDirect RDMA Benchmark Tests

Context

The GPUDirect RDMA (GDR) technology exposes GPU memory to I/O devices by enabling the direct communication path between GPUs in two remote systems. This feature eliminates the need to use the system CPUs to stage GPU data in and out intermediate system memory buffers. As a result the end-to-end latency is reduced and the sustained bandwidth is increased (depending on the PCIe topology).

The GDRCopy (GPUDirect RDMA Copy) library leverages the GPUDirect RDMA APIs to create CPU memory mappings of the GPU memory. The advantage of a CPU driven copy is the very small overhead involved. That is helpful when low latencies are required.

Note

The OSU micro-benchmark suite is used to test the GDR capabilities in the current test setting. Lower-level verbs tests can also be used if the user wishes to remove the overhead imposed by MPI.

Included benchmarks

Currently, the test includes the following categories of benchmarks:

Type of benchmark:

  • bw: Unidirectional bandwidth test

  • bibw: Bidirectional bandwidth test

  • latency: Latency test

Communication type:

  • D_D: Device to device

  • D_H: Device to host

  • H_D: Host to device

By default, all combinations of these tests will be performed. Both the benchmark type and the communication type are parameterised, and the user can select one or more of these tests at runtime using tags, as discussed in Usage.

Each of these tests will be executed in four different modes:

  • GPUDirect RDMA and GDR Copy Enabled

  • GPUDirect RDMA Enabled and GDR Copy Disabled

  • GPUDirect RDMA Disabled and GDR Copy Enabled

  • GPUDirect RDMA and GDR Copy Disabled

This enables us to investigate the effect of each component on the bandwidth and latency.

Benchmark configuration

There are two important variables for this test that need to be taken care of. They are

  • net_adptr: Network adapter to use (default: mlx5_0:1)

  • ucx_tls: UCX transport modes (default: ['rc', 'cuda_copy'])

The value for net_adptr can be passed in two different ways:

  • In the system/partition configuration, as a key-value pair in the extras field.

  • On the CLI, using the -S flag to set the variable.

The value defined using the -S flag takes precedence over the system configuration value. If neither is set, the default value is used in the test. An example of how to define the value in extras is as follows:

'extras': {
             'interconnect': '100',  # in Gb/s
             'gpu_mem': '42505076736',  # in bytes
             'gdr_test_net_adptr': 'mlx5_0:1',  # NIC that has end-to-end connectivity for GDR test
         }

Optimal settings of these variables are necessary in order to leverage the available bandwidth of the InfiniBand (IB) stack. We should choose the network adapter that has end-to-end connectivity with the GPUs.

Tip

We can get this information from the output of the nvidia-smi topo -m command. A typical output looks as follows:

        GPU0    mlx5_0  mlx5_1  mlx5_2  mlx5_3  mlx5_4  mlx5_5  CPU Affinity    NUMA Affinity
GPU0     X      NODE    NODE    PIX     PIX     PIX     PIX     0-19    0
mlx5_0  NODE     X      PIX     NODE    NODE    NODE    NODE
mlx5_1  NODE    PIX      X      NODE    NODE    NODE    NODE
mlx5_2  PIX     NODE    NODE     X      PIX     PIX     PIX
mlx5_3  PIX     NODE    NODE    PIX      X      PIX     PIX
mlx5_4  PIX     NODE    NODE    PIX     PIX      X      PIX
mlx5_5  PIX     NODE    NODE    PIX     PIX     PIX      X

Legend:

 X    = Self
 SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
 NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
 PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
 PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
 PIX  = Connection traversing at most a single PCIe bridge
 NV#  = Connection traversing a bonded set of # NVLinks

We have to make sure to use an adapter with the PIX attribute (connection traversing at most a single PCIe bridge). In this case, mlx5_2, mlx5_3, mlx5_4 and mlx5_5 are connected to the GPU through at most a single PCIe bridge, and we can choose any of them.

Similarly, for the UCX transport methods, we can choose the ones that are available on the system. This information can be gathered using ucx_info -d, which lists all the available transports. These default values can be overridden from the CLI, as shown in Usage.

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/gdr_test/reframe_gdr.py --run --performance-report

If we want to set ucx_tls to ['dc', 'cuda_copy'] and net_adptr to mlx5_3:1, we can use the -S flag as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/gdr_test/reframe_gdr.py -S net_adptr=mlx5_3:1 -S ucx_tls=dc,cuda_copy --run --performance-report

Similarly, if we want to restrict the tests to only D_D (device to device) and bw (unidirectional bandwidth), we can use tags as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/gdr_test/reframe_gdr.py -t D_D -t bw$ --run --performance-report

Test class documentation

class apps.level0.gpu.gdr_test.reframe_gdr.GpuDirectRdmaTest(*args, **kwargs)[source]

GPU Direct RDMA test to benchmark bandwidth and latency between inter node GPUs

gen_msg_sizes()[source]

Generate the message sizes used in benchmark in bytes

override_net_adptr_from_sys_config()[source]

Override network adapter variable if found in sys config

set_git_commit_tag()[source]

Fetch git commit hash

set_tags()[source]

Add tags to the test

set_env_variables()[source]

Set environment variables

patch_job_launcher()[source]

Monkey mock the job launcher command

set_test_cases()[source]

Define all the test cases and corresponding env variables

add_launcher_options()[source]

Add job launcher options

set_executable()[source]

Set executable and options

get_full_job_cmd()[source]

Get full job command to use it in different tests

set_prerun_cmds()[source]

Set prerun commands. Set env variables for case of RDMA and GDR copy enabled

set_postrun_cmds()[source]

Set post run commands. Run rest of the cases

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

# Test with RDMA_GDR_Copy_Enabled started
# OSU MPI-CUDA Bandwidth Test v5.7.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
# 1                       1.89
# 2                       3.80
# 4                       7.65
# 8                      15.25
# Test with RDMA and GDR Copy Enabled finished
parse_stdout(msg_size, case)[source]

Read the stdout file to extract perf metrics

Parameters
  • msg_size (int) – Size of the message in bytes

  • case (str) – Test case

Returns

Metric value

Return type

float

extract_bw(msg_size=1, case='RDMA_GDR_Copy_Enabled')[source]

Performance function to extract uni bandwidth

extract_bibw(msg_size=1, case='RDMA_GDR_Copy_Enabled')[source]

Performance function to extract bi bandwidth

extract_latency(msg_size=1, case='RDMA_GDR_Copy_Enabled')[source]

Performance function to extract latency

set_perf_patterns()[source]

Set performance variables

set_reference_values()[source]

Set reference perf values

NCCL performance benchmarks

Context

NCCL is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, NVswitch, as well as networking using InfiniBand Verbs or TCP/IP sockets.

In this test, we are only interested in the intra-node communication latencies and bandwidths, so we run the test on a single node with multiple GPUs. The benchmarks report the so-called bus bandwidth, which can be compared with the peak bandwidth of the underlying hardware for collective communications. More details on how the bus bandwidth is estimated can be found in the NCCL tests repository.

Note

Each benchmark runs in two different modes namely, in-place and out-of-place. An in-place operation uses the same buffer for its output as was used to provide its input. An out-of-place operation has distinct input and output buffers.

Test variants

The test is parameterised to run the following communication benchmarks:

  • sendrecv

  • gather

  • scatter

  • reduce

  • all_gather

  • all_reduce

A specific test can be chosen at runtime using the -t flag on the CLI. An example is shown in Usage.

Test configuration

The tests can be configured to change the minimum and maximum message sizes used in the benchmarks. These can be set at runtime using the min_size and max_size variables; the default values are 8 bytes and 128 MiB, respectively.
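
With the defaults and the factor-2 step shown in the sample output later in this section, the benchmarks sweep the following message sizes (illustrative sketch):

min_size = 8                  # bytes
max_size = 128 * 1024 ** 2    # 134217728 bytes (128 MiB)

sizes = []
size = min_size
while size <= max_size:       # sizes double at each step
    sizes.append(size)
    size *= 2

print(sizes[:4], '...', sizes[-1])   # [8, 16, 32, 64] ... 134217728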

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/nccl_test/reframe_nccltest.py --run --performance-report

To run only the scatter and gather variants and skip the rest of the benchmarks, use the -t flag as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/nccl_test/reframe_nccltest.py -t scatter$ -t gather$ --run --performance-report

To change the default value of min_size at runtime, use the -S flag. For example, to set min_size to 1 MiB:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/nccl_test/reframe_nccltest.py -S min_size=1M --run --performance-report

Test class documentation

class apps.level0.gpu.nccl_test.reframe_nccltest.NcclTestDownload(*args, **kwargs)[source]

Fixture to fetch NCCL test source code

set_sanity_patterns()[source]

Set sanity patterns

class apps.level0.gpu.nccl_test.reframe_nccltest.NcclTestBuild(*args, **kwargs)[source]

NCCL tests compile test

set_sourcedir()[source]

Set source directory from dependencies

set_build_system_opts()[source]

Set build system options

class apps.level0.gpu.nccl_test.reframe_nccltest.NcclPerfTest(*args, **kwargs)[source]

NCCL performance tests main class

gen_msg_sizes()[source]

Generate list of message sizes

set_sourcesdir()[source]

Set source directory

set_git_commit_tag()[source]

Fetch git commit hash

set_num_tasks_reservation()[source]

Set number of tasks for job reservation

set_tags()[source]

Add tags to the test

set_launcher()[source]

Set launcher to local to avoid appending mpirun or srun

set_executable()[source]

Set executable name

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

# # Out of bounds values : 0 OK
# # Avg bus bandwidth    : 0.791943
parse_stdout(msg_size, place, ind)[source]

Read stdout file to extract perf variables

extract_algbw(msg_size=None, place='in')[source]

Performance function to extract algorithmic bandwidth

extract_busbw(msg_size=None, place='in')[source]

Performance function to extract bus bandwidth

extract_time(msg_size=None, place='in')[source]

Performance function to extract latency

set_perf_patterns()[source]

Set performance variables. Sample stdout:

# # nThread 1 nGpus 2 minBytes 8 maxBytes 134217728 step: 2(factor) warmup iters: 5 iters: 20 validation: 1
# #
# # Using devices
# #   Rank  0 Pid  16042 on grouille-1 device  0 [0x21] A100-PCIE-40GB
# #   Rank  1 Pid  16042 on grouille-1 device  1 [0x81] A100-PCIE-40GB
# #
# #                                               out-of-place                       in-place
# #       size         count      type     time   algbw   busbw  error     time   algbw   busbw  error
# #        (B)    (elements)               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
#            8             2     float    23.17    0.00    0.00  0e+00    22.75    0.00    0.00  0e+00
#           16             4     float    22.65    0.00    0.00  0e+00    22.68    0.00    0.00  0e+00
#           32             8     float    22.44    0.00    0.00  0e+00    22.54    0.00    0.00  0e+00
#           64            16     float    22.83    0.00    0.00  0e+00    22.37    0.00    0.00  0e+00
#          128            32     float    22.72    0.01    0.01  0e+00    22.64    0.01    0.01  0e+00
#          256            64     float    22.67    0.01    0.01  0e+00    22.47    0.01    0.01  0e+00
#          512           128     float    22.42    0.02    0.02  0e+00    22.26    0.02    0.02  0e+00
#         1024           256     float    22.63    0.05    0.05  0e+00    22.50    0.05    0.05  0e+00
#         2048           512     float    22.47    0.09    0.09  0e+00    22.52    0.09    0.09  0e+00
#         4096          1024     float    23.33    0.18    0.18  0e+00    23.28    0.18    0.18  0e+00
#         8192          2048     float    25.22    0.32    0.32  0e+00    24.90    0.33    0.33  0e+00
#        16384          4096     float    33.57    0.49    0.49  0e+00    33.95    0.48    0.48  0e+00
#        32768          8192     float    48.07    0.68    0.68  0e+00    49.19    0.67    0.67  0e+00
#        65536         16384     float    68.66    0.95    0.95  0e+00    72.52    0.90    0.90  0e+00
#       131072         32768     float    115.3    1.14    1.14  0e+00    114.1    1.15    1.15  0e+00
#       262144         65536     float    176.9    1.48    1.48  0e+00    174.8    1.50    1.50  0e+00
#       524288        131072     float    334.0    1.57    1.57  0e+00    342.7    1.53    1.53  0e+00
#      1048576        262144     float    643.3    1.63    1.63  0e+00    599.7    1.75    1.75  0e+00
#      2097152        524288     float   1125.6    1.86    1.86  0e+00   1077.9    1.95    1.95  0e+00
#      4194304       1048576     float   2813.4    1.49    1.49  0e+00   2670.2    1.57    1.57  0e+00
#      8388608       2097152     float   5561.3    1.51    1.51  0e+00   5497.1    1.53    1.53  0e+00
#     16777216       4194304     float    11070    1.52    1.52  0e+00    10950    1.53    1.53  1e+00
#     33554432       8388608     float    22215    1.51    1.51  0e+00    22687    1.48    1.48  1e+00
#     67108864      16777216     float    45987    1.46    1.46  0e+00    46600    1.44    1.44  1e+00
#    134217728      33554432     float    95433    1.41    1.41  0e+00    96707    1.39    1.39  1e+00
# # Out of bounds values : 0 OK
# # Avg bus bandwidth    : 0.778536
# #
set_reference_values()[source]

Set reference perf values

Funclib Test

Context

This test runs functions from ska-sdp-func (https://gitlab.com/ska-telescope/sdp/ska-sdp-func). Currently, tests for the DFT and Phase Rotation functions are implemented.

Test variables

The DFT test supports two different polarisations as parameters.

The Phase Rotation test supports two parameters, which are configured as tuples in one ReFrame parameter. Those two parameters are "baselines", the number of baselines to be tested, and "times".

Environment variables

By default, the test will create a conda environment and run inside it for the sake of isolation. This can be controlled using the environment variable CREATE_CONDA_ENV: by setting it to NO, the test WILL NOT create a conda environment.

Similarly, the performance metrics are monitored using the perfmon toolkit. If the user does not want to monitor metrics, this can be disabled by setting MONITOR_METRICS=NO.

Usage

The tests can be run using the following commands:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/hippo_func_lib/reframe_funclib_test.py --run --performance-report

If we want to change the variables to non-default values, we should use the -S flag. For example:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level0/gpu/hippo_func_lib/reframe_funclib_test.py -S start=1000000 --run --performance-report

Test class documentation

class apps.level0.gpu.hippo_func_lib.reframe_funclib_test.FunclibTestDownload(*args, **kwargs)[source]

Fixture to fetch ska-sdp-func source code

class apps.level0.gpu.hippo_func_lib.reframe_funclib_test.FunclibTestBuild(*args, **kwargs)[source]

Funclib test compile test

set_sourcedir()[source]

Set source path based on dependencies

set_prebuild_cmds()[source]

Make local lib dirs

set_build_system_attrs()[source]

Set build directory and config options

set_postbuild_cmds()[source]

Install libs

set_sanity_patterns()[source]

Set sanity patterns

class apps.level0.gpu.hippo_func_lib.reframe_funclib_test.FunclibDftTest(*args, **kwargs)[source]
set_launcher()[source]

Set launcher to local as it is no multi-node application

set_sanity_patterns()[source]

Expected output: for every device and every version, the following block is printed to stdout:

set_perf_patterns()[source]

Set performance metrics

set_reference_values()[source]

Set reference performance values.

class apps.level0.gpu.hippo_func_lib.reframe_funclib_test.PhaserotTest(*args, **kwargs)[source]
set_launcher()[source]

Set launcher to local as it is no multi-node application

set_sanity_patterns()[source]

Expected output: for every device, the following block is printed to stdout:

set_perf_patterns()[source]

Set performance metrics

set_reference_values()[source]

Set reference performance values.

Level 1 Benchmark Tests

CUDA NIFTY gridder performance benchmark

Context

CUDA NIFTY Gridder (CNG) is a CUDA implementation of the NIFTY gridder to (de)grid interferometric data using the improved w-stacking algorithm.

In this test, we are interested in the performance of CNG on different GPU devices. In order to stress the gridder, we use a synthetic SKA1 MID dataset with a configurable image size. More details on the design of the benchmark can be found in the src/ folder.

Note

The benchmark uses visibility data that is randomly generated for a given uvw coverage. It would be very expensive to do a DFT on this data to estimate the accuracy of CNG, hence no accuracy tests are performed within this benchmark.

Test configuration

The tests can be configured to change the minimum and maximum number of frequency channels used in the benchmark. Similarly, the image size can also be configured at runtime. The variables that can be configured at runtime are:

  • min_chans: Minimum number of frequency channels as power of 2 (default: 0)

  • max_chans: Maximum number of frequency channels as power of 2 (default: 11)

  • img_size: Image size as multiple of 1024 (default: 8)

With the default variables, the benchmark tests on an image size of 8192 x 8192 pixels using 1 to 1024 frequency channels. These variables can be configured from the CLI using the -S flag, as shown in Usage. A sketch of how the defaults translate into the benchmark configuration is shown below.
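
The following sketch is hypothetical; in particular, the exclusive upper bound on the channel range is an assumption consistent with the sample output further below:

min_chans, max_chans, img_size = 0, 11, 8     # default values

image_pixels = img_size * 1024                                  # 8192 x 8192 pixel image
channel_counts = [2 ** i for i in range(min_chans, max_chans)]  # 1, 2, 4, ..., 1024 channels

print(image_pixels, channel_counts)                             # 8192 [1, 2, 4, ..., 1024]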

Usage

The test can be run using the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level1/cng_test/reframe_cngtest.py --run --performance-report

To run using a 16k image and up to 4096 frequency channels, use the -S option as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level1/cng_test/reframe_cngtest.py -S img_size=16 -S max_chans=13 --run --performance-report

Test class documentation

class apps.level1.cng_test.reframe_cngtest.CngTest(*args, **kwargs)[source]

CUDA NIFTY Gridder (CNG) performance tests main class

set_conda_prerun_cmds()[source]

Emit conda env prerun commands

set_source_path()[source]

Get source path attribute

set_tags()[source]

Add tags to the test

install_cng_test()[source]

Installs cng_test in conda env

set_launcher_opts()[source]

Set job launcher options

set_executable()[source]

Set executable name

set_git_commit_tag()[source]

Override git tag method from base class

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

All tests have successfully finished
parse_stdout(num_chan, perf)[source]

Read stdout file to extract perf variables

extract_time(num_chan=None, perf='invert')[source]

Performance function to extract (de)gridding times

extract_vis(num_chan=None, perf='vis')[source]

Performance function to extract number of visibilities

set_perf_patterns()[source]

Set performance variables. Sample stdout:

---------------------------------------------------------------------------------------
# Image size:                            4096 x 4096
# Pixel size (in degrees):               4.099e-05
# Field of view (in degrees):            1.679e-01
# Minimum frequency:                     1.300e+09
# Maximum frequency:                     1.360e+09
# Number of baselines:                   312048
# Integration interval (in sec):         1800
# Precision:                             sp
# Accuracy:                              1e-05
# Number of iterations:                  10
=======================================================================================
            CUDA NIFTY Benchmark results using synthetic SKA1 MID dataset
=======================================================================================
# # Channels        # Visibilities    Invert time [s]   Predict time [s]
# 1                 312048            0.30428           0.09089
# 2                 624096            0.33928           0.09925
# 4                 1248192           0.34887           0.10189
# 8                 2496384           0.38257           0.11387
# 16                4992768           0.43269           0.14284
# 32                9985536           0.53047           0.17708
# 64                19971072          0.73875           0.26098
# 128               39942144          1.26618           0.44106
# 256               79884288          2.17768           0.74469
# 512               159768576         4.34335           1.34687
# 1024              319537152         8.66043           2.64405
---------------------------------------------------------------------------------------
# End of table
# All tests have successfully finished
set_reference_values()[source]

Set reference perf values

IDG Test

Context

The image-domain gridder (IDG) is a new, fast gridder that makes w-term and a-term corrections computationally very cheap. It performs extremely well on GPUs. The source code is hosted on the ASTRON GitLab repository and the documentation can be found here.

Test variables

The test supports several runtime-configurable variables:

  • layout: Antenna layout. Available options are SKA1_low and SKA1_mid. (default is SKA1_low)

  • num_cycles: Number of major cycles (default: 10)

  • num_stations: Number of antenna stations (default: 100)

  • gridsize: Gridsize used for IDG (default: 8192)

  • num_chans: Number of frequency channels (default: 128)

This benchmark uses either the SKA1_low or SKA1_mid antenna layout and generates random visibility data to perform gridding and degridding. We use only one node and one GPU to run the benchmark and report various performance metrics. All these variables can be configured at runtime, as discussed in Usage.

Environment variables

By default, the test will create a conda environment and run inside it for the sake of isolation. This can be controlled using the environment variable CREATE_CONDA_ENV: by setting it to NO, the test WILL NOT create a conda environment.

Similarly, the performance metrics are monitored using the perfmon toolkit. If the user does not want to monitor metrics, this can be disabled by setting MONITOR_METRICS=NO.

Usage

The tests can be run using the following commands:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level1/idg_test/reframe_idgtest.py --run --performance-report

If we want to change the variables to non-default values, we should use the -S flag. For example, to run only 5 major cycles with 64 frequency channels, use:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level1/idg_test/reframe_idgtest.py -S num_cycles=5 -S num_chans=64 --run --performance-report

Test class documentation

class apps.level1.idg_test.reframe_idgtest.IdgTestDownload(*args, **kwargs)[source]

Fixture to fetch IDG source code

class apps.level1.idg_test.reframe_idgtest.IdgTestBuild(*args, **kwargs)[source]

IDG test compile test

set_sourcedir()[source]

Set source path based on dependencies

set_prebuild_cmds()[source]

Make local lib dirs

set_build_system_attrs()[source]

Set build directory and config options

set_postbuild_cmds()[source]

Install libs

set_sanity_patterns()[source]

Set sanity patterns

class apps.level1.idg_test.reframe_idgtest.IdgTest(*args, **kwargs)[source]

Main class of IDG benchmark tests

check_vars()[source]

Check test variables

set_executable()[source]

Set executable path and executable

set_tags()[source]

Add tags to the test

get_num_nodes()[source]

Get number of nodes from total cores requested and number of cores per node

set_num_tasks_job()[source]

This method sets the number of tasks for the job. We use it to override the num_tasks set for the reservation, which lets us set num_tasks for the job in a more generic way.

set_env_vars()[source]

Set environment variables

add_launcher_options()[source]

Set job launcher options

pre_launch()[source]

Set prerun commands. It includes setting scratch directory and pre run commands from base class

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

>>> Total runtime
gridding:   6.5067e+02 s
degridding: 1.0607e+03 s
fft:        3.5437e-01 s
get_image:  6.5767e+00 s
imaging:    2.0073e+03 s

>>> Total throughput
gridding:   3.12 Mvisibilities/s
degridding: 1.91 Mvisibilities/s
imaging:    1.01 Mvisibilities/s
extract_time(kind='gridding')[source]

Performance extraction function for time. Sample stdout:

>>> Total runtime
gridding:   7.5473e+02 s
degridding: 1.1090e+03 s
fft:        3.5368e-01 s
get_image:  7.2816e+00 s
imaging:    1.8899e+03 s
extract_vis_thpt(kind='gridding')[source]

Performance extraction function for visibility throughput. Sample stdout:

>>> Total throughput
gridding:   2.69 Mvisibilities/s
degridding: 1.83 Mvisibilities/s
imaging:    1.07 Mvisibilities/s
set_perf_patterns()[source]

Set performance variables

set_reference_values()[source]

Set reference perf values
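To illustrate how extraction functions such as extract_vis_thpt() might parse the sample stdout above, here is a minimal, hypothetical sketch using ReFrame's sanity utilities; the regex and function name are assumptions, not the project's code.

import reframe.utility.sanity as sn


def extract_vis_thpt_sketch(stdout, kind='gridding'):
    # Matches lines such as "gridding:   3.12 Mvisibilities/s"
    return sn.extractsingle(rf'{kind}:\s+(?P<thpt>\S+)\s+Mvisibilities/s',
                            stdout, 'thpt', float)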

Imaging IO Test

Context

This is a prototype exploring the capability of hardware and software to deal with the types of I/O loads that the SDP will have to support for full-scale operation on SKA1 (and beyond). The benchmark is written in plain C and uses MPI for communication. The source code is hosted on the SKA GitLab repository and documentation can be found here.

Test parameterisation

Currently, the benchmark supports three different parameterisations, namely:

  • Variant of benchmark

  • Number of cores

  • Size of benchmark

Within the variant parameterisation, three different tests are defined, namely dry, write and read tests. As the name suggests, the dry test runs all the computations without writing the data to disk and can be used to assess the computational performance of the prototype. The write test does the computations and writes the data to disk, thereby benchmarking the I/O performance of the underlying file system. Finally, the read test reads back the data that has been written to disk and gives the read performance.

The size of the benchmark defines how large a problem we want to run. The size low-small indicates a small image for the SKA1 LOW configuration, whereas low-large is a large image (96k) for the SKA1 LOW configuration. The same holds for mid-small and mid-large, although for SKA1 MID the large image size is 192k.

They are defined in the ReFrame test as follows:

variant = parameter(['dry-test', 'write-test', 'read-test'])
num_cores = parameter(1 << i for i in range(min, max))
size = parameter(['tiny', 'low-small', 'low-large', 'mid-small', 'mid-large'])

Environment variables

All these parameterisations are provided as tags to the ReFrame tests and hence we can simply choose which benchmark to run by specifying appropriate tags on the command line. By default, the min and max values used to parameterise num_cores are 9 and 14, respectively. However, they can be overridden using the custom environment variables IMAGINGIOTEST_MIN and IMAGINGIOTEST_MAX, respectively.
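As a rough sketch of how these environment variables could feed the num_cores parameterisation (the variable names below are illustrative, not the repository's code):

import os

# With the stated defaults, range(9, 14) yields exponents 9..13,
# i.e. core counts [512, 1024, 2048, 4096, 8192]
min_exp = int(os.environ.get('IMAGINGIOTEST_MIN', 9))
max_exp = int(os.environ.get('IMAGINGIOTEST_MAX', 14))
num_cores_values = [1 << i for i in range(min_exp, max_exp)]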

By default, the test will create a conda environment and run inside it for the sake of isolation. This can be controlled using the environment variable CREATE_CONDA_ENV. By setting it to NO, the test WILL NOT create a conda environment.

Similarly, the performance metrics are monitored using the perfmon toolkit. If the user does not want to monitor metrics, this can be disabled by setting MONITOR_METRICS=NO.

Test filtering

The tests can be run using the following commands:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level1/imaging_iotest/reframe_iotest.py --run --performance-report

But first, let's see the tests generated by ReFrame using the --list flag as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level1/imaging_iotest/reframe_iotest.py --list

The output is shown below:

[ReFrame Setup]
version:           3.10.0-dev.3+1407ae75
command:           '/home/mpaipuri/benchmark-tests/main/reframe/bin/reframe -c apps/level1/imaging_iotest/reframe_iotest.py -l'
launched by:       mpaipuri@fnancy.nancy.grid5000.fr
working directory: '/home/mpaipuri/benchmark-tests/main'
settings file:     '/home/mpaipuri/benchmark-tests/main/reframe_config.py'
check search path: (R) '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py'
stage directory:   '/home/mpaipuri/benchmark-tests/main/stage'
output directory:  '/home/mpaipuri/benchmark-tests/main/output'

 [List of matched checks]
 - ImagingIOTest_read_test_low_small_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_large_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_small_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_large_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_large_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_tiny_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_small_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_large_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_large_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_large_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_small_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_large_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_small_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_small_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_small_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_small_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_tiny_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_large_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_tiny_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_large_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_small_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_tiny_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_small_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_small_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_tiny_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_tiny_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_tiny_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_large_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_small_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_large_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_tiny_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_small_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_tiny_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_large_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_large_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_tiny_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_large_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_large_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_large_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_large_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_small_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_large_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_small_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_large_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_large_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_large_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_small_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_large_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_tiny_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_small_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_large_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_tiny_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_small_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_small_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_large_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_large_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_small_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_small_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_large_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_low_small_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_small_128 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_tiny_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_large_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_low_large_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_small_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_small_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_small_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_large_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_mid_small_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_mid_small_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_read_test_tiny_8 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_small_64 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_write_test_mid_large_32 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_low_small_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTest_dry_test_tiny_16 (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 - ImagingIOTestBuild (found in '/home/mpaipuri/benchmark-tests/main/apps/level1/imaging_iotest/reframe_iotest.py')
 Found 76 check(s)

 Log file(s) saved in '/home/mpaipuri/benchmark-tests/main/reframe.log', '/home/mpaipuri/benchmark-tests/main/reframe.out'

As we can see, ReFrame generates 76 checks (3 variants × 5 sizes × 5 core counts gives 75 run tests, plus the build test), and they are for a single system partition. If we define multiple partitions and environments, the number of tests is multiplied by the number of partitions and environments. This is not very practical (unless we have unlimited resources to run these tests on), so we usually use test filtering to run only specific tests.

For example, to run the dry-test variant with the low-small test case and num_cores of 8, the following commands can be used:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
IMAGINGIOTEST_MIN=3 IMAGINGIOTEST_MAX=4 reframe/bin/reframe -C reframe_config.py -c apps/level1/imaging_iotest/reframe_iotest.py --tag dry-test$ --tag low-small$ --run --performance-report

The environment variables IMAGINGIOTEST_MIN=3 and IMAGINGIOTEST_MAX=4 generate a single test with num_cores of 8 for the reservation. Similarly, the tags dry-test$ and low-small$ select only the tests carrying those tags. We can also restrict the tests to a given partition using the --system flag and to a programming environment with the -p flag.

Test class documentation

class apps.level1.imaging_iotest.reframe_iotest.ImagingIOTestDownload(*args, **kwargs)[source]

Fixture to fetch Imaging IO test source code

set_postrun_cmds()[source]

Pull LFS objects

class apps.level1.imaging_iotest.reframe_iotest.ImagingIOTestBuild(*args, **kwargs)[source]

Imaging IO test compile test

set_sourcedir()[source]

Set source path based on dependencies

set_build_system_attrs()[source]

Set build directory and config options

set_sanity_patterns()[source]

Set sanity patterns

class apps.level1.imaging_iotest.reframe_iotest.ImagingIOTest(*args, **kwargs)[source]

Main class of Imaging IO runtime tests

set_executable()[source]

Set executable path and executable

set_tags()[source]

Add tags to the test

get_num_nodes()[source]

Get number of nodes from total cores requested and number of cores per node

set_subgrid_workers()[source]

Set number of subgrid workers based on NUMA nodes

set_num_tasks_job()[source]

This method sets the number of tasks for the job. We use it to override the num_tasks set for the reservation, which lets us set num_tasks for the job in a more generic way.

set_num_threads()[source]

Set number of OpenMP threads and OpenMP environment variables

add_launcher_options()[source]

Set job launcher options

set_executable_opts()[source]

Set iotest executable options

pre_launch()[source]

Set prerun commands. It includes setting scratch directory and pre run commands from base class

post_launch()[source]

Set post run commands. It includes removing visibility data files and running read-test

set_sanity_patterns()[source]

Set sanity patterns. Example stdout:

# Fri Jul 23 15:15:41 2021[1,0]<stdout>:Operations:

We check the number of times the above line is printed and compare it with the number of subgrid workers.

extract_stream_time()[source]

Performance extraction function for stream time. Sample stdout:

Fri Jul 23 15:15:41 2021[1,2]<stdout>:Streamed for 3.53s
extract_degrid_flop()[source]

Performance extraction function for degrid flop. Sample output:

# Fri Jul 23 15:15:41 2021[1,0]<stdout>: degrid 108.943 Gflops (10.9 GFlop/s, 92012544/92012544 visibilities, 1.47 GB, rate 0.15 GB/s, 25432 chunks)
extract_degrid_flops()[source]

Performance extraction function for degrid flops. Sample output:

# Fri Jul 23 15:15:41 2021[1,0]<stdout>: degrid 108.943 Gflops (10.9 GFlop/s, 92012544/92012544 visibilities, 1.47 GB, rate 0.15 GB/s, 25432 chunks)
extract_degrid_rate()[source]

Performance extraction function for degrid rate. Sample output:

# Fri Jul 23 15:15:41 2021[1,0]<stdout>: degrid 108.943 Gflops (10.9 GFlop/s, 92012544/92012544 visibilities, 1.47 GB, rate 0.15 GB/s, 25432 chunks)
extract_fft_flop()[source]

Performance extraction function for fft flop. Sample output:

# Fri Jul 23 15:15:41 2021[1,0]<stdout>: FFTs 3.296 Gflop (0.3 Gflop/s)
extract_fft_flops()[source]

Performance extraction function for fft flops. Sample output:

# Fri Jul 23 15:15:41 2021[1,0]<stdout>: FFTs 3.296 Gflop (0.3 Gflop/s)
extract_write_bw()[source]

Performance extraction function for write bandwidth. Sample output:

# Fri Jul 23 15:15:41 2021[1,2]<stdout>:Writer 2: Wait: 2.02671s, Read: 0.158911s, Write: 1.28306s, Idle: 0.0658979s
extract_read_bw()[source]

Performance extraction function for read bandwidth. Sample output:

# Fri Jul 23 15:15:41 2021[1,2]<stdout>:Writer 2: Wait: 2.02671s, Read: 0.158911s, Write: 1.28306s, Idle: 0.0658979s

For dry tests, the read time will be zero. To avoid a ZeroDivisionError, a small threshold value is added (see the sketch after this class documentation).

set_perf_patterns()[source]

Set performance variables

set_reference_values()[source]

Set reference perf values
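For readers unfamiliar with how such extraction functions are usually structured, below is a minimal sketch of a read-bandwidth extraction including the zero-division guard mentioned under extract_read_bw(); the regex, threshold value and data_volume_gb argument are illustrative assumptions, not the repository's code.

import reframe.utility.sanity as sn

READ_TIME_THRESHOLD = 1.0e-6  # seconds; guards against ZeroDivisionError in dry tests


def extract_read_bw_sketch(stdout, data_volume_gb):
    # Grabs the read time from the first matching writer line, e.g.
    # "Writer 2: Wait: 2.02671s, Read: 0.158911s, Write: 1.28306s, Idle: 0.0658979s"
    read_time = sn.evaluate(
        sn.extractsingle(r'Read:\s+(?P<t>[0-9.eE+-]+)s', stdout, 't', float))
    return data_volume_gb / (read_time + READ_TIME_THRESHOLD)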

Level 2 Benchmark Tests

RASCIL

Context

The Radio Astronomy Simulation, Calibration and Imaging Library (RASCIL) expresses radio interferometry calibration and imaging algorithms in Python and NumPy. The interfaces all operate with familiar data structures such as image, visibility table, gain table, etc. The source code is hosted on the SKA GitLab repository and documentation can be found here.

Test parameterisation

Currently, the benchmark supports two different parameterisations, namely:

  • Number of nodes

  • Size of benchmark

The size of the benchmark defines how large a problem we want to run; in this case, size refers to the number of frequency channels used to make the continuum image. The scalability of the test is specified using the number of nodes. They are defined in the ReFrame test as follows:

size = parameter(['small', 'large', 'very-large', 'huge'])
num_nodes = parameter(1 << i for i in range(int(min), int(max)))

Environment variables

By default, the variables min and max are defined as 3 and 7, respectively, so this parameterisation creates tests with node counts ranging from 8 to 64, doubling at each step. The user can override the min and max variables using the custom test environment variables RASCILTEST_MIN and RASCILTEST_MAX, respectively. If either of these environment variables is set, it takes precedence over the default value. All these parameterisations are provided as tags to the ReFrame tests and hence we can simply choose which benchmark to run by specifying appropriate tags on the command line.
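As a quick worked example of that expansion (illustrative snippet, not the test code):

node_counts = [1 << i for i in range(3, 7)]
print(node_counts)  # [8, 16, 32, 64]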

By default, the test will create a conda environment and run inside it for the sake of isolation. This can be controlled using the environment variable CREATE_CONDA_ENV. By setting it to NO, the test WILL NOT create a conda environment.

Similarly, the performance metrics are monitored using the perfmon toolkit. If the user does not want to monitor metrics, this can be disabled by setting MONITOR_METRICS=NO.

Usage

The tests can be run using the following commands:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level2/rascil/reframe_rascil.py --run --performance-report

But first, let's see the tests generated by ReFrame using the --list flag as follows:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level2/rascil/reframe_rascil.py --list

The output is shown below:

[ReFrame Setup]
version:           3.8.0-dev.2+8a9ceeda
command:           'reframe/bin/reframe -C reframe_config.py -c apps/level2/rascil/reframe_rascil.py -l'
launched by:       mpaipuri@fnancy
working directory: '/home/mpaipuri/ska-sdp-benchmark-tests'
settings file:     'reframe_config.py'
check search path: (R) '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py'
stage directory:   '/home/mpaipuri/ska-sdp-benchmark-tests/stage'
output directory:  '/home/mpaipuri/ska-sdp-benchmark-tests/output'

[List of matched checks]
- RascilTest_small_100 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_large_16 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_large_50 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_large_100 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilAndDatasetDownloadTest (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_small_25 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_large_25 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_large_8 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_small_16 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_small_50 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilTest_small_8 (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
- RascilBuildTest (found in '/home/mpaipuri/ska-sdp-benchmark-tests/apps/level2/rascil/reframe_rascil.py')
Found 12 check(s)

Log file(s) saved in '/home/mpaipuri/ska-sdp-benchmark-tests/reframe.log', '/home/mpaipuri/ska-sdp-benchmark-tests/reframe.out'

As we can see, ReFrame generates 12 checks (10 run tests plus the download and build tests), and they are for a single system partition. If we define multiple partitions and environments, the number of tests is multiplied by the number of partitions and environments.

For example, to run the small test with 8 nodes, the following commands can be used:

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c apps/level2/rascil/reframe_rascil.py --tag 8$ --tag small --run --performance-report

Test class documentation

class apps.level2.rascil.reframe_rascil.RascilAndDatasetDownloadTest(*args, **kwargs)[source]

Fetch RASCIL sources and datasets

download_dataset()[source]

Download dataset using subprocess command

set_sanity_patterns()[source]

Set sanity patterns

class apps.level2.rascil.reframe_rascil.RascilBuildTest(*args, **kwargs)[source]

RASCIL build test

set_sourcedir()[source]

Set source directory from dependencies

set_env_variables()[source]

Set env variables

pre_launch()[source]

Install dependencies of RASCIL before installing RASCIL

set_launcher()[source]

Set launcher to local to avoid appending mpirun or srun to python command

set_executable()[source]

Set executable as python to install RASCIL using setup.py

set_sanity_patterns()[source]

Set sanity patterns

class apps.level2.rascil.reframe_rascil.RascilTest(*args, **kwargs)[source]

Main class of RASCIL runtime tests

convert_datestring_timestamp(time_str, fmt)[source]

Convert date string to time stamp

Parameters
  • time_str (str) – Date time string

  • fmt (str) – Format of date time string

Returns

Time stamp of the date time string

Return type

str

set_tags()[source]

Add tags to the test

set_dependencies()[source]

Set dependencies of the test

set_executable(RascilBuildTest)[source]

Set executable path from dependencies

skip_tests()[source]

Skip test based on rules defined

set_num_tasks_job()[source]

Set number of MPI tasks of the job

set_env_variables()[source]

Set OpenMP threads and related environment variables

set_nfreq_channels()[source]

Set number of frequency channels based on number of nodes

set_launcher()[source]

Set launcher to local to avoid appending mpirun or srun to python command

setup_dask_launcher_cmd()[source]

Setup dask launcher command

Symlink dataset from prefix to stagedir

set_executable_opts()[source]

Set RASCIL executable options

set_keep_files()[source]

Set list of files that we want to keep in output folder

rm_dask_worker_space()[source]

Set RESULTS DIR env variable

pre_launch()[source]

Set pre run commands. It includes setting up dask cluster

set_sanity_patterns()[source]

Set sanity patterns. When RASCIL finishes the job successfully, it creates image files in FITS format. We check whether the files are created as a sanity check.

extract_times(var='create_blockvisibility_from_ms')[source]

Generic performance extraction function to extract time

extract_wall_time()[source]

Performance function to extract wall time. Sample output:

# 26/07/2021 05:41:22 PM.110 rascil-logger INFO Started  : 2021-07-26 13:52:07.487374
# 26/07/2021 05:41:22 PM.110 rascil-logger INFO Finished : 2021-07-26 17:41:22.110077
extract_imaging_time()[source]

Performance extraction function for imaging time

extract_processing_time()[source]

Performance extraction function for processing time. Sample output:

# 27/07/2021 08:57:15 PM.171 rascil-logger INFO Total processor time 1102.612 (s), total wallclock time 161.490 (s),
set_perf_variables()[source]

Set performance variables

set_reference_values()[source]

Set reference perf values
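As an illustration of how the wall time could be derived from the Started/Finished log lines shown under extract_wall_time(), here is a small, hypothetical helper; it is a sketch only and not the actual RascilTest code.

import re
from datetime import datetime


def wall_time_from_log(log_text):
    # Parses the "Started  : ..." and "Finished : ..." lines and returns
    # the elapsed wall time in seconds.
    fmt = '%Y-%m-%d %H:%M:%S.%f'
    start = re.search(r'Started\s*:\s*(\S+ \S+)', log_text).group(1)
    end = re.search(r'Finished\s*:\s*(\S+ \S+)', log_text).group(1)
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()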