Framework philosophy

Adding new test

To add a new test to the benchmark suite, follow these steps: 1. Decide whether the test belongs in level 0, level 1 or level 2. Then create a folder in the corresponding location and add the following files:

  • reframe_<test_name>.py: The main test file, where we define the test class derived from ReFrame's RegressionTest class.

  • TEST_NAME.ipynb: A Jupyter notebook that plots the performance metrics produced by the test.

  • README.md: A simple readme file with high-level instructions on where to find the documentation of the test.

  2. Define the test procedure:

    • Does the test need sources or packages from the internet, be it its own source code, Python packages or any other dependencies? If yes, create a test dependency that fetches everything:

    class IdgTestDownload(FetchSourcesBase):
        """Fixture to fetch IDG source code"""
        
        descr = 'Fetch source code of IDG'
        sourcesdir = 'https://git.astron.nl/RD/idg.git'
        cnd_env_name = CONDA_ENV_NAME
    
    • Does the test need to compile fetched dependencies? If yes, create a test dependency that builds the sources. If the sources are fetched in a previous test, be sure to include it as a dependent fixture, e.g. app_src = fixture(IdgTestDownload, scope='session').

    class IdgTestBuild(rfm.CompileOnlyRegressionTest):
        """IDG test compile test"""
        
        descr = 'Compile IDG test from sources'
        
        # Share resource from fixture
        idg_test_src = fixture(IdgTestDownload, scope='session')
    
        def __init__(self):
            self.valid_prog_environs = [
                'idg-test',
            ]
            self.valid_systems = filter_systems_by_env(self.valid_prog_environs)
            self.maintainers = [
                'Mahendra Paipuri (mahendra.paipuri@inria.fr)'
            ]
            # Cross compilation is not possible on certain g5k clusters. We force
            # the job to be non-local so building will be on remote node
            if 'g5k' in self.current_system.name:
                self.build_locally = False
        
        @run_before('compile')
        def set_sourcedir(self):
            """Set source path based on dependencies"""
            self.sourcesdir = self.idg_test_src.stagedir
            
        @run_before('compile')
        def set_prebuild_cmds(self):
            """Make local lib dirs"""
            self.lib_dir = os.path.join(self.stagedir, 'local')
            self.prebuild_cmds = [
                f'mkdir -p {self.lib_dir}',
            ]
    
        @run_before('compile')
        def set_build_system_attrs(self):
            """Set build directory and config options"""
            self.build_system = 'CMake'
            self.build_system.builddir = os.path.join(self.stagedir, 'build')
            self.build_system.config_opts = [
                f'-DCMAKE_INSTALL_PREFIX={self.lib_dir}',
                '-DBUILD_LIB_CUDA=ON',
                '-DPERFORMANCE_REPORT=ON',
            ]
            self.build_system.max_concurrency = 8
            
        @run_before('compile')
        def set_postbuild_cmds(self):
            """Install libs"""
            self.postbuild_cmds = [
                'make install',
            ]
    
        @run_before('sanity')
        def set_sanity_patterns(self):
            """Set sanity patterns"""
            self.sanity_patterns = sn.assert_not_found('error', self.stderr)
    
  3. Write the test itself.

    • Define all dependencies as fixture, all parameters as parameter and all variables as variable. Tests are run for every permutation of the parameters, whereas a variable holds a single value per run (like the number of nodes).
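The parameter expansion can be pictured in plain Python. This is an illustration of the concept only, not the ReFrame API; the parameter names and values are made up:

```python
from itertools import product

# Plain-Python illustration of how ReFrame expands `parameter`s:
# one test case is generated per element of the cross-product.
# A `variable`, by contrast, holds a single value per run.
variant = ['gpu', 'cpu']      # hypothetical parameter with two values
size = ['small', 'large']     # hypothetical parameter with two values

cases = list(product(variant, size))
print(len(cases))  # 4 test cases: one per permutation
```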

    • Set the valid_prog_environs and the valid_systems in the __init__ method.

    • Define the executable and executable options.

    • Define the Sanity Patterns. You can define which patterns must and must not appear in the stdout and stderr.

        @run_before('sanity')
        def set_sanity_patterns(self):
            """Set sanity patterns. Example stdout:
    
            .. code-block:: text
    
                # Fri Jul 23 15:15:41 2021[1,0]<stdout>:Operations:
    
            We check number of time the above line is printed and compare it with number of
            sub grid workers
            """
            num_messages = sn.len(sn.findall(r'(.*):(\s*)Operations', self.stdout))
            self.sanity_patterns = sn.assert_eq(
                num_messages,
                self.num_nodes * self.num_sm[self.variant] *
                self.benchmark[self.size]['subgrid-workers']
            )
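The sanity regex above can be checked offline against the sample stdout line using Python's plain re module (ReFrame's sn.findall behaves similarly but returns a deferred expression):

```python
import re

# Sample stdout line from the docstring above
sample = "# Fri Jul 23 15:15:41 2021[1,0]<stdout>:Operations:\n"

# Same pattern as in set_sanity_patterns; each printed line yields
# one match, so the match count equals the number of workers that
# printed it
matches = re.findall(r'(.*):(\s*)Operations', sample)
print(len(matches))  # 1
```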
    
    • Define the Performance Functions. Data must be extracted from the output stream using regular expressions.

        @run_before('sanity')
        def set_sanity_patterns(self):
            """Set sanity patterns. Example stdout:
    
            .. code-block:: text
    
                >>> Total runtime
                gridding:   6.5067e+02 s
                degridding: 1.0607e+03 s
                fft:        3.5437e-01 s
                get_image:  6.5767e+00 s
                imaging:    2.0073e+03 s
    
                >>> Total throughput
                gridding:   3.12 Mvisibilities/s
                degridding: 1.91 Mvisibilities/s
                imaging:    1.01 Mvisibilities/s
    
            """
            self.sanity_patterns = sn.all([
                sn.assert_found('Total runtime', self.stderr),
                sn.assert_found('Total throughput', self.stderr),
            ])
    
        @performance_function('s')
        def extract_time(self, kind='gridding'):
            """Performance extraction function for time. Sample stdout:
    
    
            .. code-block:: text
    
                >>> Total runtime
                gridding:   7.5473e+02 s
                degridding: 1.1090e+03 s
                fft:        3.5368e-01 s
                get_image:  7.2816e+00 s
                imaging:    1.8899e+03 s
                
            """
            return sn.extractsingle(rf'^{kind}:\s+(?P<value>\S+) s', self.stderr, 'value', float)
        
        @performance_function('Mvisibilities/s')
        def extract_vis_thpt(self, kind='gridding'):
            """Performance extraction function for visibility throughput. Sample stdout:
    
    
            .. code-block:: text
    
                >>> Total throughput
                gridding:   2.69 Mvisibilities/s
                degridding: 1.83 Mvisibilities/s
                imaging:    1.07 Mvisibilities/s
    
            """
            return sn.extractsingle(rf'^{kind}:\s+(?P<value>\S+) Mvisibilities/s', self.stderr, 'value', float)
    
        @run_before('performance')
        def set_perf_patterns(self):
            """Set performance variables"""
            self.perf_variables = {
                'gridding s': self.extract_time(),
                'degridding s': self.extract_time(kind='degridding'),
                'fft s': self.extract_time(kind='fft'),
                'get_image s': self.extract_time(kind='get_image'),
                'imaging s': self.extract_time(kind='imaging'),
                'gridding Mvis/s': self.extract_vis_thpt(),
                'degridding Mvis/s': self.extract_vis_thpt(kind='degridding'),
                'imaging Mvis/s': self.extract_vis_thpt(kind='imaging'),
            }
    
        @run_before('performance')
        def set_reference_values(self):
            """Set reference perf values"""
            # A dict literal cannot hold duplicate '*' keys, so list the
            # performance variables explicitly with their units
            self.reference = {
                '*': {
                    'gridding s': (None, None, None, 's'),
                    'degridding s': (None, None, None, 's'),
                    'fft s': (None, None, None, 's'),
                    'get_image s': (None, None, None, 's'),
                    'imaging s': (None, None, None, 's'),
                    'gridding Mvis/s': (None, None, None, 'Mvis/s'),
                    'degridding Mvis/s': (None, None, None, 'Mvis/s'),
                    'imaging Mvis/s': (None, None, None, 'Mvis/s'),
                }
            }
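The extraction regexes used in the performance functions can also be verified offline with plain re. Note that sn.extractsingle scans the file line by line, so ^ anchors at each line; with re we need the MULTILINE flag for the same effect:

```python
import re

# Sample output from the extract_time docstring
sample = """>>> Total runtime
gridding:   7.5473e+02 s
degridding: 1.1090e+03 s
fft:        3.5368e-01 s
"""

kind = 'gridding'
# Same pattern as extract_time; MULTILINE makes ^ match at each line,
# so 'degridding:' does not match '^gridding:'
m = re.search(rf'^{kind}:\s+(?P<value>\S+) s', sample, re.MULTILINE)
value = float(m.group('value'))
print(value)  # 754.73
```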
    

The sanity and performance functions are both based on the concept of “Deferrable Functions”. Be sure to check the official ReFrame documentation on how to use them properly.
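The core idea behind deferrable functions can be sketched in a few lines of plain Python. This toy sketch is NOT ReFrame's actual implementation; it only illustrates why calling sn.assert_eq(...) does not check anything immediately:

```python
# Toy sketch of the "deferrable function" concept: calling a deferrable
# function records the call instead of running it, and the result is
# only computed when evaluate() is invoked.
class Deferred:
    def __init__(self, fn, *args):
        self._fn, self._args = fn, args

    def evaluate(self):
        # Recursively evaluate nested deferred expressions first
        args = [a.evaluate() if isinstance(a, Deferred) else a
                for a in self._args]
        return self._fn(*args)

def deferrable(fn):
    def make(*args):
        return Deferred(fn, *args)
    return make

@deferrable
def assert_eq(a, b):
    return a == b

expr = assert_eq(1 + 1, 2)   # nothing is checked yet
print(expr.evaluate())       # True -- evaluation happens here
```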

These steps allow you to write a basic ReFrame test. For a more detailed view, take a look at the ReFrame documentation. There is no strict convention on how to name a test, and the already provided tests can be used as templates for new ones. The idea is to provide an environment for a given test and to define all test-related variables, like modules to load and environment variables to set, within this environment. We also need to add to the target_systems of this environment the systems on which we would like to run the test. The details of adding a new environment and system are presented below.

Adding new system

Every time we want to add a new system, we typically need to follow these steps:

  • Create a new python file <system_name>.py in config/systems folder.

  • Add system configuration and define partitions for the system. More details on how to define a partition and naming conventions are presented later.

  • Import this file into reframe_config.py and add this new system in the site_configuration.

  • The final step is to get the processor info of the system nodes using ReFrame's --detect-host-topology option, place the resulting file in the topologies folder and include it in the processor key of each partition.

The user is advised to consult the ReFrame documentation before doing so. The provided systems can be used as a template to add new systems.
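The steps above might produce a system file along the following lines. This is a minimal sketch only; all names (the cluster, the partition, the environ) are placeholders, and the full schema is described in the ReFrame configuration reference:

```python
# Hypothetical config/systems/mycluster.py -- minimal system entry.
# Every name here is a placeholder, not a real system.
mycluster_system = [
    {
        'name': 'mycluster',
        'descr': 'Example cluster',
        'hostnames': ['mycluster-.*'],     # regexes matched against hostname
        'modules_system': 'lmod',
        'partitions': [
            {
                # Name follows the convention described below:
                # {prefix}-{compiler}-{mpi}-{interconnect}-{stack}
                'name': 'compute-gcc9-ompi4-ib-umod',
                'descr': 'Compute nodes with GCC 9 and OpenMPI 4',
                'scheduler': 'slurm',
                'launcher': 'mpirun',
                'environs': ['gnu-hpl'],
                # Filled with the output of --detect-host-topology
                'processor': {},
            },
        ],
    },
]
```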

We try to follow a certain convention when defining system partitions. Firstly, we define partitions, either physical or abstract, based on the compiler toolchain and MPI implementation, such that using a partition loads the modules related to that compiler and MPI. The rest of the modules, those related to a specific test, are added to the environs, which are discussed later. Consequently, we also name these partitions according to a standard scheme. The benefit of such a scheme is two-fold: we can get a high-level overview of a partition quickly, and by choosing appropriate names we can easily filter systems for the tests. An example use case: to run a certain test on all partitions that support GPUs, we give those partitions a name with a gpu suffix and simply filter all partitions for a match with the string gpu.

We use the convention {prefix}-{compiler-name-major-ver}-{mpi-name-major-ver}-{interconnect-type}-{software-type}-{suffix}.

  • Prefix can be name of the partition or cluster.

  • compiler-name-major-ver can be as follows:
    • gcc9: GNU compiler toolchain with major version 9

    • icc20: Intel compiler toolchain with major version 2020

    • xl16: IBM XL toolchain with major version 16

    • aocc3: AMD AOCC toolchain with major version 3

  • mpi-name-major-ver is the name of the MPI implementation. Some of them are:
    • ompi4: OpenMPI with major version 4

    • impi19: Intel MPI with major version 2019

    • pmpi5: IBM Platform MPI with major version 5

    • smpi10: IBM Spectrum MPI with major version 10

  • interconnect-type is type of interconnect on the partition.
    • ib: Infiniband

    • roce: RoCE (RDMA over Converged Ethernet)

    • opa: Intel Omnipath

    • eth: Ethernet TCP

  • software-type is type of software stack used.
    • smod: System provided software stack

    • umod: User built software stack using Spack

  • suffix can indicate any special properties of the partition, like GPUs, high-memory nodes, high-priority job queues, etc. There can be multiple suffixes, each separated by a hyphen.

Important

If the package uses calendar versioning, we use only the last two digits of the year in the name to be concise. For example, Intel MPI 2019.* becomes impi19.

For instance, in the configuration shown in ReFrame configuration, compute-gcc9-ompi4-roce-umod tells us that the partition uses the GCC compiler with OpenMPI, RoCE as the interconnect, and software built in user space using Spack.
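The naming scheme makes partition names mechanically parseable. The helper below is hypothetical, purely to illustrate how the convention enables filtering (e.g. selecting all GPU partitions by suffix):

```python
# Hypothetical helper -- not part of the benchmark suite -- that splits
# a partition name into the fields of the naming convention.
def parse_partition(name):
    prefix, compiler, mpi, interconnect, stack, *suffixes = name.split('-')
    return {'prefix': prefix, 'compiler': compiler, 'mpi': mpi,
            'interconnect': interconnect, 'stack': stack,
            'suffixes': suffixes}

partitions = [
    'compute-gcc9-ompi4-roce-umod',
    'daint-icc21-impi21-ib-umod-gpu',
]

# Filtering on the 'gpu' suffix selects only GPU-capable partitions
gpu_parts = [p for p in partitions if 'gpu' in p.split('-')]
print(gpu_parts)  # ['daint-icc21-impi21-ib-umod-gpu']
```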

Important

It is recommended to stick to this convention. There can be more possibilities for each category, which should be added as we add new systems.

Adding new environment

Adding a new system is not enough to run the tests on it. We need to tell our ReFrame tests that a new system is available in the config. To minimise redundancy in the configuration details and avoid modifying the source code of the tests, we choose to provide an environ for each test. For example, there is an HPL test in the apps/level0/hpl folder, and for this test we define an environ in config/environs/hpl.py.

Note

System partitions and environments should have a one-to-one mapping: whatever environment we define within the environs section of a system partition, we should also put that partition within the target_systems of the corresponding environ.

All the modules needed to run a test, apart from the compiler and MPI (which come from the partition), are added to the modules section of each environ. For example, let's take a look at the hpl.py file:

""""This file contains the environment config for HPL benchmark"""


hpl_environ = [
    {
        'name': 'intel-hpl',
        'cc': 'mpicc',
        'cxx': 'mpicxx',
        'ftn': 'mpif90',
        'modules': [
            'intel-oneapi-mkl/2021.3.0',
        ],
        'variables': [
            ['XHPL_BIN', '$MKLROOT/benchmarks/mp_linpack/xhpl_intel64_dynamic'],
        ],
        'target_systems': [
            'alaska:compute-icc21-impi21-roce-umod',
            # <end - alaska partitions>
            'grenoble-g5k:dahu-icc21-impi21-opa-umod',
            # <end - grenoble-g5k partitions>
            'juwels-cluster:batch-icc21-impi21-ib-umod',
            # <end juwels partitions>
            'nancy-g5k:gros-icc21-impi21-eth-umod',
            # <end - nancy-g5k partitions>
            'cscs-daint:daint-icc21-impi21-ib-umod-gpu',
            # <end - cscs partitions>
        ],
    },
    {
        'name': 'gnu-hpl',
        'cc': '',
        'cxx': '',
        'ftn': '',
        'modules': [
            'amdblis/3.0',
        ],
        # 'variables': [
        #     ['UCX_TLS', 'ud,rc,dc,self']
        # ],
        'target_systems': [
            'juwels-booster:booster-gcc9-ompi4-ib-umod',
            # <end juwels partitions>
        ],
    },
]

There are two different environments, namely intel-hpl and gnu-hpl. As the names suggest, intel-hpl uses the HPL benchmark shipped with MKL, optimized for Intel chips, whereas gnu-hpl is used for other chips, like AMD, with the GNU toolchain. Notice that the target_systems for intel-hpl contain only partitions with an Intel MPI implementation (impi in the name), whereas the target_systems of gnu-hpl use the OpenMPI implementation. Within the test, we define only the valid programming environments and find the valid systems by filtering all systems that have the given environment defined for them.

For instance, suppose we define a new system partition with an Intel chip, named mycluster-gcc-impi-ib-umod. If we want the HPL test to run on this partition, we add intel-hpl to the environs section of the system partition and, similarly, add the name of this partition to the target_systems of the intel-hpl environment. Once we do that, the test will run on this partition without having to modify anything in the source code of the test.
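The filtering idea can be sketched in plain Python. The real helper (filter_systems_by_env, used in the test code above) lives in the benchmark suite; this is an illustration only, using a trimmed-down copy of the hpl environ data:

```python
# Plain-Python sketch of filtering valid systems by environment:
# collect every 'system:partition' that defines one of the given environs.
def filter_systems_by_env(environs, all_environs):
    valid = set()
    for env in all_environs:
        if env['name'] in environs:
            valid.update(env['target_systems'])
    return sorted(valid)

# Trimmed-down version of the hpl.py environ list shown above
hpl_environ = [
    {'name': 'intel-hpl',
     'target_systems': ['alaska:compute-icc21-impi21-roce-umod']},
    {'name': 'gnu-hpl',
     'target_systems': ['juwels-booster:booster-gcc9-ompi4-ib-umod']},
]

print(filter_systems_by_env(['intel-hpl'], hpl_environ))
```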

If we want to add a new test, we need to add a new environment by following these steps:

  • Create a new file <env_name>.py in config/environs folder and add environment configuration for the tests. It is important that we add this new environment to existing and/or new system partitions that are defined in target_systems of the environment configuration.

  • Finally, import <env_name>.py in the main ReFrame configuration file reframe_config.py and add it to the configuration dictionary.

Adding Spack configuration test

After we define a new system, we need a software stack on it to be able to run tests. If the user chooses the platform-provided software stack, this step can be skipped. Otherwise, we need to define Spack config files in order to deploy the software stack. We can use the existing config files provided for different systems as a base. Typically, we only need to change the compilers.yml, modules.yml and spack.yml files for a new system. We need to update the system compiler version and paths in compilers.yml, and also in the core_compilers section of modules.yml. Similarly, the desired software stack to be installed on the system is defined in the spack.yml file.
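For orientation, a spack.yml for a new system might contain something like the following sketch. The package names and versions are placeholders, not a recommendation; consult the Spack environments documentation and the existing config files in the repository for the exact layout used by the suite:

```yaml
# Hypothetical spack.yml snippet for a new system -- specs are placeholders
spack:
  specs:
  - gcc@11.2.0
  - openmpi@4.1.1 fabrics=ucx
  - openblas@0.3.18 threads=openmp
  view: false
```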

Once these configuration files are ready, we need to create a new folder in the spack/spack_tests folder with the name of the system, place all configuration files in configs/, and define a ReFrame test to deploy this software stack. The existing test files can be used as templates. The ReFrame test file itself is very minimal; the user only needs to put the name of the cluster and the path where Spack must be installed in the test body.