Overview of Spack

Introduction

We use Spack to build the software stack locally to run the benchmark tests. Spack supports both Environment Modules and LMod. Here we provide some basic steps for installing packages using Spack and integrating them with LMod. More details on the integration of Spack with TCL and LMod can be found in the Spack documentation.

Note

Some of the documentation provided by Spack on its integration with LMod is outdated, so be aware that you may run into issues while following the Spack tutorials.

Installation

Installing Spack is straightforward. It involves cloning the git repository and sourcing the environment setup script.

cd ~
git clone https://github.com/spack/spack.git

This clones the repository into the user's home directory. The next step is to modify ~/.bashrc to source the environment.

export SPACK_ROOT=~/spack
. $SPACK_ROOT/share/spack/setup-env.sh

By sourcing this script, all Spack commands become available in the shell. It also adds the Spack module path for LMod once packages are installed.

Usage

Basics

Some of the basic Spack commands are provided here. To list all the packages that Spack can install, we can use

spack list

If we want to search for a particular package, we can add a keyword to the spack list command. For instance, to check if OpenMPI is available, we can query

spack list openmpi

To get more details about a given package, we can use the spack info command.

spack info openmpi

This command prints all the information about the package: variants, dependencies, etc. To check the available versions of a given package, we can use the spack versions <package name> command.

To install a Spack package, we simply use spack install <package name> and, similarly, to uninstall, spack uninstall <package name>. To install a specific version, use spack install <package name>@<version>, where the @ sigil specifies the version number. More details on installing packages are discussed in the next sections.
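
For example, using OpenMPI (the version shown is illustrative):

spack versions openmpi
spack install openmpi@4.1.1
spack uninstall openmpi@4.1.1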

Workflow

The general workflow is to use the compilers provided on the platform to build a “standard” compiler toolchain, and in turn use this toolchain to build all the necessary packages. In this way, we bring the software stack on different platforms to a common ground, which also lets us compare benchmarks across the hardware/architectures of different platforms using the same software stack.

The first step is to list the compilers that are available to Spack. This can be done using

spack compiler list

This should list at least the compiler provided by the base OS. New compilers can be added to the list by loading the appropriate module. For instance, if gcc-7.3.0 is available on the module system, we can add it to the Spack compiler list using

module load gcc-7.3.0
spack compiler find

This command finds the newly available compiler and adds it to the Spack compiler list.

The next step is to build a compiler toolchain using the system-provided compiler. Continuing with the previous example, if the system-provided compiler is GCC 7.3.0 and we would like to build, say, GCC 9.3.0, we use the following command

spack install gcc@9.3.0 %gcc@7.3.0

In the above command, we are telling Spack to build GCC version 9.3.0, specified with @. The % sigil specifies the compiler toolchain used to build the package. Here, we are telling Spack to build GCC 9.3.0 using the GCC 7.3.0 provided on the system. We can add -j <number of jobs> to the install command to build packages concurrently.
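
For example, to build with eight parallel jobs:

spack install -j 8 gcc@9.3.0 %gcc@7.3.0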

Once GCC 9.3.0 is built successfully, we should add it to Spack's list of compilers. For that, we simply load the compiler first and then run the find command.

spack load gcc
spack compiler find
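
We can verify that the new compiler has been registered:

spack compiler list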

More details on compiler configuration in Spack can be found in the compiler documentation: https://spack.readthedocs.io/en/latest/getting_started.html#spack-compiler-find

Installing packages

Now that we have a toolchain built, we want to build the packages necessary to run our codes. In this documentation, we use OpenMPI as an example to demonstrate the process of building packages with Spack. Let’s say we want to build OpenMPI version 3.1.3 on our machine. We can query the specification of this package in Spack using spack spec openmpi@3.1.3, which gives an output as follows:

Input spec
--------------------------------
openmpi@3.1.3

Concretized
--------------------------------
openmpi@3.1.3%gcc@9.3.0~atomics~cuda~cxx~cxx_exceptions+gpfs~internal-hwloc~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3
+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-centos7-broadwell
^hwloc@1.11.13%gcc@9.3.0~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared patches=d1d94a4af93486c88c70b79cd930979f3a2a2b5843708e8c7c1655f18b9fc694 arch=linux-centos7-broadwell
^libpciaccess@0.16%gcc@9.3.0 arch=linux-centos7-broadwell
^libtool@2.4.6%gcc@9.3.0 arch=linux-centos7-broadwell
^m4@1.4.19%gcc@9.3.0+sigsegv arch=linux-centos7-broadwell
^libsigsegv@2.13%gcc@9.3.0 arch=linux-centos7-broadwell
^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-broadwell
^util-macros@1.19.3%gcc@9.3.0 arch=linux-centos7-broadwell
^libxml2@2.9.10%gcc@9.3.0~python arch=linux-centos7-broadwell
^libiconv@1.16%gcc@9.3.0 arch=linux-centos7-broadwell
^xz@5.2.5%gcc@9.3.0~pic libs=shared,static arch=linux-centos7-broadwell
^zlib@1.2.11%gcc@9.3.0+optimize+pic+shared arch=linux-centos7-broadwell
^ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=none arch=linux-centos7-broadwell
^numactl@2.0.14%gcc@9.3.0 patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94,62fc8a8bf7665a60e8f4c93ebbd535647cebf74198f7afafec4c085a8825c006 arch=linux-centos7-broadwell
^autoconf@2.69%gcc@9.3.0 arch=linux-centos7-broadwell
^perl@5.34.0%gcc@9.3.0+cpanm+shared+threads arch=linux-centos7-broadwell
^berkeley-db@18.1.40%gcc@9.3.0+cxx~docs+stl patches=b231fcc4d5cff05e5c3a4814f6a5af0e9a966428dc2176540d2c05aff41de522 arch=linux-centos7-broadwell
^bzip2@1.0.8%gcc@9.3.0~debug~pic+shared arch=linux-centos7-broadwell
^diffutils@3.7%gcc@9.3.0 arch=linux-centos7-broadwell
^gdbm@1.19%gcc@9.3.0 arch=linux-centos7-broadwell
^readline@8.1%gcc@9.3.0 arch=linux-centos7-broadwell
^automake@1.16.3%gcc@9.3.0 arch=linux-centos7-broadwell
^openssh@8.5p1%gcc@9.3.0 arch=linux-centos7-broadwell
^libedit@3.1-20210216%gcc@9.3.0 arch=linux-centos7-broadwell
^openssl@1.1.1k%gcc@9.3.0~docs+systemcerts arch=linux-centos7-broadwell

This says that OpenMPI 3.1.3 will be built using GCC 9.3.0 (openmpi@3.1.3%gcc@9.3.0). The ~ and + sigils specify the configuration options (variants). The line openmpi@3.1.3%gcc@9.3.0~atomics~cuda~cxx~cxx_exceptions+gpfs ... fabrics=none schedulers=none tells us that OpenMPI will be built without support for atomics, CUDA, C++ bindings, internal hwloc, Java, Lustre, etc. Similarly, by default the spec says it will be built with GPFS support, static libraries, etc. The ^ sigil specifies the dependencies of the package, so all the packages listed with ^ are dependencies of OpenMPI and will be installed before it. To get more details on what each variant means, we can use the spack info openmpi@3.1.3 command.

To install OpenMPI 3.1.3 using GCC 9.3.0 (which we have already built), with InfiniBand (IB) verbs support and integration with the system's workload scheduler, we use the following command

spack install -j 32 openmpi@3.1.3 %gcc@9.3.0 fabrics=verbs schedulers=auto

The “a=b” portions specify the variants. In this case, we are telling Spack to build OpenMPI with IB verbs support for fabrics and, by setting schedulers to auto, Spack will detect the workload manager and integrate it with OpenMPI. If, for instance, we want to build with multi-threading support, we can specify it using

spack install -j 32 openmpi@3.1.3+thread_multiple %gcc@9.3.0 fabrics=verbs schedulers=auto

All the default configuration options can be overridden during installation using the + and ~ sigils. More details can be found in the installation documentation of Spack.
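
For example, a sketch that disables the default static-libraries variant while enabling multi-threading (variants taken from the spec shown above):

spack install -j 32 openmpi@3.1.3~static+thread_multiple %gcc@9.3.0 fabrics=verbs schedulers=auto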

Module files

Although we can load packages using the spack load <package name> command, it is preferable to integrate Spack-installed packages into the module tool. Here we will see how to add Spack-installed modules to the environment module tool.

If there is no module tool installed on the system, we should first install the module tool itself. We will use lmod in this example. We can install it using the following command

spack install -j 32 lmod %gcc@9.3.0

After successful installation, we should source the lmod environment using the following command

. $(spack location -i lmod)/lmod/lmod/init/bash

Here we use the spack location command to find the installation location of lmod. After this, we should re-source the Spack setup script share/spack/setup-env.sh so that the Spack modules are put on the module path. Recall that we added this line to the ~/.bashrc script so that it is sourced every time we start a shell.
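
In practice, this means running the same line we added to ~/.bashrc:

. $SPACK_ROOT/share/spack/setup-env.sh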

Note

We should do these steps only if no module tool is provided by the system. If lmod is already installed on the system, we can skip them.

After sourcing the Spack environment, we should be able to see all the modules installed by Spack by querying module avail. This can be further verified by looking at the MODULEPATH environment variable, which will include the path to Spack-installed packages.
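
For example:

module avail
echo $MODULEPATH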

At this point we have so-called non-hierarchical module files, where all modules are generated in the same root directory. Ideally, we would like hierarchical module files, where MODULEPATH is changed dynamically. With non-hierarchical module files, it is easy to load incompatible modules at the same time. Hierarchical module files avoid such situations by “unlocking” dependent packages only after the parent packages are loaded. More details can be found in the Spack tutorials.

The most widely used hierarchy is the so-called Core/Compiler/MPI hierarchy where, on top of the compilers, different MPI libraries also unlock the software linked against them. To set this up, we need to add a configuration file modules.yaml to the ~/.spack directory with the following contents

modules:
  default:
    enable::
      - lmod
    lmod:
      core_compilers:
        - gcc@7.3.0
      hierarchy:
        - mpi
      hash_length: 0
      projections:
        all: '{name}/{version}'

In this configuration, enable:: tells Spack to use only lmod as the module system. We should add the system-provided compiler in the core_compilers section. By setting hash_length to zero, we eliminate the hashes in module names. The projections section tells Spack how to name module files; in this example, module files will be shown as openmpi/3.1.3.

Once we add this file to the ~/.spack directory, we should regenerate the module files using

spack module lmod refresh --delete-tree -y

and then update MODULEPATH using

module unuse $HOME/spack/share/spack/modules/linux-centos7-broadwell
module use $HOME/spack/share/spack/lmod/linux-centos7-x86_64/Core

We should unuse the module path that is added every time we source the Spack environment, and then add the new module path that points to the core modules. The above two lines can be added to the ~/.bashrc file. Now the module avail command gives an output as follows:

-------------------------------------------------- /alaska/mahendra/spack/share/spack/lmod/linux-centos7-x86_64/Core ---------------------------------------------------
gcc/9.3.0    gmp/6.2.1    mpc/1.1.0    mpfr/3.1.6    patch/2.7.6    zlib/1.2.11

----------------------------------------------------------------------- /opt/ohpc/pub/modulefiles -----------------------------------------------------------------------
gnu/5.4.0    gnu7/7.3.0    pmix/2.2.2    prun/1.3

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

We can see that the OpenMPI we installed does not appear in the list. This is due to the hierarchical module system we are using. Once we load GCC 9.3.0 using module load gcc/9.3.0, we will see a number of other modules become available.

------------------------------------------------ /alaska/mahendra/spack/share/spack/lmod/linux-centos7-x86_64/gcc/9.3.0 -------------------------------------------------
autoconf/2.69          fftw/3.3.9                       intel-oneapi-tbb/2021.3.0    libsigsegv/2.13        openssl/1.1.1k          tar/1.34
automake/1.16.3        findutils/4.8.0                  libbsd/0.11.3                libtool/2.4.6          perl/5.34.0             ucx/1.10.1
berkeley-db/18.1.40    flex/2.6.3                       libedit/3.1-20210216         libxml2/2.9.10         pkgconf/1.7.4           util-linux-uuid/2.36.2
bison/3.7.6            gdbm/1.19                        libevent/2.1.12              m4/1.4.19              py-docutils/0.15.2      util-macros/1.19.3
bzip2/1.0.8            gettext/0.21                     libffi/3.3                   ncurses/6.2            py-setuptools/50.3.2    xz/5.2.5
cmake/3.20.5           hdf5/1.10.7                      libiconv/1.16                numactl/2.0.14         python/3.8.11           zlib/1.2.11            (D)
cpio/2.13              hwloc/1.11.13                    libmd/1.0.3                  openmpi/3.1.3          rdma-core/34.0
diffutils/3.7          hwloc/2.5.0               (D)    libnl/3.3.0                  openmpi/4.1.1   (D)    readline/8.1
expat/2.4.1            intel-oneapi-mkl/2021.3.0        libpciaccess/0.16            openssh/8.5p1          sqlite/3.35.5

--------------------------------------------------- /alaska/mahendra/spack/share/spack/lmod/linux-centos7-x86_64/Core ---------------------------------------------------
gcc/9.3.0 (L)    gmp/6.2.1    mpc/1.1.0    mpfr/3.1.6    patch/2.7.6    zlib/1.2.11

----------------------------------------------------------------------- /opt/ohpc/pub/modulefiles -----------------------------------------------------------------------
gnu/5.4.0    gnu7/7.3.0    pmix/2.2.2    prun/1.3

Where:
D:  Default Module
L:  Module is loaded

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

Only the basic use of Spack is covered here; for more advanced use cases, please refer to the extensive documentation of the project.

Setting up environment for SKA SDP Benchmark tests

This can be done in two different ways. The first, less recommended, way is to use the legacy bash script provided in the spack/scripts folder and choose the appropriate CLI options to install the required packages on the platform. Reproducibility with this approach is not guaranteed, especially when there are changes in the upstream Spack packages.

The second approach, which is strongly recommended, is to use the Spack environment spec files provided for each system in the spack/spack-tests folder. Inside this folder we typically find Spack config files for different systems and a ReFrame test to deploy the software stack using those config files. These Spack tests are therefore “meta tests” that need to run before the actual benchmark tests, in order to deploy the necessary software stack.

Both approaches are illustrated in the following sections.

Config based approach

This approach is based on Spack environments, which are collections of packages. A more detailed description of Spack environments is out of scope here. Spack environments are defined by Spack configuration files. For each system, we typically find five different configuration files, namely

  • compilers.yml: All compiler-related information is placed in this file

  • config.yml: General configuration of Spack can be defined here

  • modules.yml: We use LMod module files and the related configuration is defined in this file

  • packages.yml: A list of package preferences is defined here

  • spack.yml: This is the main file that includes the list of packages to install

All these files together define a Spack environment spec. The packages.yml will list all the root packages and their dependencies along with the preferred version and variants. A simple example is shown below:

packages:
  hdf5:
    variants: ~cxx ~fortran ~hl ~ipo ~java +mpi +shared ~szip ~threadsafe +tools api=default build_type=RelWithDebInfo
    version:
      - 1.10.7

This config tells Spack that the preferred HDF5 version is 1.10.7, built with MPI support. So, when we use this file to define a Spack environment, Spack will always “try” to build HDF5 with the configuration shown above.

Note

Depending on the complexity of the environment (the set of all packages to be installed), Spack may not respect the package spec defined in the packages.yml file. To really constrain a package to a certain spec, we need to define it under spec in the spack.yml file.

Under the spec section of the spack.yml file, there is the set of packages to be installed in the environment. Thus, as long as we use the same config files, we can always deploy the same software stack on the same system, or even on different systems. This gives us a great deal of reproducibility within and between systems.
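
As an illustration, a minimal spec list might look like the following, assuming the repo's spack.yml follows the upstream spack.yaml layout (the actual file may differ):

spack:
  specs:
    - hdf5@1.10.7 +mpi
    - openmpi@3.1.3 fabrics=verbs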

A typical workflow with a Spack environment is (see the command sketch after this list):

  • Create a named environment using the spack.yml file and activate it

  • Concretize the environment

  • Install the packages

  • Generate module files
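
A sketch of these steps on the command line (the environment name benchmark-env is illustrative; in practice the ReFrame test derives it from the partition name):

spack env create benchmark-env spack.yml
spack env activate benchmark-env
spack concretize
spack install
spack module lmod refresh -y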

All the generated module files are placed under $SPACK_ROOT/var/spack/environments/<env_name>/lmod/<arch>/Core. Once we add this path to MODULEPATH, we can use the Spack packages using module load commands.
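
For example, keeping the placeholders from the path above:

module use $SPACK_ROOT/var/spack/environments/<env_name>/lmod/<arch>/Core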

All these steps are abstracted away from the user by a ReFrame test that automates this workflow. A separate ReFrame test is defined for each system/partition, where the name of the environment is derived from the partition name.

Let’s look at an example. If we want to install all the necessary packages on the JUWELS cluster, we need to execute the following commands.

cd ska-sdp-benchmark-tests
conda activate ska-sdp-benchmark-tests
reframe/bin/reframe -C reframe_config.py -c spack/spack-tests/juwels/cluster/reframe_juwelscluster.py --run

Important

The user needs to replace the variable spack_root in the test file to point to user specific path. This can be done either by directly editing the test file or at the CLI using -S option, e.g., reframe/bin/reframe -C reframe_config.py -c spack/spack-tests/juwels/cluster/reframe_juwelscluster.py -S spack_root=<my_spack_root_path> --run.

This test clones the Spack repository at v0.17.0, creates an environment, installs the packages defined in the spack/spack-tests/juwels/cluster/configs/spack.yml file and creates module files. Depending on the IO performance of the file system where the Spack installation happens, it might take 3-4 hours to install all the packages, so it is normal if the test takes a long time to finish. However, this is done only once on a given system, generally on the login nodes, before running the actual tests.

Tip

Do not update the module path using the module use <path> command when multiple clusters with different microarchitectures share a common frontend. This can trigger module conflicts, as modules generated for different microarchitectures will have the same name. For example, on the JUWELS supercomputer, the cluster partition has Intel Skylake nodes whereas the booster partition has AMD nodes. Packages will be built for the Intel and AMD architectures separately, so if both module paths are on MODULEPATH, a simple command like module load gcc/9.3.0 will trigger conflicts or unintended behaviour, as the module system does not know which module to load. The module path is therefore updated for each partition within the benchmark tests to isolate the modules for that partition.

Important

The test will add the environment variable SPACK_ROOT to the user’s $HOME/.bashrc, if that file exists. If bash is not the user’s default shell, it is essential to add the SPACK_ROOT environment variable to the shell profile. We use this variable inside the system configuration to add the module path to the module system.

Script based approach

A bootstrap script is provided in spack/build-spack-modules.bash to build the entire environment needed to run the SDP benchmark tests. The available options for the bootstrap script are as follows:

Log is saved to: $REPO_ROOT/spack/spack_install.log
Usage: build-spack-modules.bash [OPTIONS]

OPTIONS:

-d | --dry-run              dry run mode
-s | --with-scheduler       Build OpenMPI with scheduler support
-i | --with-ib              Build UCX with IB transports
-o | --with-opa             Build OpenMPI with psm2 support (Optimized for Intel Omni-path)
-c | --with-cuda            Build OpenMPI and UCX with cuda and other cuda related packages
-t | --use-tcl              Use tcl module files [Default is lmod]
-h | --help                 prints this help and exits

The script can be run in dry-run mode to see the commands it would execute on the current system. If the system supports InfiniBand, the -i flag must be passed on the CLI to build UCX with IB support. Similarly, if the system has Intel Omni-Path (OPA), -o must be passed. It is not possible to use both -i and -o at the same time, as Spack supports only one fabrics type at a time.

Similarly, if there is batch scheduler support on the system, use the -s flag to enable scheduler support for OpenMPI. By default, the bootstrap script builds hierarchical modules to be used with the lmod module system. If the platform supports only the tcl module system, this must be requested at the CLI using the -t flag.

Finally, all the packages are installed using GCC 9.3.0 as the compiler toolchain. This default can be overridden by setting the environment variable TOOLCHAIN, which takes precedence over the default value. Similarly, the versions of all packages can be controlled by setting environment variables. More details can be found in the README file in the spack/ folder in the root of the repository.
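
For instance, a hypothetical invocation overriding the toolchain (assuming TOOLCHAIN accepts a Spack compiler spec; see the README for the exact format):

TOOLCHAIN=gcc@10.3.0 ./build-spack-modules.bash -i -s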

An example usage for a system that has scheduler and IB support would be:

cd spack
./build-spack-modules.bash -i -s -d  # For dry run
./build-spack-modules.bash -i -s  # For installing packages

Once the bootstrapping is finished, and assuming all packages have installed without errors, we need to source the $HOME/.bashrc file as a final step to use the module files.

Deploy Spack environments

The following documentation shows how to create Spack environments and install packages for different systems using ReFrame tests.