Installation example using schaap-spack

This page describes how the SKA low pipeline was installed and run on the CSD3 cluster. It is copied from https://confluence.skatelescope.org/display/SE/SKA+LOW+pipeline+on+CSD3

Step 1: Install schaap-spack

To install the SKA low pipeline on CSD3 I used schaap-spack: https://git.astron.nl/RD/schaap-spack

mkdir schaap-spack
cd schaap-spack

git clone https://github.com/spack/spack.git
source ./spack/share/spack/setup-env.sh

git clone https://git.astron.nl/RD/schaap-spack.git
spack repo add ./schaap-spack

On CSD3 spack is already available, so to make sure our custom installation is always used, add the following line to your .bashrc:

source /home/hpcsalv1/schaap-spack/spack/share/spack/setup-env.sh
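A slightly more defensive version of that .bashrc line is sketched below. It only sources the custom setup script if the file exists, so a login shell does not print an error when the home directory is not mounted. The variable name SPACK_SETUP is an assumption made for this example; the path is the one used above.

```shell
# ~/.bashrc fragment (sketch). SPACK_SETUP is an illustrative variable name.
SPACK_SETUP="/home/hpcsalv1/schaap-spack/spack/share/spack/setup-env.sh"

if [ -f "$SPACK_SETUP" ]; then
    # Put the custom spack installation in front of the cluster-wide one.
    . "$SPACK_SETUP"
else
    echo "warning: $SPACK_SETUP not found, custom spack not loaded" >&2
fi
```

After opening a new shell, `command -v spack` shows whether a spack command is defined at all.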

The Schaap software requires the gcc compiler, which is available in multiple versions on CSD3. To use the right one, load it as a module:

module load gcc/11

Afterwards we can use schaap-spack to install the necessary software:

spack install python@3.9
spack install dp3@latest
spack install wsclean

Note 1: we also install Python 3.9 because the pipeline needs it, and CSD3 only provides Python versions up to 3.8.

Note 2: It is important to have the latest version of DP3, since earlier versions caused unexpected crashes (still to be investigated)

Step 2: Install the pipeline

mkdir /home/hpcsalv1/ska_sdp_low_wflow
cd /home/hpcsalv1/ska_sdp_low_wflow

git clone --recursive git@gitlab.com:ska-telescope/sdp/science-pipeline-workflows/ska-sdp-wflow-low-selfcal.git
cd ska-sdp-wflow-low-selfcal

git submodule init
git submodule update

Create and activate a virtual environment

(in my case, the path to the Python 3.9 executable is ~/schaap-spack/spack/opt/spack/linux-centos7-cascadelake/gcc-11.2.0/python-3.9.18-kurwxlxz5e2timy2ja7jtzdkqmxaydqy/bin/python)

cd /home/hpcsalv1/ska_sdp_low_wflow

spack load python@3.9

virtualenv -p path_to_python_3.9 chiara_env
source chiara_env/bin/activate
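Instead of copying the long hashed path by hand, the interpreter location can be looked up with `spack location -i`, which prints the install prefix of a spec. The following is a minimal sketch of an alternative to the virtualenv command above, not the method used on this page; the helper name python_bin_from_prefix is hypothetical, and the sketch assumes exactly one python@3.9 is installed in spack.

```shell
# Build the interpreter path from a spack install prefix.
# python_bin_from_prefix is a hypothetical helper for this example.
python_bin_from_prefix() {
    printf '%s/bin/python\n' "$1"
}

# Only attempt the lookup when a spack command is available.
if command -v spack >/dev/null 2>&1; then
    # spack location -i prints the install prefix of the matching spec.
    prefix="$(spack location -i python@3.9)" &&
        virtualenv -p "$(python_bin_from_prefix "$prefix")" chiara_env
fi
```

If more than one Python 3.9 installation matches, spack reports an error and the spec has to be made more specific.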

Install poetry using pip

pip install poetry

Possible error: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-build-nmfhsumj/cryptography/setup.py' → solve with: pip3 install --upgrade pip

Open the poetry environment and install the repository

poetry shell

Empty the PYTHONPATH variable before installation

export PYTHONPATH=""
pip install -e .

pip install mpi4py

To check your installation, run the tests, for example:

pytest tests/test_support_functions.py

Step 3: Run on a single node

Now that everything is installed, each time you access CSD3 and want to run the pipeline you can follow these steps:

cd /home/hpcsalv1/ska_sdp_low_wflow/ska-sdp-wflow-low-selfcal
source ../chiara_env/bin/activate
poetry shell

module purge

spack load dp3@latest
spack load wsclean

Run, for example:

python src/ska_sdp_wflow_low_selfcal/pipeline/main.py \
--input_ms /home/hpcsalv1/rds/rds-sdhp-S7lLL7eOZIg/hpcsalv1/data/midbands_averaged.ms \
--work_dir /home/hpcsalv1/rds/rds-sdhp-S7lLL7eOZIg/hpcsalv1/workdir \
--imaging_taper_gaussian 0.004deg \
--imaging_size 5000 \
--imaging_scale 0.001658792 \
--calibration_nchannels 1 \
--run_distributed True \
--resume_from_operation calibrate_3
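Since the same long command is typed repeatedly, it can help to keep the paths in variables. The sketch below wraps the command above in a shell function; the names RDS, INPUT_MS, WORK_DIR, DRY_RUN and run_selfcal are assumptions made for this example, while the flags and values are copied unchanged from the command above.

```shell
# Hypothetical wrapper around the example command; run it from the
# ska-sdp-wflow-low-selfcal directory. Variable names are illustrative.
RDS=/home/hpcsalv1/rds/rds-sdhp-S7lLL7eOZIg/hpcsalv1
INPUT_MS="${INPUT_MS:-$RDS/data/midbands_averaged.ms}"
WORK_DIR="${WORK_DIR:-$RDS/workdir}"

run_selfcal() {
    # With DRY_RUN set to a non-empty value the command is printed, not executed.
    ${DRY_RUN:+echo} python src/ska_sdp_wflow_low_selfcal/pipeline/main.py \
        --input_ms "$INPUT_MS" \
        --work_dir "$WORK_DIR" \
        --imaging_taper_gaussian 0.004deg \
        --imaging_size 5000 \
        --imaging_scale 0.001658792 \
        --calibration_nchannels 1 \
        --run_distributed True \
        --resume_from_operation calibrate_3
}
```

Calling `DRY_RUN=1 run_selfcal` prints the full command line for inspection; calling `run_selfcal` on its own starts the pipeline.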

Possible errors I encountered:

No module named 'numpy.core._multiarray_umath' → solve with: export PYTHONPATH=""

No module named ska_sdp … → make sure you are using the Python installed in the virtual environment (check with which python and python --version).

To force it to use the right one, use the full path to python in the command:

/home/hpcsalv1/ska_sdp_low_wflow/chiara_env/bin/python src/ska_sdp_wflow_low_selfcal/pipeline/main.py \
--input_ms /home/hpcsalv1/rds/rds-sdhp-S7lLL7eOZIg/hpcsalv1/data/midbands_averaged.ms \
--work_dir /home/hpcsalv1/rds/rds-sdhp-S7lLL7eOZIg/hpcsalv1/workdir \
--imaging_taper_gaussian 0.004deg \
--imaging_size 5000 \
--imaging_scale 0.001658792 \
--calibration_nchannels 1 \
--run_distributed True \
--resume_from_operation calibrate_3
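Both errors listed in this step come from the shell environment, so they can be caught before starting a run. The following is a minimal pre-flight check, assuming the virtual environment path used on this page; check_env is a hypothetical helper and not part of the pipeline.

```shell
# Pre-flight check (sketch). check_env is a hypothetical helper.
# Usage: check_env /path/to/expected/python
check_env() {
    _expected="$1"
    _actual="$(command -v python 2>/dev/null || true)"
    _status=0

    # A non-empty PYTHONPATH was behind the numpy import error.
    if [ -n "${PYTHONPATH:-}" ]; then
        echo "PYTHONPATH is set to '$PYTHONPATH'; try: export PYTHONPATH=\"\"" >&2
        _status=1
    fi

    # A different interpreter was behind the missing ska_sdp module error.
    if [ "$_actual" != "$_expected" ]; then
        echo "python resolves to '${_actual:-nothing}', expected '$_expected'" >&2
        _status=1
    fi

    return "$_status"
}
```

For example, `check_env /home/hpcsalv1/ska_sdp_low_wflow/chiara_env/bin/python` returns a non-zero status and prints a hint when either condition is not met.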