Jupyter Notebook Coding Guidelines

Best Practices

Jupyter notebooks are widely used across the SKAO by a number of teams, mostly within BinderHub/JupyterHub, to share ideas and code snippets and to break larger concepts down into more manageable chunks. Given the large number of notebooks produced, it is important to set out best practices for creating them so that a standard approach can be used across the SKAO.

SKAO Jupyter Notebook Standard

Naming

Use expressive names that describe what your notebook is doing.

Directory Structure

Notebooks should be placed inside a /notebooks directory, at the root of your repository.

Execution

Avoid ambiguous execution orders. To ensure that your notebook is reproducible and creates the expected results, restart the kernel and execute all cells of the notebook before you share it.
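As a rough illustration of what an unambiguous execution order looks like on disk, the sketch below reads a notebook's JSON and checks that the non-empty code cells carry execution counts 1, 2, 3, ... in document order, which is what a restart followed by "run all" produces. The function names and the example notebook are hypothetical, and this is only a heuristic check, not part of the SKAO tooling.

```python
import json


def _source_text(cell):
    """Cell source may be stored as a string or as a list of lines."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)


def execution_counts(notebook_json):
    """Return the execution_count of each non-empty code cell, in document order."""
    notebook = json.loads(notebook_json)
    return [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and _source_text(cell).strip()
    ]


def ran_top_to_bottom(notebook_json):
    """True if the code cells were each executed once, in order, from a fresh kernel."""
    counts = execution_counts(notebook_json)
    return counts == list(range(1, len(counts) + 1))


# Hypothetical notebook whose two code cells were executed out of order.
out_of_order = json.dumps({
    "cells": [
        {"cell_type": "code", "execution_count": 2, "source": "x = 1", "outputs": []},
        {"cell_type": "markdown", "source": "Some text"},
        {"cell_type": "code", "execution_count": 1, "source": "print(x)", "outputs": []},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})

print(ran_top_to_bottom(out_of_order))  # False
```

A notebook that fails this check is not necessarily wrong, but it is a sign that the kernel was not restarted before the final run.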

Modularisation

Use modularisation (i.e. modules, functions, classes) if reasonable.

Testing

Use the testing make target described below.

Linting

Use the linting make target described below.

Data Distribution

Ensure that all data used in the notebook is distributed together with it (or at least can be downloaded) and that you’re using relative paths to access the data.
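A minimal sketch of relative data access, assuming a hypothetical layout in which the data files sit in a "data" folder next to the notebook. The folder name, the file name and the helper are illustrative assumptions only.

```python
from pathlib import Path

# Hypothetical layout: data files live in a "data" folder next to the notebook.
DATA_DIR = Path("data")


def data_file(name):
    """Build a path to a data file relative to the notebook's directory."""
    path = DATA_DIR / name
    # Joining with an absolute name discards DATA_DIR, so reject that case.
    if path.is_absolute():
        raise ValueError(f"Use a relative path, got {path}")
    return path


print(data_file("observations.csv").as_posix())  # data/observations.csv
```

Because the path never mentions a user's home directory or a machine-specific mount point, the notebook keeps working when the repository is cloned elsewhere.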

Dependencies

Declare your dependencies in a configuration file (a .TOML file, for example) that pins the versions of all dependencies used, and import all dependencies at the beginning of the notebook.
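To make "pinned" concrete, the sketch below flags requirement strings that do not fix an exact version. It assumes the dependencies are available as PEP 508-style strings (such as "numpy==1.26.4"); the helper name and the example list are hypothetical, and other declaration styles would need a different check.

```python
def unpinned(requirements):
    """Return the requirement strings that do not pin one exact version."""
    result = []
    for requirement in requirements:
        # Ignore any environment marker after ';' (it may itself contain '==').
        spec = requirement.split(";", 1)[0]
        if "==" not in spec or "*" in spec:
            result.append(requirement)
    return result


# Hypothetical dependency list for illustration.
example = ["numpy==1.26.4", "pandas>=2.0", "matplotlib"]
print(unpinned(example))  # ['pandas>=2.0', 'matplotlib']
```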

Outputs

Distribute a notebook with its outputs. This makes it easier to reproduce the results as everyone who executes the notebook can verify that the results are the same.

Variables

Do not redefine variables in different parts of the notebook.
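One way to spot redefinitions is to look for names that are assigned at the top level of more than one cell. The sketch below does this with Python's ast module on a list of cell sources; it assumes the cells contain plain Python (no IPython magics such as %matplotlib), and the function names and example cells are hypothetical.

```python
import ast
from collections import defaultdict


def assigned_names(source):
    """Names bound by top-level assignments in one cell's source."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets = [node.target]
        else:
            continue
        for target in targets:
            for child in ast.walk(target):
                # Only names being stored to count; 'a' in 'a.b = 1' is a load.
                if isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
                    names.add(child.id)
    return names


def redefined_across_cells(cell_sources):
    """Map each name assigned in more than one cell to the indices of those cells."""
    seen = defaultdict(list)
    for index, source in enumerate(cell_sources):
        for name in sorted(assigned_names(source)):
            seen[name].append(index)
    return {name: cells for name, cells in seen.items() if len(cells) > 1}


# Hypothetical notebook: 'threshold' is set in the first cell and again in the third.
cells = [
    "threshold = 0.5\ndata = [0.2, 0.7]",
    "selected = [x for x in data if x > threshold]",
    "threshold = 0.9",
]
print(redefined_across_cells(cells))  # {'threshold': [0, 2]}
```

In the example, the result of the second cell depends on whether it ran before or after the third, which is exactly the ambiguity the guideline is meant to prevent.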

Pipeline Machinery

The CICD makefile repository contains make targets for linting, formatting and testing notebooks. To access these targets, ensure your repository’s Makefile includes the Python support makefile:

# Include Python support
include .make/python.mk

The usage of these targets is described below.

Testing

To test notebooks, run:

make notebook-test

This target uses Pytest and nbmake, a Pytest plugin. It verifies that Jupyter notebooks can execute fully without error; if an error occurs, the target fails. By default, all notebooks inside the repository are tested.

CICD Template

Linting and testing of Jupyter notebooks are currently supported within CICD pipelines. To enable the linting and testing jobs, include notebook.gitlab-ci.yml in your repository’s .gitlab-ci.yml file, as below:

include:
# Jupyter notebook linting and testing
- project: 'ska-telescope/templates-repository'
  file: 'gitlab-ci/includes/notebook.gitlab-ci.yml'

Customising

If you wish to exclude specific notebooks from being targeted by any of the above make targets, list their names in the NOTEBOOK_IGNORE_FILES environment variable. Define this in your repository’s Makefile, ensuring it appears before you include python.mk. It must follow the form not <file1> and not <file2>...:

NOTEBOOK_IGNORE_FILES = not notebook.ipynb and not another-notebook.ipynb

# Include Python support
include .make/python.mk
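For a longer list of notebooks, the "not <file1> and not <file2>" value can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the makefile repository; it only builds the string in the form described above.

```python
def build_ignore_expression(filenames):
    """Join notebook names into the 'not <file1> and not <file2>' form."""
    return " and ".join(f"not {name}" for name in filenames)


ignored = ["notebook.ipynb", "another-notebook.ipynb"]
print(f"NOTEBOOK_IGNORE_FILES = {build_ignore_expression(ignored)}")
# NOTEBOOK_IGNORE_FILES = not notebook.ipynb and not another-notebook.ipynb
```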