Jupyter Notebook Coding Guidelines
Best Practices
Jupyter notebooks are widely used across the SKAO by a number of teams, mostly within BinderHub/JupyterHub, to share ideas and code snippets and to break down larger concepts into more manageable chunks. With so many notebooks being produced, it is important to agree on best practices for creating them so that a standard approach can be used across the SKAO.
| SKAO Jupyter Notebook Standard | Description |
|---|---|
| Naming | Use expressive names that describe what your notebook is doing. |
| Directory Structure | Notebooks should be placed inside a /notebooks directory at the root of your repository. |
| Execution | Avoid ambiguous execution orders. To ensure that your notebook is reproducible and creates the expected results, restart the kernel and execute all cells of the notebook before you share it. |
| Modularisation | Use modularisation (i.e. modules, functions, classes) where reasonable. |
| Testing | Use the testing make target described below. |
| Linting | Use the linting make target described below. |
| Data Distribution | Ensure that all data used in the notebook is distributed together with it (or at least can be downloaded) and that you’re using relative paths to access the data. |
| Dependencies | Reference your dependencies in, for example, a .toml file to pin the versions of all dependencies used, and import all dependencies at the beginning of the notebook. |
| Outputs | Distribute a notebook with its outputs. This makes it easier to reproduce the results, as everyone who executes the notebook can verify that the results are the same. |
| Variables | Do not redefine variables in different parts of the notebook. |
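
The Execution guideline above can be checked mechanically before sharing a notebook. The following is a minimal sketch, not part of the SKAO tooling: `cells_ran_in_order` is a hypothetical helper that reads the notebook's JSON and assumes a notebook run with "restart and run all" numbers its non-empty code cells 1, 2, 3, ... from top to bottom.

```python
import json


def cells_ran_in_order(notebook_json: str) -> bool:
    """Return True if non-empty code cells carry execution counts 1, 2, 3, ..."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        # Markdown cells and empty code cells have no execution count to check.
        if cell.get("cell_type") == "code"
        and "".join(cell.get("source", "")).strip()
    ]
    return counts == list(range(1, len(counts) + 1))
```

Passing the text of an `.ipynb` file to this function returns `False` when cells were run out of order, run more than once, or never run, which are the cases the guideline is meant to prevent.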
Pipeline Machinery
The CICD makefile repository contains make targets for the linting, formatting and testing of notebooks. To access these targets, ensure your repository’s Makefile includes the Python support makefile:
# Include Python support
include .make/python.mk
The usage of these targets is described below.
Testing
To test notebooks, run:
make notebook-test
This target uses pytest and nbmake, a pytest plugin, to verify that Jupyter notebooks execute fully without error; the target fails if an error occurs. By default, all notebooks inside the repository are tested.
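
Because every notebook in the repository is tested by default, it can help to preview which files a repository-wide run is likely to encounter. The sketch below is independent of the make target: `find_notebooks` is a hypothetical helper, and skipping `.ipynb_checkpoints` directories is an assumption made here, not documented behaviour of `make notebook-test`.

```python
from pathlib import Path


def find_notebooks(root: str) -> list[str]:
    """List .ipynb files under root, skipping Jupyter checkpoint copies."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in Path(root).rglob("*.ipynb")
        # Assumption: autosaved checkpoint copies are not of interest.
        if ".ipynb_checkpoints" not in path.parts
    )
```

Calling `find_notebooks(".")` from the repository root returns the notebook paths relative to that root, in sorted order.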
CICD Template
Linting and testing of Jupyter notebooks is currently supported within CICD pipelines. To enable the jobs for linting and testing notebooks, include notebook.gitlab-ci.yml in your repository’s gitlab-ci file, as below:
include:
  # Jupyter notebook linting and testing
  - project: 'ska-telescope/templates-repository'
    file: 'gitlab-ci/includes/notebook.gitlab-ci.yml'
Customising
If you wish to exclude specific notebooks from any of the above make targets, list their names in the NOTEBOOK_IGNORE_FILES environment variable. Define this in your repository’s Makefile, ensuring it appears before you include python.mk. It must follow the form not <file1> and not <file2>...:
NOTEBOOK_IGNORE_FILES = not notebook.ipynb and not another-notebook.ipynb
# Include Python support
include .make/python.mk
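
When many notebooks need to be excluded, the value can be generated rather than typed by hand. This is a minimal sketch: `build_ignore_expression` is a hypothetical helper, not part of the CICD makefile repository, and it only reproduces the `not <file1> and not <file2>...` form described above.

```python
def build_ignore_expression(filenames: list[str]) -> str:
    """Join notebook file names into the 'not <file1> and not <file2>...' form."""
    return " and ".join(f"not {name}" for name in filenames)


# Print the line to paste into the Makefile before python.mk is included.
ignored = ["notebook.ipynb", "another-notebook.ipynb"]
print(f"NOTEBOOK_IGNORE_FILES = {build_ignore_expression(ignored)}")
```

The printed line matches the Makefile example above for the same two file names.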