Dynamic content using Jupyter notebooks

Jupyter notebooks are a literate-programming format that allows text and runnable code to be combined in a single document. They provide the ability to write documentation pages that show the actual use of the virtual_ecosystem project along with outputs and figures. They are also an invaluable tool for sharing design and troubleshooting investigations. The Jupyter project provides many different tools for working with notebooks, including the main jupyter program and a browser-based notebook editor called jupyter-lab.

Running jupyter-lab

The poetry virtual environment for virtual_ecosystem is already set up to include jupyter and jupyter-lab, which is a browser-based application for editing and running notebooks. As that virtual environment also has the virtual_ecosystem package installed in development mode, a Jupyter notebook running in this environment will be able to import and use virtual_ecosystem code from the active branch.

You can open jupyter-lab in a couple of ways. The simplest is to run poetry run jupyter-lab from the terminal, but you can also open notebooks within VS Code using its Jupyter extension. For this option, you will need to make sure that VS Code is using the right Python environment, and the information you need comes from poetry:

% poetry env list --full-path
/Users/dorme/Library/Caches/pypoetry/virtualenvs/virtual-ecosystem-Laomc1u4-py3.10
/Users/dorme/Library/Caches/pypoetry/virtualenvs/virtual-ecosystem-Laomc1u4-py3.9 (Activated)

In VS Code, you then have to set the Python interpreter to the full path to the currently active poetry virtual environment:

  • View > Command Palette

  • Type interpreter and find ‘Python: Select Interpreter’

  • Enter the full path from the poetry env list output.
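To check that a notebook or a VS Code session is actually using that environment, you can run a quick sanity check in a code cell. This is just a sketch, not part of the project tooling, and the paths printed will obviously differ between machines:

import sys

# The interpreter should sit inside the pypoetry virtualenvs directory shown
# by the poetry env list output above.
print(sys.executable)

# The package should import from the development install of the active branch.
import virtual_ecosystem
print(virtual_ecosystem.__file__)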

Jupyter kernel setup

The jupyter system can be set up to run notebooks in a number of different languages and even in different environments of the same language. Each option is set up as a kernel, which is essentially a pointer to a particular programming environment or virtual environment.

To make sure that virtual_ecosystem project notebooks are always built using the correct virtual environment on all systems (including developer machines, ReadTheDocs and GitHub Actions), this project requires that jupyter is set up to use the virtual environment created by poetry under the vr_python3 kernel name. There is a good discussion of the background for this here.

In order to install that kernel, run the following line:

poetry run python -m ipykernel install --user --name=vr_python3

When you run jupyter-lab now, you should be able to select the vr_python3 kernel to run the code cells. That command is doing some subtle and important things:

  • Python is being run in the active poetry virtual environment (poetry run).

  • The active python environment is then being installed as a kernel specification.

  • It is being installed into a location that is available for the user from anywhere they run jupyter (--user).

  • It is being installed with the name vr_python3 (--name=vr_python3).
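You can confirm that the kernel has been registered without leaving Python, using the jupyter_client package that is installed alongside jupyter. This is just a quick check, not something the project requires:

from jupyter_client.kernelspec import KernelSpecManager

# Maps the names of installed kernels to the directories holding their
# kernel.json files.
specs = KernelSpecManager().find_kernel_specs()
print("vr_python3" in specs)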

The choice of kernel name is important because jupyter uses the kernel specified in the notebook metadata and we want it to be stable. The kernel name:

  • needs to point to a virtual environment including the virtual_ecosystem package and dependencies, and

  • should be consistent across supported Python versions and developer machines.

The options are:

  • By default, it would be installed as python3, which is way too generic.

  • The poetry venv name contains a hash (e.g. Laomc1u4) which uniquely identifies the project directory and helps poetry track project-specific venvs. This would be a spectacularly bad kernel name because notebook files would change as they are run on different developer machines.

  • Using the vr_python3 name is hopefully unique and should be a stable pointer to a venv that includes the virtual_ecosystem package and dependencies.

For the gory details: there is now a kernelspec called vr_python3, which is simply a pointer to a JSON file recording the machine-specific venv location.

% jupyter kernelspec list
Available kernels:
  ir            /Users/dorme/Library/Jupyter/kernels/ir
  julia-1.0     /Users/dorme/Library/Jupyter/kernels/julia-1.0
  vr_python3    /Users/dorme/Library/Jupyter/kernels/vr_python3
% cat /Users/dorme/Library/Jupyter/kernels/vr_python3/kernel.json
{
 "argv": [
  "/Users/dorme/Library/Caches/pypoetry/virtualenvs/virtual-ecosystem-Laomc1u4-py3.10/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "vr_python3",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}%
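The same details can also be pulled out programmatically, which can be handy when debugging kernel problems. Again, this is just a sketch using the jupyter_client package, and the resolved paths are machine specific:

from jupyter_client.kernelspec import KernelSpecManager

spec = KernelSpecManager().get_kernel_spec("vr_python3")

# The first argv entry is the machine-specific venv python that the kernel
# runs, and resource_dir is the folder holding the kernel.json shown above.
print(spec.argv[0])
print(spec.resource_dir)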

Notebook formats

The default jupyter notebook format is the IPython Notebook (.ipynb suffix). This file uses JSON to store the text and code along with a lot of other metadata. However, the .ipynb format is not great for use in version control. The basic problem is that, although JSON files are text-based and technically human-readable:

  • they contain irrelevant metadata, such as the number of times the notebook has been run, that will generate unnecessary commits.

  • they can contain binary output data, such as images, that may also change arbitrarily between runs.

There is a really neat summary of the problem here, along with a discussion of tools (e.g. nbdime and nbmerge) that help manage those changes in a more coherent way.
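The scale of the problem is easy to see using the nbformat package to build a minimal notebook. This is only an illustration, not part of the project tooling: even a single code cell carries execution_count and outputs fields that are rewritten every time the notebook is run.

import nbformat

# Build a notebook containing a single code cell and print the JSON that
# would end up being committed for it.
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_code_cell("print('hello')"))
print(nbformat.writes(nb))

# Once the cell has been run, execution_count and outputs are filled in and
# will differ between runs and machines, generating spurious diffs.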

However, a simpler solution is to use plain text instead of JSON: we write notebooks in the plain text MyST Markdown format. The jupytext extension then allows jupyter to load and run those files as notebooks. More broadly, jupytext is a really powerful tool for managing notebook content, including the use of Markdown formats for notebooks.

Using jupytext

The jupytext package works as an extension running within Jupyter Lab, adding some commands to the jupyter-lab command palette, and it also provides a command-line tool with some really useful features.

To be used with jupytext, MyST Markdown files need to include a YAML preamble at the very top of the file. This is used to set document metadata about the Markdown variety and also code execution data like the jupyter kernel. This is where the vr_python3 kernel name is set.

---
jupytext:
  cell_metadata_filter: -all
  formats: md:myst
  main_language: python
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.13.8
kernelspec:
  display_name: vr_python3
  language: python
  name: vr_python3
---

If you already have a simple Markdown file then the commands below will insert this YAML header:

% jupytext --set-format md:myst simple.md
% jupytext --set-kernel vr_python3  simple.md
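You can also use the jupytext Python API to check how a MyST Markdown file will be interpreted. This is just a sketch, using the simple.md file from the commands above: the format is picked up from the YAML header, and {code-cell} directives are loaded as code cells.

import jupytext

# Read the MyST Markdown file as a notebook; the md:myst format is detected
# from the YAML header inserted by the commands above.
nb = jupytext.read("simple.md")

# {code-cell} directives become code cells; other content stays as markdown.
for cell in nb.cells:
    print(cell.cell_type)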

There is a downside to using Markdown notebooks. The .ipynb format includes the results of executing the notebook code, including Python code outputs and any graphics created by the code. GitHub knows how to render those outputs, so the page you see on GitHub includes the most recently committed code and graphics outputs. These outputs are not stored in MyST Markdown notebooks, so you only see the text and input code on GitHub.

In summary:

  • We only commit notebooks in MyST Markdown format.

  • Notebooks should use the vr_python3 kernel, so that they will hopefully run on any machine that has set up the kernelspec correctly.

  • GitHub will render the markdown and code cells correctly but none of the executed outputs will be shown.

  • However, the notebooks will be executed by the sphinx documentation system, so fully rendered versions will be in the documentation website.

  • You can develop notebook content locally using jupyter-lab and run it to get outputs. You can also run sphinx to see how a notebook is rendered in the documentation.

  • The code in notebooks should not take a long time to run - these pages have to be built every time the documentation is built.

Notebook quality checking

All MyST Markdown content in a notebook will be checked using markdownlint when the file is committed to GitHub (see here). In addition, the following tools may be useful:

Using black with jupytext

Although jupytext does not do Markdown validation, it does allow black to be run on the code cells, so that the code in notebooks can be formatted automatically.

jupytext --pipe black my_markdown.md

Note that this does not format Python code that is simply included in a Markdown cell - essentially text that is formatted as if it were Python code. It only formats code within a Jupyter notebook {code-cell} or {code-block} section.
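As a rough illustration, and not how jupytext is implemented internally, the effect on each code cell is much the same as passing the cell source through black's Python API:

import black

cell_source = "x=  1+2\nprint( x )\n"

# jupytext --pipe black applies this kind of reformatting to each code cell,
# leaving the surrounding Markdown text untouched.
print(black.format_str(cell_source, mode=black.Mode()))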

The mdformat tool

Warning

The following tool is essentially black for Markdown files, which is great. At the moment, although it handles MyST Markdown, it has not been extended to cover some of the MyST extensions that we use. As a result, it can introduce errors. In the future, we may be able to configure it to tidy Markdown content automatically.

This is an autoformatter for Markdown, with specific extensions to handle the MyST Markdown variety and the YAML frontmatter (mdformat-myst and mdformat-frontmatter). It is configured using .mdformat.toml to set the line wrapping length and default list formatting.

mdformat my_markdown.md