Dependency management with Conda on Jupyter

When using DataLabs, the recommended way to install packages is through Conda. This is a flexible package/library management tool that can be used to install dependencies across multiple languages and platforms (for more information see https://docs.conda.io/en/latest/).

One of the key advantages of Conda is that it allows dependencies (including, but not limited to, packages, binaries and libraries) to be captured alongside project code. It is the default package manager for Jupyter Notebooks. Conda environments build on this to let users set up isolated sets of dependencies. This offers numerous advantages; practically, for DataLabs it means dependencies can be shared across multiple notebooks and persist when notebooks are rescheduled across the Kubernetes cluster.

Quick-start guide

In order for Conda environments to persist within DataLabs, they must be stored on the /data mount point, which is shared among notebooks of the same project. Some wrapper commands have been written to make this easier, but users are free to consult the Conda documentation and work with Conda directly.
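Under the hood, persisting an environment on the shared volume amounts to creating it at an explicit path rather than in the default (ephemeral) location. A minimal sketch using plain Conda, assuming an illustrative path of /data/conda/my-env (the exact directory layout used by the wrapper commands may differ):

```shell
# Create an environment on the shared /data volume so it survives
# notebook rescheduling (the path shown is illustrative)
conda create --prefix /data/conda/my-env python=3.10 --yes

# Activate the environment by path rather than by name
conda activate /data/conda/my-env
```

The wrapper commands described below take care of this, plus Jupyter kernel registration, so the plain-Conda route is only needed for unusual setups.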

Initialising a new project

A Conda environment can be set up by opening a Jupyter Notebook/Lab session and running the following command from the terminal.

env-control add new-environment

This will trigger the creation of a Conda environment, as well as adding Jupyter kernels for both R and Python by default, all of which are persisted on the data volume. When run for a brand new environment this is likely to take around 10 minutes as a number of dependencies are installed; however, this is a one-off cost per environment.
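Once the environment exists, further packages can be installed into it from the same terminal using standard Conda commands. A sketch, assuming the environment was created as new-environment above (the packages chosen are purely illustrative):

```shell
# Install a package into the named environment without activating it first
conda install --name new-environment numpy --yes

# Alternatively, activate the environment and install interactively
conda activate new-environment
conda install pandas --yes
```

Packages installed this way become available to the corresponding Jupyter kernels after a kernel restart.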

Once the command completes, refresh the page; two new kernels corresponding to the newly created Conda environment will be visible from the Launcher. There is a corresponding removal command:

env-control remove environment-name

This will remove the Conda environment called environment-name, and is useful for cleaning up environments which are no longer required.
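To check which environments currently exist (for example, to confirm a removal took effect), standard Conda commands can be used from the terminal. This is plain Conda usage rather than part of the wrapper commands:

```shell
# List all Conda environments and their locations;
# environments persisted by DataLabs will show paths under /data
conda env list
```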