Creating a Cluster
There are two options for creating a Dask cluster from within JupyterLab, both described below.
NOTE: Since the resources used are shared among all users of the platform, Dask workers should be shut down when you have finished with your current activity.
JupyterLab Dask Extension
The recommended way to create a cluster is to use the JupyterLab interface: select the Dask icon in the left-hand sidebar and then select +NEW. This may take a few seconds to initialize, and should result in a cluster appearing in the Clusters screen titled something like KubeCluster 1.
At this point the Dask scheduler will be set up, but no workers will be present yet; these are added by selecting SCALE and choosing a worker count. Once the worker count is selected, it may take up to a minute for all of the workers to register correctly. At that point (with a notebook open), selecting the <> icon will generate and insert Python code to create a client that accesses the Dask cluster, as sketched below.
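The inserted code will look something like the following sketch. The scheduler address shown here is a hypothetical placeholder; the extension fills in the actual address of your cluster.

from dask.distributed import Client

# Connect a client to the scheduler of the cluster created via the extension.
# The address below is a placeholder; the extension inserts the real one.
client = Client("tcp://dask-scheduler.example.com:8786")
client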
When finished with the current work, the workers can be deleted by selecting SHUTDOWN.
Manual Setup (Advanced)
Instead of using the Dask JupyterLab extension, which uses a number of standard defaults to create a Dask cluster, the same operation can be done manually. Configuration for this is slightly more complex but allows for a greater degree of control, including the choice of the Docker image that the Dask workers use, as well as setting custom environment variables.
The documentation for this can be found in the dask-kubernetes documentation.
One thing to note within DataLabs is that when creating a custom worker-spec.yml as per the examples, an additional field must be added to ensure that workers are provisioned correctly within the cluster: namespace: ${PROJECT_KEY}-compute, where PROJECT_KEY is the name of the current DataLabs project. An example of spinning up a Dask cluster manually can be found below.
from dask_kubernetes import KubeCluster

cluster = KubeCluster.from_yaml('worker-spec.yml')  # create the scheduler from the spec below
cluster.scale(1)                                    # start one worker immediately
cluster.adapt(minimum=1, maximum=3)                 # then autoscale between one and three workers
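With the cluster object created, a dask.distributed client can be connected to it so that subsequent Dask computations run on the cluster. A minimal sketch, using the cluster object from above:

from dask.distributed import Client

# Point Dask computations at the manually created cluster
client = Client(cluster)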
Where the worker-spec.yml is as follows (in this example the project name is testproject, so the namespace is testproject-compute).
kind: Pod
metadata:
  # This line must be added in order for workers to be created
  namespace: testproject-compute
spec:
  restartPolicy: Never
  containers:
    - image: daskdev/dask:latest
      imagePullPolicy: IfNotPresent
      args:
        - dask-worker
        - --nthreads
        - '2'
        # --no-dashboard replaces the older --no-bokeh flag,
        # which recent dask-worker releases no longer accept
        - --no-dashboard
        - --memory-limit
        - 4GB
        - --death-timeout
        - '60'
      name: dask
      env:
        - name: EXTRA_PIP_PACKAGES
          value: xarray zarr
      resources:
        limits:
          cpu: "2"
          memory: 4G
        requests:
          cpu: "2"
          memory: 4G