1. What is a Spark cluster?

Spark is a unified analytics engine for large-scale data processing. It can be used from several languages, including Python and R.

A Spark cluster consists of:

  • A scheduler: this is responsible for deciding how to perform your calculation. It subdivides the work into chunks and co-ordinates how those chunks are performed across a number of workers.
  • A number of workers: these perform the chunks of calculation they have been allocated.

In your lab notebook, you will start a Spark context or session. This is what lets your notebook talk to the scheduler of the Spark cluster, telling it what calculation you want to perform.
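
A minimal PySpark sketch of what this might look like is below. The application name is an assumption, and your lab environment may already provide a pre-configured session or a specific cluster address to connect to.

```python
# A minimal sketch of starting a Spark session from a lab notebook with PySpark.
# The application name is an assumption; your lab environment may already
# supply a pre-configured session or a specific master/cluster URL to use.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lab-notebook")   # hypothetical application name
    .getOrCreate()             # connects the notebook to the cluster's scheduler
)

# Calculations submitted through the session are split into chunks (partitions)
# and run on the workers. For example, summing a large range of numbers:
df = spark.range(10_000_000)
print(df.selectExpr("sum(id)").collect())
print(df.rdd.getNumPartitions())  # how many chunks the work was divided into
```

Closing the session with `spark.stop()` releases the cluster resources when your calculation is finished.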

Further reading: