Configuring Spark Settings for Jupyter Notebooks

By default, Jupyter notebooks use the cluster-wide Spark configuration. You can override it for an individual notebook by specifying the required Spark settings with the %%configure magic.

Note

You can configure Spark settings only for Jupyter notebooks with Spark kernels.

You should specify the required configuration at the beginning of the notebook, before you run your first Spark-bound code cell.

If you want to specify the required configuration after running a Spark-bound command, use the -f option with the %%configure magic. If you use the -f option, all progress made in the previous Spark jobs is lost.

The following examples show how to specify Spark configurations.

%%configure -f
{"executorMemory": "3072M", "executorCores": 4, "numExecutors": 10}

%%configure -f
{"driverMemory": "20G",
 "conf": {"spark.sql.files.ignoreMissingFiles": "true",
          "spark.jars.packages": "graphframes:graphframes:0.7.0-spark2.4-s_2.11"}}

Note

The Spark drivers are created on the cluster worker nodes by default for better load distribution and better usage of cluster resources. If you want to run the Spark driver on the coordinator node, contact Qubole Support.

The following table lists the Spark configuration parameters and their value types.

Parameter         Description                                             Values
jars              Jars to be used in the session                          List of string
pyFiles           Python files to be used in the session                  List of string
files             Files to be used in the session                         List of string
driverMemory      Amount of memory to be used for the driver process      string
driverCores       Number of cores to be used for the driver process       int
executorMemory    Amount of memory to be used for the executor process    string
executorCores     Number of cores to be used for the executor process     int
numExecutors      Number of executors to be launched for the session      int
archives          Archives to be used in the session                      List of string
queue             Name of the YARN queue                                  string
name              Name of the session (must be in lowercase)              string
conf              Spark configuration properties                          Map of key=val

Note

You can specify all other Spark configuration properties through conf.
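
As an illustration only, the following cell combines several of the parameters listed above. The session name, queue name, bucket paths, and file names are placeholders, not values from this documentation; replace them with values that exist in your environment.

%%configure -f
{"name": "notebook-session-example",
 "numExecutors": 6,
 "executorMemory": "4G",
 "executorCores": 2,
 "queue": "example_queue",
 "jars": ["s3://example-bucket/libs/custom-udfs.jar"],
 "pyFiles": ["s3://example-bucket/libs/helpers.py"],
 "conf": {"spark.sql.shuffle.partitions": "64"}}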