Configuring Spark Settings for Jupyter Notebooks
By default, the cluster-wide Spark configurations are used for Jupyter notebooks.
To configure the Spark application for an individual Jupyter notebook, specify the required Spark settings by using the %%configure magic.
Note
You can configure Spark settings only for Jupyter notebooks with Spark kernels.
Specify the required configuration at the beginning of the notebook, before you run your first Spark-bound code cell.
If you want to specify the required configuration after running a Spark-bound command, use the -f option with the %%configure magic.
If you use the -f option, all the progress made in the previous Spark jobs is lost.
The following examples show how to specify Spark configurations.
%%configure -f
{"executorMemory": "3072M", "executorCores": 4, "numExecutors": 10}
%%configure -f
{"driverMemory": "20G",
 "conf": {"spark.sql.files.ignoreMissingFiles": "true",
          "spark.jars.packages": "graphframes:graphframes:0.7.0-spark2.4-s_2.11"}}
Note
By default, the Spark drivers are created on the cluster worker nodes to distribute load and make better use of cluster resources. If you want to run the Spark driver on the coordinator node, contact Qubole Support.
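Because the body of a %%configure cell must be valid JSON, it can help to generate it from a Python dictionary instead of typing it by hand. The following is a minimal sketch of that idea, not part of the product: the helper name `build_configure_cell` and its `force` parameter are hypothetical, and the settings are taken from the second sample above.

```python
import json


def build_configure_cell(settings, force=True):
    """Return the text of a %%configure cell for the given settings.

    Hypothetical helper for illustration only; it just formats text.
    """
    # -f is needed when a Spark-bound cell has already been run.
    magic = "%%configure -f" if force else "%%configure"
    # json.dumps always emits valid JSON (double quotes, no trailing commas).
    return magic + "\n" + json.dumps(settings, indent=2)


settings = {
    "driverMemory": "20G",
    "conf": {
        "spark.sql.files.ignoreMissingFiles": "true",
        "spark.jars.packages": "graphframes:graphframes:0.7.0-spark2.4-s_2.11",
    },
}

cell_text = build_configure_cell(settings)
print(cell_text)
```

Paste the printed text into the first cell of the notebook. Pass `force=False` to omit the -f option when no Spark-bound cell has been run yet.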
The following table lists the Spark configuration parameters with their descriptions and value types.
Parameters | Description | Values |
---|---|---|
jars | Jars to be used in the session | List of string |
pyFiles | Python files to be used in the session | List of string |
files | Files to be used in the session | List of string |
driverMemory | Amount of memory to be used for the driver process | string |
driverCores | Number of cores to be used for the driver process | int |
executorMemory | Amount of memory to be used for the executor process | string |
executorCores | Number of cores to be used for the executor process | int |
numExecutors | Number of executors to be launched for the session | int |
archives | Archives to be used in the session | List of string |
queue | Name of the YARN queue | string |
name | Name of the session (name must be in lower case) | string |
conf | Spark configuration properties. Note: You can specify all other Spark configurations. | Map of key=val |
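The value types in the table above can be checked before a cell is run. The following is a minimal sketch, assuming the settings are held in a Python dictionary; the names `EXPECTED_TYPES` and `check_settings` are hypothetical, and flagging keys that are not in the table is a convenience of this sketch rather than documented behavior.

```python
# Expected Python type for each %%configure key, taken from the table above.
EXPECTED_TYPES = {
    "jars": list, "pyFiles": list, "files": list, "archives": list,
    "driverMemory": str, "executorMemory": str, "queue": str, "name": str,
    "driverCores": int, "executorCores": int, "numExecutors": int,
    "conf": dict,
}


def check_settings(settings):
    """Return a list of problems found in a %%configure settings dict."""
    problems = []
    for key, value in settings.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            # Assumption of this sketch: keys outside the table are flagged.
            problems.append(f"{key}: unknown parameter")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is rejected explicitly because it is a subclass of int.
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
        elif expected is list and not all(isinstance(v, str) for v in value):
            problems.append(f"{key}: every list item must be a string")
    # The table states that the session name must be in lower case.
    name = settings.get("name")
    if isinstance(name, str) and name != name.lower():
        problems.append("name: must be in lower case")
    return problems


print(check_settings({"executorMemory": "3072M", "executorCores": 4, "numExecutors": 10}))
print(check_settings({"executorCores": "4", "name": "MyJob"}))
```

An empty list means that no problems were found. The check covers only the shapes listed in the table; it does not validate memory strings such as "3072M" or the individual properties inside conf.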