Introduction to Sparklens

Sparklens is an open source Spark profiling tool from Qubole that can be used with any Spark application. It helps tune Spark applications by identifying potential optimization opportunities related to driver-side computation, lack of parallelism, skew, and similar issues. From a single run, its built-in scheduler simulator can predict how a given Spark application would run on any number of executors.

Sparklens analyzes the given Spark application in a single run, and provides the following information:

  • Whether the application can run faster with more cores, and how to optimize it.

  • Whether compute cost can be saved by running the application with fewer cores without much increase in wall clock time.

  • The absolute minimum time that the application can take if infinite executors are given.

  • How to run the application below the absolute minimum time.

Using Sparklens

You can analyze your Spark applications with Sparklens by adding the following command-line options to spark-submit or spark-shell:

--packages qubole:sparklens:0.3.1-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
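
As a minimal sketch, the two options above could be combined into a complete spark-submit invocation as follows. The main class com.example.MyApp and the jar my-app.jar are hypothetical placeholders, not part of Sparklens. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sparklens options exactly as listed above.
SPARKLENS_PACKAGE="qubole:sparklens:0.3.1-s_2.11"
SPARKLENS_LISTENER="com.qubole.sparklens.QuboleJobListener"

# Hypothetical application details (assumptions; replace with your own).
MAIN_CLASS="com.example.MyApp"
APP_JAR="my-app.jar"

# Print the full spark-submit command instead of running it,
# so it can be reviewed first.
build_sparklens_cmd() {
  echo "spark-submit" \
    "--packages ${SPARKLENS_PACKAGE}" \
    "--conf spark.extraListeners=${SPARKLENS_LISTENER}" \
    "--class ${MAIN_CLASS}" \
    "${APP_JAR}"
}

build_sparklens_cmd
```

On a machine where spark-submit is on the PATH, the printed command can be copied into a terminal or run with eval "$(build_sparklens_cmd)".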

Starting with Spark 2.4.0, you can analyze your Spark applications with Sparklens without passing the --packages option. This feature is not enabled by default for all users. To enable it on your QDS account, create a ticket with Qubole Support.

After the feature is enabled, pass the --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener command-line option to run Sparklens reporting.
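
For illustration, a spark-shell session on Spark 2.4.0 or later might then be started with only the listener option. This sketch assumes the feature has already been enabled on the account; the function name is a hypothetical helper, and the script prints the command rather than running it.

```shell
#!/bin/sh
# Listener class as given above; no --packages option is passed,
# on the assumption that the feature is enabled on the account.
SPARKLENS_LISTENER="com.qubole.sparklens.QuboleJobListener"

# Hypothetical helper: print the spark-shell command for review.
build_listener_only_cmd() {
  echo "spark-shell" \
    "--conf spark.extraListeners=${SPARKLENS_LISTENER}"
}

build_listener_only_cmd
```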

The open source code is available at https://github.com/qubole/sparklens.

For more information about Sparklens, see the Sparklens blog.