Introduction to Sparklens

Sparklens is an open source Spark profiling tool from Qubole that can be used with any Spark application. It helps tune Spark applications by identifying optimization opportunities related to driver-side computation, lack of parallelism, skew, and similar issues. Its built-in scheduler simulator can predict, from a single run, how a given Spark application would perform on any number of executors.

Sparklens analyzes the given Spark application in a single run, and provides the following information:

  • Whether the application can run faster with more cores, and how to optimize it.
  • Whether compute cost can be saved by running the application with fewer cores, without much increase in wall clock time.
  • The absolute minimum time that the application can take, even if it is given infinite executors.
  • How to bring the application's run time below that absolute minimum.
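The ideas in the list above can be illustrated with a small, simplified model. The following sketch is not Sparklens's actual simulator; it only assumes that stages run one after another, that each task goes to whichever core frees up first, and that driver time does not shrink when cores are added. The function names and the sample durations are hypothetical.

```python
import heapq


def estimate_stage_time(task_durations, cores):
    """Greedy estimate: each task runs on the core that becomes free first."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Min-heap holding the time at which each core becomes free.
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    # The stage ends when the busiest core finishes.
    return max(free_at)


def estimate_app_time(driver_time, stages, cores):
    """Assumes stages run sequentially and driver time is unaffected by cores."""
    return driver_time + sum(estimate_stage_time(s, cores) for s in stages)


# Hypothetical task durations in seconds; the second stage is skewed.
stages = [[4, 4, 4, 4], [10, 1, 1, 1]]
driver_time = 5

for cores in (1, 2, 4, 100):
    print(cores, estimate_app_time(driver_time, stages, cores))
# prints 34.0, 23.0, 19.0 and 19.0 for 1, 2, 4 and 100 cores
```

In this toy model, going from 4 to 100 cores gains nothing: the estimate is stuck at the driver time plus the longest task of each stage. Lowering that floor means changing the application itself, for example by reducing skew or driver-side work.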

Using Sparklens

You can analyze your Spark applications with Sparklens by adding the following command-line options to spark-submit or spark-shell.

--packages qubole:sparklens:0.1.2-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
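As a rough illustration of where these options sit in a complete spark-submit call, the sketch below assembles the argument list in Python so it can be inspected before running. The helper name, the application class com.example.MyApp, and the jar my-app.jar are hypothetical placeholders; only the package coordinates and listener class come from the options above.

```python
import shlex

SPARKLENS_PACKAGE = "qubole:sparklens:0.1.2-s_2.11"
SPARKLENS_LISTENER = "com.qubole.sparklens.QuboleJobListener"


def sparklens_submit_command(app_jar, main_class, app_args=()):
    """Return a spark-submit argument list with the Sparklens options added."""
    return [
        "spark-submit",
        "--packages", SPARKLENS_PACKAGE,
        # The key and value must form a single argument with no line break.
        "--conf", f"spark.extraListeners={SPARKLENS_LISTENER}",
        "--class", main_class,
        app_jar,
        *app_args,
    ]


# Hypothetical application jar and main class, used only for illustration.
command = sparklens_submit_command("my-app.jar", "com.example.MyApp")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command) on a machine where spark-submit is on the PATH.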

The open source code is available at https://github.com/qubole/sparklens.

For more information about Sparklens, see the Sparklens blog.