Introduction

Apache Spark is a fast, general-purpose compute engine for Hadoop data. Spark provides a simple, expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation. Spark’s in-memory data model and fast processing make it particularly suitable for applications such as:

  • Machine Learning and Graph Processing
  • Stream Processing
  • Interactive queries against in-memory data

Qubole offers only the Spark-on-YARN variant. Because Spark executors run as YARN containers, the Apache Hadoop YARN parameters that Qubole offers also apply to Spark. For more information on the YARN parameters, see Significant Parameters in YARN and YARN in Qubole.
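As an illustration of how YARN limits constrain Spark settings, the following minimal Python sketch estimates the container size that one Spark executor requests from YARN and checks it against YARN's maximum allocation. The helper names are hypothetical and are not a Qubole or Spark API. The overhead rule assumes Spark's default for spark.executor.memoryOverhead in recent releases (the larger of 384 MiB and 10% of executor memory; older releases used spark.yarn.executor.memoryOverhead). The rounding step assumes a scheduler that rounds requests up to a multiple of yarn.scheduler.minimum-allocation-mb. The cluster values are illustrative.

```python
import math
from typing import List, Optional

# Assumed Spark default: overhead = max(384 MiB, 10% of executor memory).
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def executor_container_mb(executor_memory_mb: int, overhead_mb: Optional[int] = None) -> int:
    """Hypothetical helper: MiB requested from YARN for one executor (heap + overhead)."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead_mb


def yarn_normalized_mb(request_mb: int, min_allocation_mb: int) -> int:
    """Round a request up to a multiple of yarn.scheduler.minimum-allocation-mb (assumed behavior)."""
    return math.ceil(request_mb / min_allocation_mb) * min_allocation_mb


def fits_in_yarn(executor_memory_mb: int, min_allocation_mb: int, max_allocation_mb: int) -> bool:
    """True if the normalized executor container fits under yarn.scheduler.maximum-allocation-mb."""
    container = yarn_normalized_mb(executor_container_mb(executor_memory_mb), min_allocation_mb)
    return container <= max_allocation_mb


def spark_submit_args(app: str, executor_memory_mb: int, executor_cores: int) -> List[str]:
    """Build a spark-submit command line that targets YARN with explicit executor sizing."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--conf", f"spark.executor.memory={executor_memory_mb}m",
        "--conf", f"spark.executor.cores={executor_cores}",
        app,
    ]


if __name__ == "__main__":
    # Illustrative cluster limits: 1 GiB minimum and 8 GiB maximum allocation.
    for mem in (4096, 8192):
        print(mem, executor_container_mb(mem), fits_in_yarn(mem, 1024, 8192))
```

With these example limits, a 4096 MiB executor requests 4505 MiB, which rounds to 5120 MiB and fits. An 8192 MiB executor needs 9011 MiB, which rounds to 9216 MiB and exceeds an 8192 MiB maximum allocation. This is why executor memory has to be sized with the cluster's YARN parameters in mind.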

For supported Spark versions, see QDS Components: Supported Versions and Cloud Platforms and Spark Version Support.