Qubole runs applications written in MapReduce, Cascading, Pig, Hive, Scalding, and Spark using Apache Hadoop. Qubole offers two flavors of Hadoop, based on Apache releases commonly referred to as Hadoop 2.

These implementations of Hadoop are compatible with open-source APIs and are largely covered by the Apache documentation. Qubole has added optimizations, as well as important capabilities such as autoscaling.

The sections that follow cover Qubole optimizations and the aspects of Hadoop 2 (Hadoop 2.6.x) that are especially important in Qubole clusters.