Qubole runs applications written in MapReduce, Cascading, Pig, Hive, Scalding, and Spark using Apache Hadoop. Qubole offers two flavors of Hadoop, based on the Apache releases commonly referred to as Hadoop 1 and Hadoop 2. (Hadoop 1 is currently supported on AWS clusters only.)

These implementations of Hadoop are compatible with open source APIs and are largely covered by the Apache documentation. Qubole has added optimizations, as well as important capabilities such as autoscaling.

The sections that follow cover Qubole's optimizations and the aspects of Hadoop that are especially important in Qubole clusters.