Introduction to Qubole Clusters
Qubole Data Service (QDS) provides a unified platform for managing different types of compute clusters.
QDS can run queries and programs written with tools such as SQL, MapReduce, Cascading, Pig, Scala, and Python. These run on distributed execution frameworks such as Hadoop, Presto, and Spark, on multi-node clusters comprising one master node and one or more worker nodes.
Not all of these tools and engines are available on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.
A new account is pre-configured with one cluster of each of the following types:
- Spark (labelled as spark)
- Hadoop 2 (labelled as hadoop2)
- Presto (labelled as presto; currently AWS and Azure only)
Navigate to Control Panel > Clusters in the QDS UI to see the list of clusters.
The clusters are configured but are not active. A red status icon indicates that a cluster is down.
You can configure several clusters of a single cluster type as needed. (Trial accounts are limited to four clusters.)
Cluster Labels and Command Routing
You must assign at least one unique label to each QDS cluster; you can assign more than one label. Each new QDS account has a default Hadoop cluster with the label default.
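The labeling rule above can be checked with a small sketch. This is an illustration only, not part of QDS: the function name check_labels and the mapping of cluster names to labels are hypothetical, and the sketch assumes that "unique" means no label is shared by two clusters.

```python
from collections import Counter
from typing import List, Mapping, Sequence


def check_labels(clusters: Mapping[str, Sequence[str]]) -> List[str]:
    """Return a list of problems found in a cluster-name -> labels mapping.

    Hypothetical helper: QDS enforces its own rules; this only mirrors
    the two constraints described in the text.
    """
    problems = []

    # Rule 1: every cluster needs at least one label.
    for name, labels in clusters.items():
        if not labels:
            problems.append(f"cluster {name!r} has no label")

    # Rule 2 (assumed reading of "unique"): no label appears on two clusters.
    # set() ignores a label repeated within a single cluster.
    counts = Counter(label for labels in clusters.values() for label in set(labels))
    for label, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"label {label!r} is used by {n} clusters")

    return problems
```

For example, check_labels({"a": ["default", "hadoop2"], "b": ["spark"]}) returns an empty list, while giving both clusters the label default reports one problem.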
Qubole commands are routed to clusters using these rules:
If a command includes a cluster label, the command is routed to the cluster with the corresponding label.
If no cluster label is included, the command is routed to the first matching cluster; for example:
- Hive, Pig, and Hadoop commands are routed to the first matching Hadoop cluster.
- Presto commands are routed to the first matching Presto cluster.
- Spark commands are routed to the first matching Spark cluster.
Pig and Presto are not supported on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.
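The routing rules above can be sketched as follows. This is a simplified model, not the QDS implementation: the Cluster class, the route function, and the COMMAND_TO_CLUSTER_TYPE table are hypothetical names, and "first matching" is assumed to mean the first cluster of the right type in the order the list is given.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical table built from the examples in the text:
# which cluster type runs each kind of command.
COMMAND_TO_CLUSTER_TYPE = {
    "hive": "hadoop",
    "pig": "hadoop",
    "hadoop": "hadoop",
    "presto": "presto",
    "spark": "spark",
}


@dataclass
class Cluster:
    cluster_type: str          # "hadoop", "presto", or "spark"
    labels: Sequence[str]      # one or more labels


def route(command_type: str, clusters: Sequence[Cluster],
          label: Optional[str] = None) -> Cluster:
    """Pick the cluster a command would run on under the rules above."""
    if label is not None:
        # Rule 1: an explicit label selects the cluster carrying that label.
        for cluster in clusters:
            if label in cluster.labels:
                return cluster
        raise LookupError(f"no cluster has label {label!r}")

    # Rule 2: without a label, take the first cluster of the matching type.
    wanted = COMMAND_TO_CLUSTER_TYPE[command_type]
    for cluster in clusters:
        if cluster.cluster_type == wanted:
            return cluster
    raise LookupError(f"no {wanted} cluster available for {command_type!r}")
```

With clusters labelled spark, hadoop2 (also labelled default), and presto, route("hive", clusters) selects the Hadoop cluster, and route("hive", clusters, label="presto") selects the Presto cluster because the label takes precedence in this sketch.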