Introduction to Qubole Clusters

Qubole Data Service (QDS) provides a unified platform for managing different types of compute clusters.

QDS can run queries and programs written with tools such as SQL, MapReduce, Cascading, Pig, Scala, and Python. These run on distributed execution frameworks such as Hadoop, Presto, and Spark, on multi-node clusters comprising one coordinator node and one or more worker nodes.

Note

Not all of these tools and engines are available on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.

Cluster Basics

Each QDS account has pre-configured clusters of different types (Hadoop, Spark, etc.). You can configure additional clusters. Each cluster can have one or more unique cluster labels.

A new account is pre-configured with one cluster of each of the following types:

  1. Spark (labelled as spark)
  2. Hadoop 2 (labelled as hadoop2)
  3. Presto (labelled as presto; currently AWS and Azure only)

Navigate to Control Panel > Clusters in the QDS UI to see the list of clusters.

Note

The clusters are configured but are not active. A red status icon indicates that a cluster is down.

You can configure multiple clusters of the same type as needed. (Trial accounts are limited to four clusters.)

Cluster Life Cycle Management

See Understanding the QDS Cluster Lifecycle.

Cluster Labels and Command Routing

You must assign at least one unique label to each QDS cluster; you can assign more than one label. Each new QDS account has a default Hadoop cluster with the label default.
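
The labeling requirement can be checked mechanically. The following is a minimal Python sketch, not part of QDS: the function name find_label_problems and the input shape (a dictionary mapping a cluster name to its list of labels) are hypothetical, and it assumes that "unique" means no label is shared between two clusters.

```python
from collections import defaultdict


def find_label_problems(cluster_labels):
    """Return a list of human-readable problems with a label assignment.

    cluster_labels is a hypothetical mapping of cluster name -> list of labels.
    """
    problems = []
    owners = defaultdict(set)  # label -> names of clusters that carry it

    for cluster, labels in cluster_labels.items():
        if not labels:
            # Every cluster needs at least one label.
            problems.append(f"{cluster}: no label assigned")
        for label in labels:
            owners[label].add(cluster)

    for label, clusters in owners.items():
        if len(clusters) > 1:
            # Assumption: a label may belong to only one cluster.
            shared = ", ".join(sorted(clusters))
            problems.append(f"label {label!r} is shared by {shared}")

    return problems
```

An empty result means every cluster has at least one label and no label appears on two clusters.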

Qubole commands are routed to clusters using these rules:

  • If a command includes a cluster label, the command is routed to the cluster with the corresponding label.

  • If no cluster label is included, the command is routed to the first matching cluster; for example:

    • Hive, Pig, and Hadoop commands are routed to the first matching Hadoop cluster.
    • Presto commands are routed to the first matching Presto cluster.
    • Spark commands are routed to the first matching Spark cluster.

    Note

    Pig and Presto are not supported on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.
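
The routing rules above can be sketched in a few lines of Python. This is an illustration only, not the QDS implementation: the Cluster class, the route function, and the command-to-type mapping are hypothetical names, and "first matching cluster" is assumed to mean the first cluster of the right type in the order given.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping from command type to cluster type, taken from the rules above.
COMMAND_TO_CLUSTER_TYPE = {
    "hive": "hadoop",
    "pig": "hadoop",
    "hadoop": "hadoop",
    "presto": "presto",
    "spark": "spark",
}


@dataclass
class Cluster:
    cluster_id: int
    cluster_type: str
    labels: List[str] = field(default_factory=list)


def route(command_type: str, clusters: List[Cluster],
          label: Optional[str] = None) -> Optional[Cluster]:
    """Pick the cluster a command would run on, or None if nothing matches."""
    if label is not None:
        # Rule 1: an explicit label selects the cluster that carries it.
        for cluster in clusters:
            if label in cluster.labels:
                return cluster
        return None

    # Rule 2: no label, so fall back to the first cluster of the matching type.
    wanted = COMMAND_TO_CLUSTER_TYPE.get(command_type.lower())
    for cluster in clusters:
        if cluster.cluster_type == wanted:
            return cluster
    return None


clusters = [
    Cluster(1, "hadoop", ["default", "hadoop2"]),
    Cluster(2, "spark", ["spark"]),
    Cluster(3, "presto", ["presto"]),
    Cluster(4, "spark", ["etl"]),
]
print(route("pig", clusters).cluster_id)                 # no label: Hadoop cluster
print(route("spark", clusters, label="etl").cluster_id)  # label wins over order
```

In this sketch a Spark command with no label goes to cluster 2, while the same command with the label etl goes to cluster 4.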

Qubole Cluster EC2 Tags (AWS)

For AWS clusters, Qubole instances are tagged with the following three EC2 tags:

  • Qubole - A unique identifier based on the account and cluster, in the form qbol-acc<AccountID>_cl<ClusterID>.
  • alias - Identifies the node within the Hadoop cluster. Its value is master or node<number>.
  • Name - Also identifies the node. Its value is the same as that of the alias tag, unless overridden by a custom EC2 tag.
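
As an illustration of the tag formats listed above, the following Python sketch builds the three tags for a node and parses a Qubole tag value back into its parts. The function names are hypothetical, and the sketch assumes that account and cluster IDs are numeric; it does not call any AWS or QDS API.

```python
import re

# Assumption: <AccountID> and <ClusterID> are decimal integers.
_QUBOLE_TAG = re.compile(r"^qbol-acc(\d+)_cl(\d+)$")


def qubole_ec2_tags(account_id, cluster_id, node_number=None, custom_name=None):
    """Build the three tags for one node.

    node_number=None stands for the master node; custom_name models a
    custom EC2 tag that overrides the default Name value.
    """
    alias = "master" if node_number is None else f"node{node_number}"
    return {
        "Qubole": f"qbol-acc{account_id}_cl{cluster_id}",
        "alias": alias,
        "Name": custom_name if custom_name is not None else alias,
    }


def parse_qubole_tag(value):
    """Return (account_id, cluster_id) from a Qubole tag value."""
    match = _QUBOLE_TAG.match(value)
    if match is None:
        raise ValueError(f"not a Qubole tag value: {value!r}")
    return int(match.group(1)), int(match.group(2))


tags = qubole_ec2_tags(account_id=42, cluster_id=7, node_number=3)
print(tags)
print(parse_qubole_tag(tags["Qubole"]))
```

Parsing the Qubole tag this way can help group instances by account and cluster when reviewing EC2 inventory or billing data.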

For information on custom EC2 tags, see Advanced configuration: Modifying EC2 Settings (AWS) (UI) and hadoop_settings (API).