Enabling Hive on the Cluster Coordinator

You must enable Hive on the Cluster Coordinator explicitly. It is supported on Hive versions 1.2 (deprecated), 2.1.1, and 2.3. By default, Hive 1.2 (deprecated) queries run through QDS servers.

Note

Running Hive 2.x queries on QDS servers by default is not available on all QDS accounts. To enable it on your QDS account, create a ticket with Qubole Support.

Caution

Do not enable Hive on the Cluster Coordinator by adding it as a Hadoop override.


Considerations for AWS

To reduce data-access latency, you might want to run Hive on the Coordinator node of the cluster. Hive then interacts with the underlying data source directly instead of going through the Qubole-managed layers. For each Hive query, you can choose whether QDS runs the query on its own servers or offloads the processing to the cluster Coordinator. Running Hive on the Coordinator also helps reduce latency in a multi-region environment (with or without an AWS VPC) and in configurations that use a custom Hive metastore.

Advantages of Enabling Hive on the Cluster Coordinator

  • Running Hive on the cluster Coordinator helps when the cluster is in an AWS VPC and some data in the VPC cannot be accessed from outside it. The solution is to run Hive in the same VPC as the data, and the cluster Coordinator is an appropriate place to do so.

  • Running Hive on the cluster Coordinator helps reduce latency when the Hive metastore is in a different region from the data source where Hive is running.

Disadvantages of Enabling Hive on the Cluster Coordinator

  • Running Hive on the cluster Coordinator is not scalable, because the Hive processing for every such query is concentrated on a single node.

  • Running Hive on the Coordinator node can overload this node.

Recommendations for Azure and Oracle OCI

See Connecting to a Custom Hive Metastore (Azure and Oracle OCI).

How to Enable Hive-on-Coordinator

Hive-on-Coordinator is not the default for Hive 2.x, because Hive 2.x queries can run through QDS servers by default.

Note

Running Hive 2.x queries on QDS servers by default is not available on all QDS accounts. To enable it on your QDS account, create a ticket with Qubole Support.

For Hive 1.2 (deprecated), you must enable Hive-on-Coordinator explicitly; otherwise, Hive queries run through QDS servers. Use one of the following methods.

Enabling Hive-on-Coordinator for each Query

Submit each query with this setting:

set hive.on.master=true;

Caution

Do not add this property as a Hadoop override.

This setting redirects the query to the cluster Coordinator. QDS removes the setting from the query before submitting it to Hive.
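For illustration, a complete per-query submission might look like the following sketch. Only the `set hive.on.master=true;` line comes from this page; the `web_logs` table and its `status` column are hypothetical placeholders for your own data.

```sql
-- Route this query to the cluster Coordinator instead of the QDS servers.
-- QDS strips this line before the query reaches Hive.
set hive.on.master=true;

-- Hypothetical table and column, shown for illustration only.
SELECT status, COUNT(*) AS requests
FROM web_logs
GROUP BY status;
```

Because the setting is scoped to a single query submission, queries submitted without it continue to run through QDS servers unless Hive-on-Coordinator is enabled account-wide.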

Enabling Hive-on-Coordinator Account-Wide

To enable Hive-on-Coordinator across a QDS account, create a ticket with Qubole Support.

Once Hive-on-Coordinator is enabled for an account, you don’t need to add set hive.on.master=true to each query.

Note

You cannot add this setting to the Hive bootstrap file, and you should not add it as a Hadoop override. Currently, this feature is supported on Hadoop (Hive) clusters.