Enabling Hive on the Cluster Master

You must explicitly enable Hive on the Cluster Master on all supported Hive versions, that is, Hive 1.2.0 and Hive 2.1.1. By default, Hive 1.2.0 queries run through QDS servers.

Note

The feature to run Hive 2.x queries on QDS servers by default is not available on all QDS accounts. Create a ticket with Qubole Support to enable it on your QDS account.

Caution

Do not enable Hive on the Cluster Master by adding it as a Hadoop override.

Considerations for AWS

To reduce data-access latency, you might want to run Hive on the Master node of the cluster. Hive can then interact with the underlying data source directly instead of going through Qubole-managed layers. For each Hive query, you can specify whether QDS runs the query on its own servers or offloads the processing directly to the cluster Master. Enabling Hive on the Master can also reduce latency in multi-region environments (with or without an AWS VPC) and in configurations that use a custom Hive metastore.

Advantages of Enabling Hive on the Cluster Master

  • Running Hive on the cluster Master helps when the cluster is in an AWS VPC that holds data which cannot be accessed from outside the VPC. Hive must then run in the same VPC as the data, and the cluster Master is an appropriate place to run it.
  • Running Hive on the cluster Master helps reduce latency when the Hive metastore is in a different region from the data source where Hive is running.

Disadvantages of Enabling Hive on the Cluster Master

  • Running Hive on the cluster Master does not scale well, because every query that uses it goes through the single Master node.
  • Running many Hive queries on the Master node can overload that node.

Recommendations for Azure and Oracle OCI

See Connecting to a Custom Hive Metastore (Azure and Oracle OCI).

How to Enable Hive-on-Master

Hive-on-Master is not the default for Hive 2.x, because Hive 2.x queries can run through QDS servers by default.

Note

The feature to run Hive 2.x queries on QDS servers by default is not available on all QDS accounts. Create a ticket with Qubole Support to enable it on your QDS account.

For Hive 1.2.0 and earlier versions, you must enable it explicitly; otherwise Hive queries run through QDS servers. Proceed as follows.

Enabling Hive-on-Master for each Query

Submit each query with this setting:

set hive.on.master=true

Caution

Do not add this property as a Hadoop override to enable Hive on the Cluster Master.

The hive.on.master=true setting redirects the query to the cluster Master. (QDS removes the setting from the query before submitting it to Hive.)
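A query submitted with this setting might look like the following minimal sketch. Only the `set hive.on.master=true` line comes from this page; the `web_logs` table and its `status` column are hypothetical placeholders, and the sketch assumes statements are separated by semicolons as in an ordinary Hive script.

```sql
-- Route this query to the cluster Master instead of the QDS servers.
-- QDS removes this line before the query is submitted to Hive.
set hive.on.master=true;

-- Hypothetical example query: replace web_logs and status with your own table and column.
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
```

Because the setting is part of the query text, it applies only to that query; other queries continue to run through QDS servers unless Hive-on-Master is enabled account-wide.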

Enabling Hive-on-Master Account-Wide

To enable Hive-on-Master across a QDS account, create a ticket with Qubole Support.

Once Hive-on-Master is enabled for an account, you don’t need to add set hive.on.master=true to each query.

Note

You cannot add this setting to the Hive bootstrap file. Currently, this feature is supported on Hadoop 1 and Hadoop 2 (Hive) clusters. Do not add it as a Hadoop override.