Understanding Different Ways to Run Hive

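On QDS, you can run Hive in any of the following ways:

  • Running Hive through QDS Servers

  • Running Hive on the Coordinator Node

  • Running Hive with HiveServer2 on the Coordinator Node

  • Running Hive with Multi-instance HiveServer2

QDS supports Hive version 2.1.1 with all of these methods. For more information on versions, see Understanding Hive Versions.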

For the pros, cons, and recommended scenario of each method, see Pros and Cons of Each Method to Run Hive.

Running Hive through QDS Servers

The following architecture diagram shows how Hive runs through QDS servers. For the pros, cons, and recommended scenario of this method, see Pros and Cons of Each Method to Run Hive.

../../../_images/HiveviaQDS-Server.png

Running Hive on the Coordinator Node

The following architecture diagram shows how Hive runs on the cluster’s Coordinator node. For the pros, cons, and recommended scenario of this method, see Pros and Cons of Each Method to Run Hive.

../../../_images/HiveonMaster.png

By default, Qubole allows you to write to the local filesystem and add custom JARs/UDFs when a query runs in HiveServer2 or Hive-on-Coordinator mode.

Running Hive with HiveServer2 on the Coordinator Node

The following architecture diagram shows how Hive runs with HiveServer2 (HS2) on the Coordinator node. For the pros, cons, and recommended scenario of this method, see Pros and Cons of Each Method to Run Hive.

../../../_images/HS2-Standalone.png

By default, Qubole allows you to write to the local filesystem and add custom JARs/UDFs when a query runs in HiveServer2 or Hive-on-Coordinator mode.
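To make the note about custom JARs/UDFs and local filesystem writes more concrete, here is a minimal Python sketch that assembles the kind of Hive statements involved: `ADD JAR`, `CREATE TEMPORARY FUNCTION`, and `INSERT OVERWRITE LOCAL DIRECTORY`. The helper name `build_udf_query`, the JAR location, the UDF class, the table, and the output directory are all hypothetical placeholders, and the sketch only builds the query text. How you submit it to QDS is outside its scope.

```python
def build_udf_query(jar_path, function_name, class_name, local_dir, select_sql):
    """Build a multi-statement Hive query that registers a custom UDF
    and writes the query result to a directory on the local filesystem.

    All argument values are supplied by the caller; nothing here is
    specific to a real cluster or a real JAR.
    """
    statements = [
        # Make the custom JAR available to the Hive session.
        f"ADD JAR {jar_path}",
        # Register a session-scoped UDF implemented by a class in that JAR.
        f"CREATE TEMPORARY FUNCTION {function_name} AS '{class_name}'",
        # Write the result to the local filesystem of the node running the query.
        f"INSERT OVERWRITE LOCAL DIRECTORY '{local_dir}' {select_sql}",
    ]
    return ";\n".join(statements) + ";"


# Hypothetical values, for illustration only.
query = build_udf_query(
    jar_path="s3://example-bucket/jars/example-udfs.jar",
    function_name="example_lower",
    class_name="com.example.hive.udf.ExampleLower",
    local_dir="/tmp/hive_example_out",
    select_sql="SELECT example_lower(name) FROM example_table",
)
print(query)
```

Because the write targets the local filesystem of the node that runs the query, a query like this assumes one of the two modes named above.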

Running Hive with Multi-instance HiveServer2

The following architecture diagram shows how Hive runs on multi-instance HiveServer2 (HS2). For the pros, cons, and recommended scenario of this method, see Pros and Cons of Each Method to Run Hive.

../../../_images/QuboleHiveMulti-InstanceHS2.png

Pros and Cons of Each Method to Run Hive

The following entries describe the pros, cons, and recommended scenario of each method of running Hive.

Method: Running Hive through QDS Servers

Pros:

  • It is scalable because the Hadoop 2 (Hive) cluster autoscales based on the number of queries.

Cons:

  • If the custom Hive metastore is in a different AWS region, the latency is high.

Recommended Scenario: This method is recommended when you:

  • Are a beginner

  • Handle low query traffic

  • Use the Qubole Hive metastore

Method: Running Hive on the Coordinator Node

Pros:

  • It secures the data and reduces latency when you use a custom Hive metastore.

Cons:

  • It requires a large coordinator node.

  • The coordinator node is not scalable.

Recommended Scenario: This method is recommended when you:

  • Handle low to medium query traffic

  • Can afford a high-memory EC2 instance type

  • Use a custom metastore in a different AWS region

Method: Running Hive with HiveServer2 on the Coordinator Node

Pros:

  • It secures the data and reduces latency when you use a custom Hive metastore.

Cons:

  • It requires a suitable HS2 memory configuration.

  • HS2 can become a single point of failure under a higher workload, and in that case it is not scalable.

Recommended Scenario: This method is recommended when you:

  • Handle medium to high query traffic

  • Use a custom metastore in a different AWS region

  • Want to use other HS2 features such as metadata caching

Method: Running Hive with Multi-instance HiveServer2

Pros:

  • It secures the data and reduces latency when you use a custom Hive metastore.

  • It is more reliable and handles workloads in a more scalable way.

Cons:

  • Maintaining the HS2 cluster incurs an additional cost.

Recommended Scenario: This method is recommended when you:

  • Handle high query traffic and higher concurrency

  • Prefer scalability and high availability at the higher cost of maintaining an HS2 cluster

  • Want to use it in a large enterprise

  • Use a custom metastore in a different AWS region

  • Use HS2 features such as metadata caching
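
The recommended scenarios above can be read as a small lookup. The following Python sketch encodes them as data and filters the methods by query traffic, metastore setup, and whether HS2 features are wanted. It is only an illustration: the names `HiveMethod`, `METHODS`, and `candidate_methods` are hypothetical, the three traffic labels are a simplification of the "low", "low-medium", "medium-high", and "high" wording, and treating each recommendation as a hard filter is an assumption rather than a QDS rule. Factors such as cost, instance size, and high availability are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

TRAFFIC_LEVELS = {"low", "medium", "high"}


@dataclass(frozen=True)
class HiveMethod:
    name: str
    traffic_levels: FrozenSet[str]  # traffic levels the method is recommended for
    custom_metastore: bool          # recommended with a custom metastore in another AWS region
    hs2_features: bool              # offers HS2 features such as metadata caching


# One entry per method, transcribed from the recommended scenarios.
METHODS = [
    HiveMethod("Running Hive through QDS Servers",
               frozenset({"low"}), False, False),
    HiveMethod("Running Hive on the Coordinator Node",
               frozenset({"low", "medium"}), True, False),
    HiveMethod("Running Hive with HiveServer2 on the Coordinator Node",
               frozenset({"medium", "high"}), True, True),
    HiveMethod("Running Hive with Multi-instance HiveServer2",
               frozenset({"high"}), True, True),
]


def candidate_methods(traffic: str,
                      custom_metastore: bool,
                      wants_hs2_features: bool = False) -> List[str]:
    """Return the names of the methods whose recommended scenario matches."""
    if traffic not in TRAFFIC_LEVELS:
        raise ValueError(f"traffic must be one of {sorted(TRAFFIC_LEVELS)}")
    return [
        m.name
        for m in METHODS
        if traffic in m.traffic_levels
        and m.custom_metastore == custom_metastore
        and (m.hs2_features or not wants_hs2_features)
    ]


# Medium traffic with a custom metastore in another region matches two methods;
# asking for HS2 features narrows the result to the HiveServer2 method.
print(candidate_methods("medium", custom_metastore=True))
print(candidate_methods("medium", custom_metastore=True, wants_hs2_features=True))
```

Returning a list rather than a single answer reflects that the recommended traffic ranges overlap, so more than one method can fit the same workload.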