Understanding Different Ways to Run Hive

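On QDS, you can run Hive in the following ways:

  • Running Hive through QDS Servers
  • Running Hive on the Master Node
  • Running Hive with HiveServer2 on the Master Node
  • Running Hive with Multi-instance HiveServer2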

QDS supports Hive versions 1.2 and 2.1 with each of these methods. For more information on versions, see Understanding Hive Versions.

Pros and Cons of Each Method to Run Hive lists the pros, cons, and recommended scenario for each method.

Running Hive through QDS Servers

The following diagram shows how Hive runs through QDS Servers. Pros and Cons of Each Method to Run Hive lists the pros, cons, and recommended scenario for this method.

../../_images/HiveviaQDS-Server.png

Running Hive on the Master Node

The following diagram shows how Hive runs on the cluster’s master node. Pros and Cons of Each Method to Run Hive lists the pros, cons, and recommended scenario for this method.

../../_images/HiveonMaster.png

Running Hive with HiveServer2 on the Master Node

The following diagram shows how Hive runs with HiveServer2 (HS2) on the master node. Pros and Cons of Each Method to Run Hive lists the pros, cons, and recommended scenario for this method.

../../_images/HS2-Standalone.png

Running Hive with Multi-instance HiveServer2

The following diagram shows how Hive runs with multi-instance HiveServer2 (HS2). Pros and Cons of Each Method to Run Hive lists the pros, cons, and recommended scenario for this method.

../../_images/QuboleHiveMulti-InstanceHS2.png

Pros and Cons of Each Method to Run Hive

The following summary describes the pros, cons, and recommended scenario for each method of running Hive.

Method: Running Hive through QDS Servers

Pros:
  • It is scalable because the Hadoop 2 (Hive) cluster autoscales based on the number of queries.

Cons:
  • If the custom Hive metastore is in a different AWS region, the latency is high.

This method is recommended when you:
  • Are a beginner
  • Handle low query traffic
  • Use the Qubole Hive metastore

Method: Running Hive on the Master Node

Pros:
  • It secures the data and reduces latency if you use a custom Hive metastore.

Cons:
  • It requires a large master node.
  • The master node is not scalable.

This method is recommended when you:
  • Handle low to medium query traffic
  • Can afford a high-memory EC2 instance type
  • Use a custom metastore in a different AWS region

Method: Running Hive with HiveServer2 on the Master Node

Pros:
  • It secures the data and reduces latency if you use a custom Hive metastore.

Cons:
  • It requires a suitable HS2 memory configuration.
  • It can become a single point of failure under a higher workload, and in that case it is not scalable.

This method is recommended when you:
  • Handle medium to high query traffic
  • Use a custom metastore in a different AWS region
  • Want to use other HS2 features such as metadata caching

Method: Running Hive with Multi-instance HiveServer2

Pros:
  • It secures the data and reduces latency if you use a custom Hive metastore.
  • It is more reliable and provides more scalable workload handling.

Cons:
  • It incurs the additional cost of maintaining an HS2 cluster.

This method is recommended when you:
  • Handle high query traffic and higher concurrency
  • Prefer scalability and high availability at the higher cost of maintaining an HS2 cluster
  • Want to use it in a large enterprise
  • Use a custom metastore in a different AWS region
  • Use HS2 features such as metadata caching
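
As a rough illustration, the recommendations above can be encoded as a small decision helper. This is a minimal sketch and not part of QDS: the Traffic and Workload types, the recommend_method function, and the order in which the criteria are checked are all assumptions made for illustration. Only the method names and the criteria themselves come from the guidance above, and cases that the guidance does not cover (for example, medium traffic with the Qubole Hive metastore) are resolved by an arbitrary tie-break.

```python
from dataclasses import dataclass
from enum import Enum


class Traffic(Enum):
    """Coarse query-traffic levels (hypothetical buckets, not QDS settings)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Workload:
    """Hypothetical description of a Hive workload."""
    traffic: Traffic
    custom_metastore_other_region: bool = False
    wants_hs2_features: bool = False       # for example, metadata caching
    needs_high_availability: bool = False


def recommend_method(w: Workload) -> str:
    """Return the method name suggested by one reading of the guidance."""
    # High traffic or a high-availability requirement points to multi-instance HS2.
    if w.traffic is Traffic.HIGH or w.needs_high_availability:
        return "Running Hive with Multi-instance HiveServer2"
    # HS2-specific features require HiveServer2 on the master node.
    if w.wants_hs2_features:
        return "Running Hive with HiveServer2 on the Master Node"
    # A custom metastore in another region favors running on the master node.
    # Assumption: medium traffic without other constraints also lands here.
    if w.custom_metastore_other_region or w.traffic is Traffic.MEDIUM:
        return "Running Hive on the Master Node"
    # Low traffic with the Qubole Hive metastore: the simplest option.
    return "Running Hive through QDS Servers"


if __name__ == "__main__":
    example = Workload(Traffic.MEDIUM, wants_hs2_features=True)
    print(recommend_method(example))
```

The checks run from the most demanding scenario to the least, so a workload that matches several rows resolves to the more scalable method. Treat the result as a starting point for discussion, not as a sizing rule.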