Configuring Multi-instance HiveServer2

Qubole provides an additional option for running HiveServer2 (HS2) with a Hadoop 2 (Hive) cluster, called multi-instance HiveServer2. Qubole recommends configuring this option when you expect highly concurrent workloads or when peak traffic volume is much larger than the average workload.

When the multi-instance HS2 option is selected, Qubole manages the HS2 JVM life cycles automatically, and this is transparent to end users running workloads on the Hadoop 2 (Hive) cluster. Multi-instance HS2 is supported only in Hive versions 2.1.1, 2.3, and 3.1.1 (beta).

This enhancement is available for beta access and is not enabled by default. Create a ticket with Qubole Support to enable it on the QDS account.

Qubole has added support for load-aware autoscaling and agent-based adaptive load balancing in HS2 clusters. Create a ticket with Qubole Support to enable these features on the QDS account.

The following sections describe the configuration and lifecycle of multi-instance HS2.

Configuring Multi-instance HS2

You can configure multi-instance HS2 through either the UI or the REST API.

This topic describes how to configure multi-instance HS2 through the Clusters UI. For details on configuring multi-instance HS2 through the REST API, see Choosing Multi-instance as an option for running HiveServer2 on Hadoop (Hive) Clusters.

Perform these steps to configure multi-instance HS2:

  1. Navigate to the Qubole UI > Clusters.

  2. Go to the Hadoop 2 (Hive) cluster on which you want to configure a multi-instance HS2.

  3. Click the Advanced Configuration tab. Under HIVE SETTINGS, open the drop-down list. It displays two options in addition to Disabled (the default).

  4. Choose Enable as an additional cluster. The Edit button appears.

  5. Click Edit to open the HS2 Settings tab for multi-instance HS2.

  6. Select the Worker Node Type if you want a different worker node type for multi-instance HS2 than the default node type.

    The master node of a multi-instance HS2 is always m3.xlarge. However, it is not visible in the UI by default. Create a ticket with Qubole Support to configure the master node in the HS2 Settings.

  7. Change the number of required nodes if you want more than two nodes. You can increase the number of nodes while the cluster is running, but you cannot reduce the number of nodes or remove nodes while it is running.

  8. Specify a different node bootstrap file location if you want to override the default location inherited from the associated Hadoop 2 (Hive) cluster.

  9. Enter the Elastic IP of Master Node for multi-instance HS2. To run queries directly on multi-instance HS2 from external Business Intelligence (BI) tools, attach an Elastic IP (EIP) to the multi-instance HS2 and configure the tools to connect to the EIP of its master node. The EIP must be attached to the HS2 master node because HS2 queries run on the multi-instance HS2 rather than on the associated Hadoop 2 (Hive) cluster.

  10. Add custom EC2 tags and values. Do not use reserved keywords as EC2 tags, as described in Advanced configuration: Modifying EC2 Settings (AWS).

  11. Click Save to save HS2 Settings.
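
The node-count rule in step 7 and the EIP connection in step 9 can be illustrated with a short Python sketch. This is a local model for reasoning about the settings, not a Qubole API: the names `Hs2Settings`, `validate_node_count_change`, and `jdbc_url` are hypothetical, the two-node minimum is an assumption inferred from the wording of step 7, and the HS2 port is left as a required argument because this topic does not state it. Only the `jdbc:hive2://host:port/database` URL shape is the standard Hive JDBC format.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: step 7 implies two nodes is the starting count, so it is
# treated here as the minimum. This topic does not state a minimum explicitly.
MIN_HS2_NODES = 2


@dataclass
class Hs2Settings:
    """Hypothetical local model of the HS2 Settings tab (not a Qubole API object)."""
    node_count: int = MIN_HS2_NODES
    worker_node_type: Optional[str] = None   # None means "use the default node type"
    master_elastic_ip: Optional[str] = None  # EIP attached to the HS2 master node


def validate_node_count_change(current: int, requested: int, cluster_running: bool) -> None:
    """Raise ValueError if the requested node count breaks the rules in step 7."""
    if requested < MIN_HS2_NODES:
        raise ValueError(f"multi-instance HS2 needs at least {MIN_HS2_NODES} nodes")
    if cluster_running and requested < current:
        # Nodes can be added while the cluster is running, but not removed.
        raise ValueError("cannot reduce the node count while the cluster is running")


def jdbc_url(settings: Hs2Settings, port: int, database: str = "default") -> str:
    """Build a Hive JDBC URL that points a BI tool at the HS2 master node's EIP."""
    if not settings.master_elastic_ip:
        raise ValueError("attach an Elastic IP to the HS2 master node first")
    return f"jdbc:hive2://{settings.master_elastic_ip}:{port}/{database}"
```

For example, `validate_node_count_change(2, 4, cluster_running=True)` passes, while `validate_node_count_change(4, 3, cluster_running=True)` raises `ValueError`. The port passed to `jdbc_url` should be whatever port HS2 listens on in your deployment.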

Lifecycle of Multi-instance HS2

The lifecycle of multi-instance HS2 is intrinsically associated with the Hadoop 2 (Hive) cluster. The multi-instance HS2 starts and stops with the associated Hadoop 2 (Hive) cluster.

Avoid terminating the multi-instance HS2 without also stopping the associated Hadoop 2 (Hive) cluster. Qubole provides the option to terminate only the multi-instance HS2 as a safeguard against possible bugs and runaway clusters.