Hive

Hive 2.1 is Generally Available

Hive 2.1 is now generally available. (Cluster Restart Required)

HA Proxy in the Cluster Coordinator Node

INFRA-1016 and INFRA-603: Qubole has added HA Proxy on the cluster coordinator node to balance the load when there are multiple connections between the cluster and the Qubole-managed Hive Metastore. This removes a single point of failure and improves stability. (Via Support)

If you are using a bastion node, you must allow incoming TCP traffic from the cluster’s coordinator node on ports 20001 through 20005, in addition to port 7000.

Enhancements

  • QHIVE-1633: The logs for Hive Tez jobs display the split computation, progress state, and the completion state.
  • QHIVE-3727: The default value of hive.metastore.drop.partitions.batch.size is now 1000, so partitions are dropped in batches of 1000. You can adjust this parameter based on the number of partitions that you want to drop.
  • EAM-1334: Qubole supports accessing a custom Hive metastore through QDS.
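
As a rough illustration of the QHIVE-3727 setting above, the batch size could be overridden before a large partition drop. This is only a sketch: the value 500, the table web_logs, and the partition column dt are hypothetical, and whether the property takes effect at the session level or must be applied as a cluster-level Hive override is an assumption to verify for your account.

```sql
-- Hypothetical override: drop partitions in batches of 500 instead of the default 1000.
SET hive.metastore.drop.partitions.batch.size=500;

-- Hypothetical table and partition column, used only for illustration.
ALTER TABLE web_logs DROP IF EXISTS PARTITION (dt < '2017-01-01');
```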

Bug Fixes

  • QHIVE-3508: If you set hive.groupby.skewindata, it is recommended that you also configure hive.groupby.skewindata.use.rand.with.seed to avoid data inconsistency when map tasks are reattempted.
  • QHIVE-3582: The issue in which pruning columns resulted in an incorrect order of columns in the SelectOperator has been resolved.
  • QHIVE-3641: When Tez is the execution engine, the container memory for reduce tasks is now based on mapreduce.reduce.memory.mb instead of mapreduce.map.memory.mb. This resolves the issue in which overriding the default memory for map/reduce tasks at the Hive query level was unsuccessful.
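
The settings named in QHIVE-3508 and QHIVE-3641 above might be applied at the query level roughly as follows. This is a sketch under stated assumptions, not a recommended configuration: the memory values are arbitrary, and treating hive.groupby.skewindata.use.rand.with.seed as a boolean flag is an assumption, since these notes do not state its type.

```sql
-- QHIVE-3641: with Tez as the execution engine, reduce task containers
-- are sized from mapreduce.reduce.memory.mb. Values are illustrative only.
SET hive.execution.engine=tez;
SET mapreduce.map.memory.mb=2048;
SET mapreduce.reduce.memory.mb=4096;

-- QHIVE-3508: pair skew handling with the seeded-random option.
-- Assumption: the second property accepts true/false.
SET hive.groupby.skewindata=true;
SET hive.groupby.skewindata.use.rand.with.seed=true;
```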