Hive
Automatic Statistics Collections on Hive Queries
QHIVE-4562: Qubole has added the following enhancements to Automatic Statistics Collection from Hive queries:
The Automatic Statistics Collection feature is now supported in Hive 3.1.1 (beta).
The automated statistics query now runs in Hive-on-coordinator mode on the maintenance cluster.
Qubole has added support for filtering tables considered for refreshing the statistics using wildcard patterns.
The user’s Hive bootstrap is now executed before running Automatic Statistics queries.
QHIVE-4839: Hive statistics auto-gather for basic statistics and column statistics is available, but only Qubole Support can enable this enhancement. It collects statistics on INSERT and INSERT OVERWRITE queries. Via Support | Cluster Restart Required
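Where auto-gather is not enabled, the same kinds of statistics can be collected manually. The following is a minimal sketch using standard HiveQL ANALYZE statements. The table name sales_orders and its columns are hypothetical, and the sketch illustrates manual collection only, not how Qubole's auto-gather works internally.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS sales_orders (order_id BIGINT, amount DOUBLE);

-- Writes like this one are the kind that the auto-gather enhancement targets.
INSERT INTO TABLE sales_orders VALUES (1, 10.5), (2, 20.0);

-- Manual collection of basic statistics (for example, row count and data size).
ANALYZE TABLE sales_orders COMPUTE STATISTICS;

-- Manual collection of column statistics.
ANALYZE TABLE sales_orders COMPUTE STATISTICS FOR COLUMNS;

-- Inspect the table-level statistics recorded in the table parameters.
DESCRIBE FORMATTED sales_orders;
```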
HiveServer2 Enhancements
These are the enhancements:
QHIVE-3996: The HiveServer2 query execution latency is reduced by 1-1.5 seconds.
QHIVE-4786: You can now configure HS2 clusters to use private IP for communication between the coordinator node and worker nodes.
To enable:
At cluster level, set hive.hs2.cluster.use.private.ip=true in the cluster’s Hive overrides. Disabled | Cluster Restart Required
At account level for all clusters, contact Qubole Support. Via Support | Cluster Restart Required
Multi-instance HiveServer2 Enhancements
These are the enhancements:
QHIVE-3257: Qubole has added support for load-aware autoscaling in the HS2 cluster. Via Support | Cluster Restart Required
QHIVE-4582: Qubole has added support for agent-based adaptive load balancing. Via Support | Cluster Restart Required
Additional metrics, alerts, and dashboards are now available through Datadog.
Hive ACID Enhancements
Qubole has added enhancements to the Hive ACID feature, which are:
QHIVE-4707: Hive Streaming API is now supported with Hive 3.1.1 (beta). There is a limitation in the case of blob stores: there must be one transaction per batch size because blob stores do not support partial writes.
Note
Blob stores in this context should not be confused with Microsoft Azure Blob Storage. For information about Hive and blob storage, see the Apache Hive documentation.
QHIVE-4840: Qubole has introduced an enhancement that allows users to delay the obsolete data cleanup after compaction. To use this enhancement, set hive.compactor.delayed.cleanup.enabled=true. You can also configure the cleanup delay using the CLEANER_RETENTION_TIME_SECONDS table property. Disabled | Cluster Restart Required
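As a rough sketch of the per-table setting, the retention delay could be applied with a standard ALTER TABLE ... SET TBLPROPERTIES statement. The table name acid_events and the 3600-second value are hypothetical. The property key is written exactly as it appears in the note above, and its casing should be checked against the Qubole documentation.

```sql
-- Cluster-level prerequisite (set in the cluster's Hive overrides, then restart):
--   hive.compactor.delayed.cleanup.enabled=true

-- Hypothetical ACID table; the one-hour delay is an assumed value.
ALTER TABLE acid_events
  SET TBLPROPERTIES ('CLEANER_RETENTION_TIME_SECONDS' = '3600');

-- Confirm that the property is recorded on the table.
SHOW TBLPROPERTIES acid_events;
```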
Faster Downscaling
QHIVE-4740: Qubole supports custom Tez shuffle handler in Hive 3.1.1 (beta), which can speed up the worker nodes’ downscaling process in a Hadoop (Hive) cluster. Via Support | Cluster Restart Required
For more information, see the documentation.
Enhancements
QHIVE-2601: From Hive 3.1.1 (beta) onwards, Qubole supports merging small files at the end of MapReduce jobs and Tez DAGs.
QHIVE-4683: Qubole has added path validation to the ALTER TABLE RECOVER PARTITIONS command. For more information, see the documentation.
QHIVE-4829: Qubole has added support for the Surrogate Keys function in Hive 3.1.1 (beta). For more details, see HIVE-20536.
QHIVE-4834: Qubole has backported an open-source fix for the vectorized limit operator returning an incorrect number of results when used with an offset. Related open-source Jira: HIVE-22164.
QHIVE-4856: In Tez, you can use the hive-exec jar that is locally available on cluster nodes. This reduces the localization overhead and improves efficiency by avoiding additional HDFS operations.
QHIVE-4966: To reduce AWS read API calls in Hive 3.1.1 (beta), Qubole has changed the default values of the following configuration properties:
mapred.min.split.size=256MB
mapred.max.split.size=256MB
QHIVE-4873: Qubole has backported open-source fixes to avoid the issue where Hive queries with a JOIN condition with date/timestamp/INTERVAL fail with SemanticException. Related open-source Hive Jira issues:
QHIVE-5020: Qubole provides an option to disable running Hive commands on a Presto cluster. Via Support | Cluster Restart Required
QTEZ-473: Qubole has optimized Tez-0.9.1 UI to run faster (OSS TEZ-4085).
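A few of the enhancements above can be illustrated in HiveQL. This is a sketch under stated assumptions: the table names web_logs and customers_sk are hypothetical, the split sizes are written out in bytes (256 MB = 268435456 bytes), and the surrogate-key syntax follows the open-source HIVE-20536 feature, which is assumed here to require a transactional table.

```sql
-- Split sizes are given in bytes: 256 * 1024 * 1024 = 268435456.
-- A session-level override is shown only for illustration; the note above
-- describes changed defaults, so no action is needed to get these values.
SET mapred.min.split.size=268435456;
SET mapred.max.split.size=268435456;

-- Recover partitions for a hypothetical partitioned table; the partition
-- paths are now validated as part of this command.
ALTER TABLE web_logs RECOVER PARTITIONS;

-- Surrogate key column populated by the SURROGATE_KEY() function (HIVE-20536).
CREATE TABLE customers_sk (
  id   BIGINT DEFAULT SURROGATE_KEY(),
  name STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- The id column is omitted, so its default expression supplies the value.
INSERT INTO customers_sk (name) VALUES ('alice'), ('bob');
```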
Bug Fixes
QHIVE-4807: Fixed an error case in MapJoin conversion when no table is selected as a big table (OSS HIVE-22201).
QHIVE-4839: Fixed an issue with the Hive statistics auto-gather feature that occurred during a multi-INSERT Hive query.
QHIVE-4849: Qubole has changed the timezone in the Tez UI to UTC and the time format to D days, H hours. This eliminates differences in the time format and the timezone between ResourceManager and Tez.
QHIVE-4885: The ORC filename is now printed along with the error when an InvalidProtocolBufferException occurs while reading the PostScript of an ORC file. This helps you inspect the file and ensure that it has a valid PostScript.
QHIVE-4925: Qubole has upgraded the commons-lang3 version to 3.4. This fixes an issue that caused Hive queries to fail with the java.lang.NoSuchMethodError: org.apache.commons.lang3.StringUtils.isNoneEmpty error.
QHIVE-4978: Fixed the issue where the number of Auto Statistics commands running was more than the limit set for the account.
QHIVE-4996: Fixed the issue where the Auto Statistics command was not triggered in INSERT and INSERT OVERWRITE queries.
QHIVE-5010: Fixed the issue where the Auto Statistics command did not get triggered if two accounts contain the same cluster tag for the maintenance cluster.
For a list of bug fixes between versions R57 and R58, see Changelog for api.qubole.com.