The new features and enhancements are:
- Automatic Statistics Collections on Hive Queries
- HiveServer2 Enhancements
- Multi-instance HiveServer2 Enhancements
- Hive ACID Enhancements
- Faster Downscaling
Other enhancements and bug fixes are listed in:
Automatic Statistics Collections on Hive Queries
QHIVE-4562: Qubole has added the following enhancements to Automatic Statistics Collection from Hive queries:
- The Automatic Statistics Collection feature is now supported in Hive 3.1.1 (beta).
- The automated statistics query now runs in Hive-on-master mode on the maintenance cluster.
- Qubole has added support for using wildcard patterns to filter the tables that are considered for statistics refresh.
- The user’s Hive bootstrap is now executed before running Automatic Statistics queries.
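The note does not spell out the wildcard syntax. As a rough illustration only, the sketch below assumes shell-style wildcards (`*`, `?`) matched case-sensitively against `database.table` names with Python's `fnmatch`. The function name `select_tables` and the include/exclude pattern lists are hypothetical, not Qubole settings.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def select_tables(tables: Iterable[str],
                  include_patterns: Sequence[str],
                  exclude_patterns: Sequence[str] = ()) -> List[str]:
    """Keep tables that match any include pattern and no exclude pattern.

    Hypothetical illustration: patterns are shell-style wildcards matched
    case-sensitively against "database.table" names.
    """
    selected = []
    for name in tables:
        included = any(fnmatchcase(name, p) for p in include_patterns)
        excluded = any(fnmatchcase(name, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(name)
    return selected


tables = ["sales.orders", "sales.orders_tmp", "sales.customers", "hr.employees"]
print(select_tables(tables, ["sales.*"], ["*_tmp"]))  # ['sales.orders', 'sales.customers']
```

Pairing an include list with an exclude list keeps broad patterns such as `sales.*` usable while still skipping scratch tables.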
QHIVE-4839: Hive statistics auto-gather for basic statistics and column statistics is now available, but only Qubole Support can enable this enhancement. It collects statistics on INSERT OVERWRITE queries. Via Support | Cluster Restart Required
HiveServer2 Enhancements
These are the enhancements:
- QHIVE-3996: HiveServer2 query execution latency has been reduced by 1 to 1.5 seconds.
- QHIVE-4786: You can now configure HS2 clusters to use private IP addresses for communication between the master node and worker nodes.
Multi-instance HiveServer2 Enhancements
These are the enhancements:
- QHIVE-3257: Qubole has added support for load-aware autoscaling in the HS2 cluster. Via Support | Cluster Restart Required
- QHIVE-4582: Qubole has added support for agent-based adaptive load balancing. Via Support | Cluster Restart Required
- Additional metrics, alerts, and dashboards are now available through Datadog.
Hive ACID Enhancements
Qubole has added the following enhancements to the Hive ACID feature:
QHIVE-4707: The Hive Streaming API is now supported in Hive 3.1.1 (beta). There is a limitation with blob stores: each transaction batch must contain only one transaction, because blob stores do not support partial writes.
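The blob-store limitation can be expressed as a simple rule on the client side. The sketch below is a hypothetical helper, not part of the Hive Streaming API: `effective_batch_size` and the `BLOB_STORE_SCHEMES` set are assumptions, and it simply caps the batch at one transaction when the table location uses a blob-store scheme.

```python
from urllib.parse import urlparse

# Hypothetical set of URI schemes treated as blob stores in this sketch.
BLOB_STORE_SCHEMES = {"s3", "s3a", "s3n", "wasb", "wasbs", "gs"}


def effective_batch_size(table_location: str, requested: int) -> int:
    """Return the number of transactions to open per streaming batch.

    Blob stores do not support partial writes, so the sketch caps the
    batch at one transaction for blob-store locations.
    """
    if requested < 1:
        raise ValueError("batch size must be at least 1")
    scheme = urlparse(table_location).scheme.lower()
    return 1 if scheme in BLOB_STORE_SCHEMES else requested


print(effective_batch_size("s3a://bucket/warehouse/events", 10))  # 1
print(effective_batch_size("hdfs://namenode/warehouse/events", 10))  # 10
```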
QHIVE-4840: Qubole has introduced an enhancement that lets you delay the cleanup of obsolete data after compaction. To use this enhancement, set hive.compactor.delayed.cleanup.enabled=true. You can also configure the cleanup delay by using the CLEANER_RETENTION_TIME_SECONDS table property. Disabled | Cluster Restart Required
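A minimal sketch of how the delay could be set and reasoned about, assuming hive.compactor.delayed.cleanup.enabled=true is already set in the cluster's Hive configuration (the note marks a cluster restart as required) and that CLEANER_RETENTION_TIME_SECONDS is a per-table property in seconds, as its name suggests. The helper names and the eligibility rule in `cleanup_due` are hypothetical models, not Qubole's cleaner logic.

```python
import time
from typing import Optional


def retention_ddl(table: str, retention_seconds: int) -> str:
    """Build HiveQL that sets the CLEANER_RETENTION_TIME_SECONDS table property."""
    if retention_seconds < 0:
        raise ValueError("retention must be non-negative")
    return (f"ALTER TABLE {table} SET TBLPROPERTIES "
            f"('CLEANER_RETENTION_TIME_SECONDS'='{retention_seconds}')")


def cleanup_due(compacted_at: float, retention_seconds: int,
                now: Optional[float] = None) -> bool:
    """Hypothetical model: obsolete files may be removed only after the
    retention period has elapsed since compaction finished (epoch seconds)."""
    current = time.time() if now is None else now
    return current - compacted_at >= retention_seconds


print(retention_ddl("sales.orders", 3600))
print(cleanup_due(compacted_at=1_000.0, retention_seconds=3600, now=4_000.0))  # False
```

Delaying cleanup this way gives readers that started before compaction time to finish with the older files before they are removed.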
Faster Downscaling
QHIVE-4740: Qubole supports a custom Tez shuffle handler in Hive 3.1.1 (beta), which can speed up the downscaling of worker nodes in a Hadoop (Hive) cluster. Via Support | Cluster Restart Required
For more information, see the documentation.
QHIVE-4683: Qubole has added path validation to the ALTER TABLE RECOVER PARTITIONS command. For more information, see the documentation.
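The note does not say exactly what the validation checks. As an illustration only, the sketch assumes it resembles checking that each partition directory has the form column=value, with one level per partition column in the table's partition-column order. The function `is_valid_partition_path` is hypothetical.

```python
from typing import Sequence


def is_valid_partition_path(relative_path: str,
                            partition_columns: Sequence[str]) -> bool:
    """Return True if the path has one non-empty "column=value" directory per
    partition column, in the table's partition-column order.

    Hypothetical check for illustration; column names compared case-insensitively.
    """
    parts = [p for p in relative_path.strip("/").split("/") if p]
    if len(parts) != len(partition_columns):
        return False
    for part, column in zip(parts, partition_columns):
        key, sep, value = part.partition("=")
        if not sep or key.lower() != column.lower() or not value:
            return False
    return True


print(is_valid_partition_path("dt=2019-10-01/country=US", ["dt", "country"]))  # True
print(is_valid_partition_path("dt=2019-10-01/US", ["dt", "country"]))  # False
```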
QHIVE-4829: Qubole has added support for the Surrogate Keys function in Hive 3.1.1 (beta). For more details, see HIVE-20536.
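A surrogate key needs to be unique across concurrent writers without coordination. The sketch below shows one way to do that by bit-packing, assuming a layout along the lines discussed in HIVE-20536: the write (transaction) ID in the high bits, the task ID in the middle bits, and a per-task row counter in the low bits. The 24/16-bit defaults and the function name `surrogate_key` are assumptions for illustration, not a statement of Hive's exact implementation.

```python
def surrogate_key(write_id: int, task_id: int, row_index: int,
                  write_id_bits: int = 24, task_id_bits: int = 16) -> int:
    """Pack (write ID, task ID, row index) into one 64-bit integer key.

    Assumed layout for illustration: write ID in the high bits, task ID in
    the middle bits, and the row index in the remaining low bits.
    """
    row_bits = 64 - write_id_bits - task_id_bits
    if row_bits <= 0:
        raise ValueError("write_id_bits + task_id_bits must be less than 64")
    for name, value, bits in (("write_id", write_id, write_id_bits),
                              ("task_id", task_id, task_id_bits),
                              ("row_index", row_index, row_bits)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (write_id << (task_id_bits + row_bits)) | (task_id << row_bits) | row_index


# Rows written by different tasks in the same transaction get distinct keys.
print(surrogate_key(write_id=5, task_id=0, row_index=0))
print(surrogate_key(write_id=5, task_id=1, row_index=0))
```

Because each field occupies its own bit range, two rows collide only if they share the same write ID, task ID, and row index.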
QHIVE-4834: Qubole has backported an open-source fix for an issue where the vectorized LIMIT operator returned an incorrect number of results when an offset was specified. Related open-source JIRA: HIVE-22164.
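For reference, in HiveQL LIMIT 4, 3 skips the first four rows and returns at most three. The sketch below models that contract in Python so the expected row count is easy to check; the helper names are hypothetical. Without an ORDER BY, Hive does not guarantee which rows are returned, only how many.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def limit_with_offset(rows: Sequence[T], offset: int, count: int) -> List[T]:
    """Mimic LIMIT <offset>, <count>: skip `offset` rows, then keep at most `count`."""
    if offset < 0 or count < 0:
        raise ValueError("offset and count must be non-negative")
    return list(rows[offset:offset + count])


def expected_row_count(total_rows: int, offset: int, count: int) -> int:
    """Number of rows LIMIT <offset>, <count> should return."""
    return min(count, max(0, total_rows - offset))


print(limit_with_offset(list(range(10)), offset=4, count=3))  # [4, 5, 6]
print(expected_row_count(total_rows=5, offset=4, count=3))    # 1
```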
QHIVE-4856: In Tez, you can use the hive-exec JAR that is available locally on cluster nodes. This reduces localization overhead and improves efficiency by avoiding additional HDFS operations.
QHIVE-4966: To reduce AWS read API calls in Hive 3.1.1 (beta), Qubole has changed the default values of the following configuration properties:
QHIVE-4873: Qubole has backported open-source fixes to avoid the issue where Hive queries with JOIN condition with
Related open-source Hive jira issues:
QTEZ-473: Qubole has optimized the Tez 0.9.1 UI to run faster (OSS TEZ-4085).
- QHIVE-4807: Fixed an error case in MapJoin conversion when no table is selected as a big table (OSS HIVE-22201).
- QHIVE-4839: Fixed an issue with Hive statistics auto-gather feature that occurred during a multi
- QHIVE-4849: Qubole has changed the timezone in the Tez UI to UTC and the time format to D days, H hours. This eliminates differences in time format and timezone between ResourceManager and Tez.
- QHIVE-4885: The ORC filename is printed along with the error when there is
PostScript of an ORC file to help you inspect and ensure that the file has a valid
- QHIVE-4925: Qubole has upgraded the commons-lang3 version to 3.4. This fixes an issue that caused Hive queries to fail with the
- QHIVE-4978: Fixed an issue where the number of running Auto Statistics commands exceeded the limit set for the account.
- QHIVE-4996: Fixed an issue where the Auto Statistics command was not triggered in
- QHIVE-5010: Fixed an issue where the Auto Statistics command was not triggered if two accounts used the same cluster tag for the maintenance cluster.
For a list of bug fixes between versions R57 and R58, see Changelog for api.qubole.com.