The new features and enhancements are:
- Hive 1.2 is Deprecated
- Upgraded Qubole-managed Hive Metastore DB
- Robust and Performant Hive 3.1.1 (beta)
- Support for Tez 0.9
- Upgraded Hive 2.3
Other enhancements and bug fixes are listed in:
Hive 1.2 is Deprecated
QHIVE-5285: Hive 1.2 has been deprecated on Hadoop 2 clusters.
Upgraded Qubole-managed Hive Metastore DB
Qubole has upgraded the schema of the Qubole-managed Hive metastore DB to Hive 2.3. Qubole recommends that users upgrade their custom metastores to version 2.3 or later.
Robust and Performant Hive 3.1.1 (beta)
Hive version 3.1.1 (beta) is now more robust and performant. Qubole has backported over 200 changes from open-source Hive. For the complete list, see Backported Issues from Open-source Hive.
Support for Tez 0.9
QHIVE-4872: You can now run Hive version 2.3 with Tez version 0.9.1. This works only when HiveServer2 is enabled and Hadoop runs on Java 8. Feature to opt in
Upgraded Hive 2.3
QHIVE-4832: Qubole’s Hive version 2.3 is now on par with open-source Hive version 2.3.6. The following bug fixes and improvements are backported from open-source Hive:
QHIVE-2626: In the cluster’s bash shell, the hive command now points to the Hive executable path and launches the Hive CLI.
QHIVE-4214: With vectorization enabled, Hive can now read Parquet data that contains collections of primitive types and was generated using Thrift. The related open-source Jira is HIVE-21492.
QHIVE-5049 and QHIVE-5286: Qubole has optimized the loading time of dynamically created partitions in the Hive Metastore. The configuration property is hive.qubole.optimize.dynpart.listing. Gradual Rollout | Cluster Restart Required
In addition, Qubole has removed hive.qubole.dynpart.track.cloudfs, which are obsolete S3 eventual consistency configuration properties.
QHIVE-5058: Qubole has backported open-source HIVE-12490 to fix an issue where a direct SQL query to the Hive Metastore fails when the double quote (") character is used in an identifier’s name.
QHIVE-5339: Hive Metastore Server now runs using Java 8 runtime by default.
QHIVE-5242: For Hive queries that use Hive 2.1.1 or later versions and run on the coordinator node, the corresponding query-based Hive logs are uploaded to
QHIVE-5268: Qubole now supports configuring the replication of Application Timeline Server (ATS) v1.5 HDFS timeline data. The default replication is 2. You can override the default value by setting yarn.timeline-service.entity-group-fs-store.replication through Hadoop Overrides on the cluster’s Advanced Configuration UI page.
QHIVE-5281: Vectorized query execution in Hive no longer evaluates empty GROUP BY keys. The related open-source Jira is HIVE-16533.
QTEZ-477: Qubole now supports using RollingLevelDBTimelineStore for the ATS v1.5 Summary Store. Gradual Rollout
QTEZ-509: Qubole has fixed an issue where ATS v1.5 became unresponsive while syncing a new application from HDFS to the Summary Store. The issue occurred when processing an application took a very long time, which could cause threads to keep processing the same application and create a backlog of new applications to process.
QTEZ-512: Redirection of a required log file within a container from the NodeManager to the Job History Server (JHS) is enhanced. The JHS now receives only the required log file instead of all the log files in the container. The related open-source Jiras are YARN-2605, YARN-3654, YARN-5246, and YARN-4990.
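QHIVE-4214 and QHIVE-5281 apply only when Hive's vectorized query execution is enabled. The following HiveQL is a minimal sketch of turning vectorization on for a session and checking whether a query actually runs vectorized. The table parquet_events and its columns are hypothetical, and EXPLAIN VECTORIZATION assumes open-source Hive 2.3 or later.

```sql
-- Enable vectorized query execution for the current session.
SET hive.vectorized.execution.enabled=true;

-- parquet_events is a hypothetical Parquet table (for example, one written
-- by a Thrift-based tool). EXPLAIN VECTORIZATION reports, per operator,
-- whether the plan runs in vectorized mode.
EXPLAIN VECTORIZATION
SELECT event_type, count(*)
FROM parquet_events
GROUP BY event_type;
```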
- QHIVE-4932: Queries continue to work when the Hive bootstrap is moved to Glacier and is therefore inaccessible.
- QHIVE-4456: With Hive 2.1.1 and later versions, Hive queries fail if there is an error in the Hive bootstrap. Feature to opt in
- QHIVE-4967: As open-source Hive has deprecated hive.mapred.mode and uses the hive.strict.checks.* configuration properties instead, Qubole has removed qubole.compatibility.mode, which was added to throw an error depending on the value of hive.mapred.mode.
- QHIVE-5064: Starting with Hive version 2.3, open-source Hive treats the ORC file format as case sensitive. As a result, if the ORC column schema contains case-sensitive column names, Hive cannot read it. To disable case-sensitive schema matching, add set orc.schema.evolution.case.sensitive = false; to the query.
- QHIVE-5071: The issue where recovering partitions from the Hive metastore hung is resolved. To fix it, Qubole has optimized the Alter Table <tablename> recover partitions query, which now uses APIs that support direct SQL to fetch partitions from the Hive metastore. Gradual Rollout
- QHIVE-5176: The issue where the Hive CLI did not split commands by semicolon properly when a string contains quotes is resolved. The related open-source Hive Jira is HIVE-19948.
- QHIVE-5247: The offline Tez UI was inaccessible for a multi-instance HiveServer2 cluster. Qubole now supports the offline Tez UI for a multi-instance HiveServer2 cluster.
- QHIVE-5273: The issue that caused Hive 2.3 queries to fail with a NullPointerException due to incorrect handling of NULL values in comparisons and other operations is fixed. The related open-source Hive Jira is HIVE-18622.
- QHIVE-5274: Vectorization for nested complex data types in the Parquet file format is disabled because it caused query failures.
- QHIVE-5299: The issue where the Resource Manager could not upload the ATS timeline data during cluster termination is resolved. The issue resulted in missing information on the Tez UI.
- QTEZ-439: The issue where a Hive query with UNION ALL and a lateral view returned wrong results is resolved. The related open-source Hive Jira is HIVE-21660.
- QTEZ-498: Qubole has backported open-source TEZ-3413 to fix an issue where the AppLaunchEvent for an application was not sent to ATS v1.5, which could cause the application's data to be deleted from ATS.
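The two HiveQL statements mentioned in QHIVE-5064 and QHIVE-5071 can be used in a session as sketched below. This is a minimal example: the table names orc_sales and web_logs are hypothetical, and the ALTER TABLE ... RECOVER PARTITIONS form is the syntax named in QHIVE-5071.

```sql
-- QHIVE-5064: Hive 2.3 and later match ORC schemas case-sensitively.
-- Turn that off for this session before reading the affected ORC data.
SET orc.schema.evolution.case.sensitive=false;
SELECT * FROM orc_sales LIMIT 10;   -- orc_sales is a hypothetical ORC table

-- QHIVE-5071: recover partitions for a hypothetical partitioned table;
-- the optimized query fetches partitions through direct SQL APIs.
ALTER TABLE web_logs RECOVER PARTITIONS;
```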
For a list of bug fixes between versions R58 and R59, see Changelog for api.qubole.com.