Hive

Hive 1.2 is Deprecated

QHIVE-5285: Hive 1.2 has been deprecated on Hadoop 2 clusters.

Upgraded Qubole-managed Hive Metastore DB

Qubole has upgraded the schema of the Qubole-managed Hive metastore DB to Hive 2.3. Qubole recommends that users upgrade their custom metastores to version 2.3 or later.

Robust and Performant Hive 3.1.1 (beta)

Hive version 3.1.1 (beta) is now more robust and performant. Qubole has backported over 200 changes from open-source Hive. For the complete list, see Backported Issues from Open-source Hive.

Support for Tez 0.9

QHIVE-4872: You can now run Hive version 2.3 with Tez version 0.9.1. Feature to opt in

Note

This works only when HiveServer2 is enabled and Hadoop runs on Java 8.

Upgraded Hive 2.3

QHIVE-4832: Qubole’s Hive version 2.3 is now on par with open-source Hive version 2.3.6. The following bug fixes and improvements are backported from open-source Hive:

Enhancements

  • QHIVE-2626: In the cluster’s bash shell, the hive command now points to the Hive executable path and launches the Hive CLI.

  • QHIVE-4214: With vectorization enabled, Hive can now read Parquet data that contains collections of primitive types and was generated using Thrift. The related open-source Jira is HIVE-21492.

  • QHIVE-5049 and QHIVE-5286: Qubole has optimized the loading time of dynamically created partitions in the Hive Metastore. The configuration property is hive.qubole.optimize.dynpart.listing. Gradual Rollout | Cluster Restart Required

    In addition, Qubole has removed hive.qubole.dynpart.track.s3 and hive.qubole.dynpart.track.cloudfs, which are obsolete S3 eventual consistency configuration properties.

  • QHIVE-5058: Qubole has backported open-source HIVE-12490 to fix an issue where a direct SQL query to the Hive Metastore failed when the " (double quote) character was used in an identifier’s name.

  • QHIVE-5339: Hive Metastore Server now runs using Java 8 runtime by default.

  • QHIVE-5242: For Hive queries that run on the coordinator node with Hive 2.1.1 or later, the corresponding per-query Hive logs are uploaded to <defloc>/logs/query_logs/hive/<cluster_inst_id>/<cmdId>.log.gz.

  • QHIVE-5268: Qubole now supports configuring the replication factor of Application Timeline Server (ATS) v1.5 HDFS timeline data. The default replication factor is 2. You can override it by setting yarn.timeline-service.entity-group-fs-store.replication in Hadoop Overrides on the cluster’s Advanced Configuration UI page.

  • QHIVE-5281: Vectorized query execution in Hive no longer evaluates empty GROUP BY keys. The related open-source Jira is HIVE-16533.

  • QTEZ-477: Qubole now supports using RollingLevelDBTimelineStore for ATS v1.5 Summary Store. Gradual Rollout

  • QTEZ-509: Qubole has fixed an issue where ATS v1.5 became unresponsive while syncing new applications from HDFS to the Summary Store. This occurred when processing one application took a very long time, which caused threads to keep processing that same application and created a backlog of new applications.

  • QTEZ-512: Redirection of a required log file within a container from the NodeManager to the Job History Server (JHS) is enhanced. The JHS now receives only the required log file instead of all log files in the container. The related open-source YARN Jiras are YARN-2605, YARN-3654, YARN-5246, and YARN-4990.
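QHIVE-5049 and QHIVE-5286 optimize how dynamically created partitions are loaded in the Hive Metastore. The statement below shows the kind of query that creates such partitions. It is a minimal sketch that uses only standard Hive dynamic-partition settings. The tables sales and staging_sales and their columns are hypothetical. The sketch does not set hive.qubole.optimize.dynpart.listing, because its accepted values are not described here.

```sql
-- Hypothetical tables: staging_sales (unpartitioned) and sales (partitioned by dt).
-- Allow the partition column to be determined dynamically from the query output.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Each distinct dt value in staging_sales becomes a partition of sales.
-- The dynamic partition column must be the last column in the SELECT list.
INSERT OVERWRITE TABLE sales PARTITION (dt)
SELECT order_id, amount, dt
FROM staging_sales;
```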

Bug Fixes

  • QHIVE-4932: Queries now continue to work when the Hive bootstrap has been moved to Glacier and is therefore inaccessible.
  • QHIVE-4456: With Hive 2.1.1 or later, Hive queries now fail if there is an error in the Hive bootstrap. Feature to opt in
  • QHIVE-4967: Because open-source Hive has deprecated hive.mapred.mode, use the hive.strict.checks.* configuration properties instead. Qubole has removed qubole.compatibility.mode, which was added to throw an error when hive.mapred.mode is set to strict.
  • QHIVE-5064: Starting with Hive 2.3, open-source Hive matches ORC column names case sensitively. As a result, Hive cannot read ORC data whose column schema contains uppercase characters. To turn off case-sensitive matching, add set orc.schema.evolution.case.sensitive = false; to the query.
  • QHIVE-5071: The issue where recovering partitions from the Hive metastore hung is resolved. Qubole has optimized the ALTER TABLE <tablename> RECOVER PARTITIONS query so that it uses direct-SQL-supported APIs to fetch partitions from the Hive metastore. Gradual Rollout
  • QHIVE-5176: The issue where the Hive CLI did not correctly split commands on semicolons when a string contained quotes is resolved. The related open-source Hive Jira is HIVE-19948.
  • QHIVE-5247: The offline Tez UI was inaccessible for a multi-instance HiveServer2 cluster. Qubole now supports the offline Tez UI for a multi-instance HiveServer2 cluster.
  • QHIVE-5273: The issue that caused Hive 2.3 queries to fail with a NullPointerException because NULL values were handled incorrectly in IF statements, comparisons, and other expressions is fixed. The related open-source Hive Jira is HIVE-18622.
  • QHIVE-5274: Vectorization for nested complex data types in the Parquet file format is disabled because it causes query failures.
  • QHIVE-5299: The issue where the Resource Manager could not upload ATS timeline data during cluster termination is resolved. The issue caused information to be missing from the Tez UI.
  • QTEZ-439: The issue where a Hive query that used UNION ALL and a lateral view returned wrong results is resolved. The related open-source Hive Jira is HIVE-21660.
  • QTEZ-498: Qubole has backported open-source TEZ-3413 to fix an issue where AppLaunchEvent for an application was not sent to ATS v1.5, which can cause the data for the application to be deleted from ATS.
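For QHIVE-4967, the following minimal sketch enables the granular hive.strict.checks.* properties for a session in place of hive.mapred.mode = strict. It assumes the property names defined in open-source Hive 2.x. Check them against the configuration reference for your Hive version.

```sql
-- Replaces: SET hive.mapred.mode = strict;
-- Disallow ORDER BY without LIMIT, and queries on partitioned tables that select no partition.
SET hive.strict.checks.large.query = true;
-- Disallow comparing BIGINT with STRING or DOUBLE.
SET hive.strict.checks.type.safety = true;
-- Disallow joins that produce a Cartesian product (cross join).
SET hive.strict.checks.cartesian.product = true;
```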
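For QHIVE-5064, the workaround is a session-level setting placed before the statement that reads the ORC data. The table name orc_events below is hypothetical.

```sql
-- Match ORC file columns to the table schema regardless of case.
SET orc.schema.evolution.case.sensitive = false;
-- orc_events is a hypothetical ORC-backed table whose files use mixed-case column names.
SELECT * FROM orc_events LIMIT 10;
```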
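QHIVE-5071 applies to the ALTER TABLE <tablename> RECOVER PARTITIONS statement. The following minimal usage sketch uses web_logs, a hypothetical partitioned table whose partition directories were written to storage outside of Hive.

```sql
-- Register partition directories that exist in storage but are missing from the metastore.
ALTER TABLE web_logs RECOVER PARTITIONS;
-- List the table's partitions to confirm that they were recovered.
SHOW PARTITIONS web_logs;
```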
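QHIVE-5176 concerns how the Hive CLI splits input into statements. The following input is an illustration written for these notes and is not taken from the HIVE-19948 test cases. It places a semicolon inside a single-quoted literal that also contains a double quote. With the fix, the CLI should treat the input as exactly two statements.

```sql
-- The semicolon inside 'a"b;c' is part of the string literal, not a statement separator.
SELECT 'a"b;c' AS s;
SELECT 2 AS n;
```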

For a list of bug fixes between versions R58 and R59, see Changelog for api.qubole.com.