Hive

Changes in Hive Versions

Hive version 2.3 is generally available. Cluster Restart Required

QHIVE-4645: QDS Hive 2.3 now includes all changes up to Apache Hive 2.3.5 but continues to use ORC v1.3.3. Read more about Apache Hive 2.3.5.

Deprecation of Hive Notebooks

QHIVE-4527: Qubole is deprecating Hive notebooks starting with this QDS version.

Deprecation of Qubole JDBC Storage Handler

With R57, Qubole deprecates the Qubole-Hive JDBC Storage Handler and uses the OSS JDBC Storage Handler across all Hive versions.

Hadoop 2 (Hive) Clusters Renamed as Hadoop (Hive) Clusters

ACM-4221 and ACM-5016: Qubole supports Hive 3.1.1 (beta) on Hive clusters. Starting with cluster API v2.1, Hadoop 2 (Hive) clusters are renamed Hadoop (Hive) clusters. You can select Hive 3.1.1 (beta) when creating or editing a cluster. Via Support | Cluster Restart Required

Improvements in Hive Metastore Server

QHIVE-4151: Qubole has added support for running the Thrift Hive Metastore Server (HMS) JVM on Java 8 with G1GC. If you use this feature, remove any bootstrap code you may have that switches HMS to Java 8. You do not need to restart the HMS JVM to apply Java 8. Via Support

Hive ACID Transactions

Qubole supports Hive ACID transactions on the data lake using Hive 3.1.1 (beta). For more information, see this blog.

QHIVE-4673: When you convert a non-ACID table to an ACID table, OSS Hive (HIVE-22004) requires the file names in the source table to strictly match the patterns that Hive uses when it writes files. Qubole Hive 3.1.1 (beta) relaxes this restriction.
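To make the naming restriction concrete, here is a minimal Python sketch that flags files OSS Hive would treat as non-conforming during a non-ACID to ACID conversion. The two regular expressions are an assumption: they approximate the "original file" patterns in Hive's AcidUtils (for example 000000_0 and 000000_0_copy_1), and the helper names matches_hive_naming and nonconforming_files are hypothetical, not part of Hive or QDS.

```python
import re

# Assumed approximations of the file-name patterns OSS Hive accepts for
# pre-existing (non-ACID) files, e.g. "000000_0" and "000000_0_copy_1".
ORIGINAL_PATTERN = re.compile(r"[0-9]+_[0-9]+")
ORIGINAL_PATTERN_COPY = re.compile(r"[0-9]+_[0-9]+_copy_[0-9]+")


def matches_hive_naming(file_name: str) -> bool:
    """Return True if file_name looks like a file Hive itself wrote."""
    return bool(
        ORIGINAL_PATTERN.fullmatch(file_name)
        or ORIGINAL_PATTERN_COPY.fullmatch(file_name)
    )


def nonconforming_files(file_names):
    """Return the files that the strict OSS naming rule would reject."""
    return [name for name in file_names if not matches_hive_naming(name)]


if __name__ == "__main__":
    files = ["000000_0", "000001_0_copy_1", "part-00000.snappy.orc", "data.csv"]
    print(nonconforming_files(files))  # ['part-00000.snappy.orc', 'data.csv']
```

Files written by external tools, such as part-00000.snappy.orc, are the typical case that the relaxed Qubole behavior is meant to accept.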

QHIVE-4622: Unlike OSS Hive, Qubole Hive ACID gives you control over the cleaner process of ACID compaction. You can disable the cleaner process using the following configurations:

  • Set metastore.compactor.force.disable.cleaner to true to disable the cleaner thread at HMS startup. Its default value is false.

  • Set NO_CLEANUP to true in the table properties to disable table-level cleanup and prevent the cleaner process from automatically deleting obsolete directories and files.

Enhancements

  • QHIVE-2335: The Command Error Log API now supports Hive queries.

  • QHIVE-3515: Qubole supports blacklisting and whitelisting Hive tables for automatic statistics collection.

  • QHIVE-3974: Qubole supports enabling/disabling Thrift Metastore Server Audit log in QDS clusters. Via Support

  • QHIVE-3998: Qubole has improved logs and error messages for failures that occur while launching the Tez ApplicationMaster.

  • QHIVE-4239: By default, Qubole allows users to write to the local file system and add custom JARs/UDFs when a query runs in HiveServer2 or Hive-on-Coordinator mode.

  • QHIVE-4597: The following configurations are enabled by default in Hive 2.3 and 3.1.1 (beta) versions:

    • The configuration that makes Hive use a deterministic rand() to randomly distribute keys to reducers. This fixes incorrect query results that occurred when a spot node was lost, or any node went down, while reduce tasks were fetching map task output.

    • The configuration that bases the memory allocation for reduce tasks on the mapreduce.reduce.memory.mb configuration. Previously, memory for both map and reduce tasks was allocated based on the mapreduce.map.memory.mb configuration.

    • The configuration that disallows the INSERT INTO command on bucketed tables.

    • The configuration that improves the performance of S3 listing by using prefix-based listing.

    • The configuration that optimizes the ALTER TABLE RECOVER PARTITIONS command.

  • QTEZ-412: Qubole has added support for Application Timeline Server (ATS) v1.5 in Hive 3.1.1 (beta) to improve scalability over ATS v1.

  • QTEZ-441: Qubole has added an enhancement that, when enabled, adds an application tag containing the Qubole command ID to Tez jobs submitted through Hive-on-coordinator and QDS servers. The tag helps identify and kill runaway applications whose corresponding queries have been killed. Via Support
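Why a deterministic rand() matters for QHIVE-4597 can be illustrated with a minimal Python sketch. It is not Qubole's implementation: the reducer count, the seed values, and the helpers assign_reducers and rows_seen_by_reducers are hypothetical. The scenario assumes reducer 0 fetched its input from the first map attempt, the spot node holding that output was then lost, and the remaining reducers fetched from the re-executed map attempt.

```python
import random

NUM_REDUCERS = 4  # hypothetical reducer count


def assign_reducers(rows, seed=None):
    """Pick a reducer per row, like DISTRIBUTE BY rand().

    seed=None seeds from system entropy, so each map attempt differs;
    a fixed seed makes every attempt produce the same assignment.
    """
    rng = random.Random(seed)
    return [rng.randrange(NUM_REDUCERS) for _ in rows]


def rows_seen_by_reducers(rows, first_attempt, retry_attempt):
    """Reducer 0 read the first attempt; reducers 1..N-1 read the retry."""
    seen = []
    for row, first, retry in zip(rows, first_attempt, retry_attempt):
        if first == 0:
            seen.append(row)  # reducer 0 already received this row
        if retry != 0:
            seen.append(row)  # another reducer receives it from the retry
    return seen


if __name__ == "__main__":
    rows = [f"row-{i}" for i in range(1000)]

    # Deterministic: both attempts share a seed, so every row arrives once.
    first, retry = assign_reducers(rows, seed=42), assign_reducers(rows, seed=42)
    print(rows_seen_by_reducers(rows, first, retry) == rows)  # True

    # Non-deterministic: rows can be duplicated or dropped.
    first, retry = assign_reducers(rows), assign_reducers(rows)
    seen = rows_seen_by_reducers(rows, first, retry)
    print(len(seen), len(set(seen)))
```

With independent randomness, a row sent to reducer 0 by the first attempt and to reducer 2 by the retry is counted twice, while a row that moves the other way is lost; a repeatable assignment removes both cases.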

Bug Fixes

  • QHIVE-2586: Qubole now disallows overriding hive.aux.jars.path as part of the Hive override configuration in the cluster settings. Refer to Adding Custom Jars in Hive for details on how to add or remove custom jars. Modifying the value of the hive.aux.jars.path configuration through the SET statement in Hive is also no longer allowed. Cluster Restart Required

  • QHIVE-3822: Fixed an issue that caused automatic statistics collection to fail on tables that had a Hive reserved keyword as a column name.

  • QHIVE-3927: Fixed a security vulnerability by disallowing embedded elements in the XPathUtil UDF. Related open-source jira: HIVE-18789.

  • QHIVE-4530: Fixed an issue that caused CREATE TABLE queries containing a non-alphanumeric column name to fail.

  • QHIVE-4588: Fixed an issue in which dropping a table failed because of foreign key data. Qubole has added a cascade option for deleting the foreign key data. Related open-source jira: HIVE-19994.

  • QTEZ-446: Qubole has added support in HiveServer2 for automatically killing Tez applications whose queries have finished, been canceled, or failed. Related open-source jira: TEZ-3405.

For a list of bug fixes between versions R56 and R57, see Changelog for api.qubole.com.