Hive

AWS Glue is Supported as a Hive Metastore

QHIVE-4160: Qubole supports using AWS Glue as the primary metastore in Hive. This is supported only in Hive 2.3 with HiveServer2 and Hive-on-Master. Via Support

For more information, see the documentation.

AWS Glue Catalog Sync

QHIVE-3967: Qubole supports using the AWS Glue sync agent with QDS clusters to sync Hive Metastore with AWS Glue. Currently, it is supported through a node bootstrap. Via Support

For more information, see the documentation.

Multi-instance HiveServer2

QHIVE-3896: Qubole allows you to configure multi-instance HiveServer2 (HS2) in a Hadoop 2 (Hive) cluster. Beta, Via Support

It provides high availability and scalability when you run concurrent queries. To run multiple instances of HS2, configure a HiveServer2 cluster as a child of the parent Hadoop 2 (Hive) cluster. Qubole recommends a multi-instance HS2 cluster when you want to run more than 100 concurrent Hive queries.

You can configure a separate HiveServer2 cluster through either the UI or the API. Cluster Restart Required

For more information, see the UI documentation and the API documentation.

Hive 3.1.1 (beta) Version

QHIVE-3844: Qubole supports Hive 3.1.1 (beta), which is also the latest release of open-source Hive. Beta, Via Support

Hive 3.1.1 (beta) requires Hadoop 3.x and Tez 0.9.x, and it supports HS2 and multi-instance HS2. Tez is the only supported execution engine. Cluster Restart Required

YARN ATS Version 1.5 for Tez

QTEZ-385: YARN ATS version 1.5 for Tez is supported only in Hive versions 2.1.1 and 2.3.1 (beta). Qubole lets you choose ATS v1.5 over the default ATS v1 because it scales better. If you run a large number of concurrent queries on Tez or face issues with the Tez UI, consider switching to ATS v1.5. Via Support

Pig 0.17 Supported with Hive 2.1.1

QPIG-101: Pig version 0.17 is now supported only with Hive version 2.1.1 and Tez version 0.8.4. Cluster Restart Required

Enhancements

  • QHIVE-2357: To optimize alter table and recover partitions operations when the number of distinct partitions is very large, Qubole uses S3 prefix listing to speed up the listing of partitions. You can enable this enhancement with these configurations:
    • set hive.qubole.filestatus.recurse=true;
    • set hive.qubole.use.s3prefix.for.recover.partitions=true;
  • QHIVE-4065: Qubole tmp tables can be dropped faster if you enable the hive.qubole.optimize.drop.tmp.tables configuration parameter.
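
The two enhancements above are enabled through session-level settings. The following is a minimal sketch, assuming a hypothetical partitioned table named web_logs whose data lives in S3, and assuming that hive.qubole.optimize.drop.tmp.tables is a boolean that is turned on with true (the note above says only that it can be enabled).

```sql
-- QHIVE-2357: use S3 prefix listing when recovering partitions.
set hive.qubole.filestatus.recurse=true;
set hive.qubole.use.s3prefix.for.recover.partitions=true;

-- web_logs is a hypothetical table name used only for illustration.
ALTER TABLE web_logs RECOVER PARTITIONS;

-- QHIVE-4065: speed up dropping of Qubole tmp tables.
-- Assumption: the parameter is a boolean that is enabled with true.
set hive.qubole.optimize.drop.tmp.tables=true;
```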

Bug Fixes

  • QHIVE-3767: Fixed an issue causing queries running on Tez using the DynamoDb connector to fail with java.lang.NoSuchMethodError.

  • QHIVE-3948: A warning is now displayed during table creation if the length of a column type exceeds the default limit (4000 characters) in the metastore.

  • HIVE-4104: Fixed an issue in which ReduceRecordSource added the batch data as a string to the exception stack, which could cause a query to hang after a failure in Hive 2.1.1 when Tez and vectorization are enabled.

  • QHIVE-4192: Fixed an issue that caused a NullPointerException (NPE) when UpdateInputAccessTimeHook was used on a query accessing a database other than the current one. A related open-source issue is HIVE-18060.

  • QTEZ-362: To reduce AWS read API calls in Hive 2.3, Qubole has changed the default values of the following configurations:

    • mapred.min.split.size: Its default value is 256MB.
    • mapred.max.split.size: Its default value is 256MB.

    The default values for other Hive versions remain unchanged.
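
If the same split sizes are wanted on another Hive version, they could be set explicitly for a session. The following is a minimal sketch, assuming that both properties take values in bytes as they do in open-source Hadoop (256 MB = 268435456 bytes).

```sql
-- Assumption: values are in bytes; 256 * 1024 * 1024 = 268435456.
set mapred.min.split.size=268435456;
set mapred.max.split.size=268435456;
```

Larger splits generally mean fewer input splits per query and therefore fewer read requests against S3, which is the effect this change aims for.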

For a list of bug fixes between versions R55 and R56, see the Changelog for api.qubole.com.