Hive
The new features are described below.
Other enhancements and bug fixes are listed under Enhancements and Bug Fixes.
AWS Glue is Supported as a Hive Metastore
QHIVE-4160: Qubole supports using AWS Glue as the primary metastore in Hive. It is supported only in Hive 2.3, with HiveServer2 and Hive-on-coordinator. Via Support
For more information, see the documentation.
AWS Glue Catalog Sync
QHIVE-3967: Qubole supports using the AWS Glue sync agent with QDS clusters to sync the Hive metastore with AWS Glue. It is currently supported only through a node bootstrap. Via Support
For more information, see the documentation.
Multi-instance HiveServer2
QHIVE-3896: Qubole allows you to configure multi-instance HiveServer2 (HS2) in a Hadoop 2 (Hive) cluster. Beta, Via Support
Multi-instance HS2 provides high availability and scalability when you run concurrent queries. To run multiple HS2 instances, configure a HiveServer2 cluster as a child of the parent Hadoop 2 (Hive) cluster. Qubole recommends a multi-instance HS2 cluster when you want to run more than 100 concurrent Hive queries.
You can configure the separate HiveServer2 cluster through either the UI or the API. Cluster Restart Required
For more information, see UI documentation and API documentation.
Hive 3.1.1 (beta) Version
QHIVE-3844: Qubole supports Hive 3.1.1 (beta), which is also the latest open-source Hive release. Beta, Via Support
Hive 3.1.1 (beta) requires Hadoop 3.x and Tez 0.9.x, and supports both HS2 and multi-instance HS2. Tez is the only supported execution engine. Cluster Restart Required
YARN ATS Version 1.5 for Tez
QTEZ-385: YARN ATS version 1.5 for Tez is supported only in Hive versions 2.1.1 and 2.3.1 (beta). You can choose ATS v1.5 instead of the default ATS v1 because it is more scalable. Consider switching to ATS v1.5 if you run many concurrent queries on Tez or face issues with the Tez UI. Via Support
Pig 0.17 Supported with Hive 2.1.1
QPIG-101: Pig version 0.17 is now supported, but only with Hive version 2.1.1 and Tez version 0.8.4. Cluster Restart Required
Enhancements
QHIVE-2357: To speed up ALTER TABLE RECOVER PARTITIONS on tables with a very large number of partitions, Qubole uses S3 prefix listing to list the partitions faster. You can enable this enhancement with these configurations:
set hive.qubole.filestatus.recurse=true;
set hive.qubole.use.s3prefix.for.recover.partitions=true;
QHIVE-4065: Qubole tmp tables can be dropped faster if you enable the hive.qubole.optimize.drop.tmp.tables configuration parameter.
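A minimal sketch of enabling this parameter, assuming it is a boolean that can be set per session in the same way as the QHIVE-2357 settings (the notes do not say whether it must instead be set at the cluster level):

```sql
-- Assumption: boolean flag, settable at the session level.
set hive.qubole.optimize.drop.tmp.tables=true;
```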
Bug Fixes
QHIVE-3767: Fixed an issue that caused queries running on Tez with the DynamoDB connector to fail with java.lang.NoSuchMethodError.
QHIVE-3948: A warning is now issued during table creation if a column type's length exceeds the metastore's default limit (4000 characters).
HIVE-4104: Fixed an issue in Hive 2.1.1 where, with Tez and vectorization enabled, ReduceRecordSource added the batch data as a string to the exception stack on a failure, which could cause the query to hang.
QHIVE-4192: Fixed an issue that caused a NullPointerException (NPE) when UpdateInputAccessTimeHook was used on a query that accessed a database other than the current one. The related open-source issue is HIVE-18060.
QTEZ-362: To reduce AWS read API calls in Hive 2.3, Qubole has changed the default values of the following configurations:
mapred.min.split.size: The default value is now 256MB.
mapred.max.split.size: The default value is now 256MB.
The default values for other Hive versions remain unchanged.
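If you want the same split sizes in a session on another Hive version, a minimal sketch is to set both properties explicitly. This assumes that "256MB" means 256 × 1024 × 1024 bytes and that the values are given in bytes, as these Hadoop properties expect; whether doing this reduces read API calls on other versions is not stated in these notes.

```sql
-- 256 MB expressed in bytes: 256 * 1024 * 1024 = 268435456.
set mapred.min.split.size=268435456;
set mapred.max.split.size=268435456;
```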
For a list of bug fixes between versions R55 and R56, see Changelog for api.qubole.com.