Hive

Bug Fixes

  • HIVE-1969: A MapJoin/SkewJoin issue caused queries to take longer than expected.

  • HIVE-2338: Queries that failed while writing data at a base bucket location threw a NullPointerException without a descriptive message.

    QDS now throws a descriptive IllegalArgumentException instead of the NullPointerException for such query failures.

  • HIVE-2707: Counters for FAILED queries were null.

    Counters are now printed for FAILED queries in addition to SUCCESSFUL queries.

  • HIVE-2865: In accounts in which Hive Authorization is enabled, QDS adds the configuration parameter hive.security.authorization.enabled to Hive’s Restricted List to prevent users from bypassing Hive Authorization when they run a query. You can change the setting at the cluster level in the cluster’s Hive Settings > Override Hive Configuration field under the Advanced Configuration tab. To enable Hive Authorization in a QDS account, contact Qubole Support. It is now supported in Hive 2.1 (in addition to Hive 1.2).

  • HIVE-2907: The logical query optimization phase was slow when getting predicates using HiveRelMdPredicates if there were many equivalent columns and CBO was enabled.

  • HIVE-3179: A memory leak issue with the UDFClassLoader and ClassLoaderResolverImpl objects on HiveServer2.

  • HIVE-3185: Hive commands used Tez as the execution engine even when MapReduce was configured.

    The Hive execution engine that is set as a cluster override now takes precedence over the Hive execution engine set in the account settings.

  • HIVE-3197: To avoid a StackOverflowError when a large number of partitions are dropped, a new configuration parameter, hive.metastore.drop.partitions.batch.size, has been introduced to drop partitions in batches.

    To drop partitions in batches, set hive.metastore.drop.partitions.batch.size to the batch size at the cluster or query level, or in a Hive bootstrap. The default value is 0, so the parameter has no effect unless a value is specified.

  • HIVE-3269: Enabling hive.optimize.skewjoin caused the job to fail with the FNFException.

  • HIVE-3271: Hive vectorization failed with a NullPointerException in VectorUDFWeekOfYearString. The exception is now handled.

  • HIVE-3298: Tez queries failed with the No work found for tablescan error when dynamic partition pruning was enabled.

  • HIVE-3319: All hive.cli.* parameters have been added to the list of whitelisted parameters. You can configure these parameters at runtime without adding them to hive.security.authorization.sqlstd.confwhitelist when Hive Authorization and HiveServer2 are enabled.

  • HIVE-3402: A ClassNotFoundException occurred because Kryo’s classloader was set only once, during initialization.

  • HIVE-3484: QDS disallows the hive.on.master configuration in Hadoop Overrides.

  • QTEZ-313: The deadlock in ApplicationMaster is resolved by removing the calls from the task attempt to the task. The task now passes the location hint and task spec to the TaskAttempt constructor.

  • QTEZ-315: Hive queries with UNION ALL failed when Tez was set as the execution engine.

  • QTEZ-330: Parallel Hive queries on Hive 2.1.1, Tez, and Hive-on-coordinator on a non-HiveServer2 cluster failed intermittently.

    To resolve this, Hive 2.1 supports parallel INSERT INTO values from the same session. The Hive session ID is generated randomly for each query, which avoids race conditions in the session directories.
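As an illustration of the HIVE-3197 setting described above, the following is a minimal sketch of a query-level override. The batch size of 1000, the table name web_logs, and the partition column dt are hypothetical; only the parameter name comes from these notes.

```sql
-- Drop partitions in batches of 1000 rather than all at once.
-- 1000 is an example value; the default of 0 leaves batching off.
SET hive.metastore.drop.partitions.batch.size=1000;

-- Hypothetical table and partition column, shown only so the setting has something to act on.
ALTER TABLE web_logs DROP IF EXISTS PARTITION (dt < '2017-01-01');
```

The same key and value can instead be supplied at the cluster level or in a Hive bootstrap.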

Enhancements

  • HIVE-2515: The HS2 health status is available through the Datadog monitoring service. (Beta, Via Support)

  • HIVE-2584: Qubole encrypts the Hive metastore passwords. (Beta, Via Support)

  • HIVE-3174: Complex expressions are supported in OUTER JOINs by extending the column pruner to account for the residual filter expression in the JOIN operator.

  • HIVE-3193: In a SELECT query, Hive checks and waits until the files written to the S3 location are visible, to account for S3 eventual consistency. (Disabled)

  • HIVE-3220: Hive 2.1.1 now supports multiline comments within query expressions.

  • HIVE-3275: A Datadog dashboard for Hive Metastore Server (HMS) is added for Hive, Spark, and Presto clusters. An alert on the HMS memory usage is also added. (Beta, Via Support)

  • HIVE-3276: Liveness and Health Checks for the Hive Metastore Server (HMS) are added in Datadog as follows (Beta, Via Support):

    • Liveness: Alert if HMS process is not available.

    • Health Check: Run a sample command and check if the services are responding within a given timeout/SLA. Otherwise, create an alert.

  • HIVE-3347: The Parquet file format can now be set with hive.default.fileformat.

  • HIVE-3417: The metastore consistency check (MSCK) result is displayed only in Logs instead of the Results tab of the Analyze UI when the configuration parameter hive.qubole.write.msck.result.to.log is enabled at the query or cluster level, or in a Hive bootstrap. (Cluster Restart Required for the cluster-level setting.)

  • HIVE-3434: The AvroSerDe’s InstanceCache is now thread safe, which avoids a NullPointerException when the InstanceCache is accessed by multiple threads simultaneously.

  • QTEZ-217: When Tez is the execution engine in Hive queries, QDS provides an account-level configuration to limit the number of AWS API calls. (Beta, Via Support)

  • QTEZ-244: QDS has added the Datadog metrics for the Application History Server. (Beta, Via Support)
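For the HIVE-3347 and HIVE-3417 parameters listed above, a query-level sketch might look like the following. The table name events and its columns are hypothetical, and the values Parquet and true are assumptions about the accepted spellings, which these notes do not state.

```sql
-- Assumed value spelling: tables created without a STORED AS clause use Parquet.
SET hive.default.fileformat=Parquet;

-- Hypothetical partitioned table that picks up the default file format.
CREATE TABLE events (id BIGINT, payload STRING) PARTITIONED BY (dt STRING);

-- Assumed boolean value: send the MSCK result to Logs instead of the Results tab.
SET hive.qubole.write.msck.result.to.log=true;
MSCK REPAIR TABLE events;
```

If hive.qubole.write.msck.result.to.log is set at the cluster level instead, the cluster restart mentioned for HIVE-3417 applies.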