Globally Rolled Out Features/Enhancements

As part of its gradual rollout program, Qubole enables certain features and enhancements on different pods over a period of time. After the rollout is complete, Qubole enables these features and enhancements globally on the Qubole platform.

The following list describes the features and enhancements that have been globally rolled out for you to use. For each feature or enhancement, it gives the feature description, the Qubole component, the supported cloud providers, and the QDS release version.

Auto-population of instances similar to the primary worker node type
Feature Description: An enhancement to the heterogeneous cluster configuration UI. It suggests instances that are similar to the chosen worker node type but belong to different generations, instead of suggesting an instance of double weight from the same generation.
Qubole Component: Cluster Management
Supported Cloud Provider: AWS
QDS Release Version: R58 Quick Fix

Optimized version of the Beeline script
Feature Description: An optimization that reduces the latency of HiveServer2 queries.
Qubole Component: Hive
Supported Cloud Provider: AWS
QDS Release Version: R58

Hive MapJoin counters computation
Feature Description: Support for counters that report, after a query completes, how many of its joins Hive converted to MapJoins. The counter values are visible in the Analyze/Workbench logs.
Qubole Component: Hive
Supported Cloud Provider: AWS
QDS Release Version: R58

Use of hive-exec libraries in Tez from the local disk
Feature Description: A Tez optimization that uses the hive-exec JAR already available on the local disk of cluster nodes. This reduces localization overhead and improves efficiency by avoiding additional HDFS operations.
Qubole Component: Hive on Tez
Supported Cloud Provider: AWS, Azure, GCP, OPC, and Oracle
QDS Release Version: R58

HiveServer2 clusters with private IP addresses
Feature Description: HiveServer2 clusters use private IP addresses for inter-process communication.
Qubole Component: Hive
Supported Cloud Provider: AWS
QDS Release Version: R58

Cleanup of partial data upon a Hive query failure
Feature Description: When a Hive query fails, Qubole cleans up the partial data written by the mappers and reducers that had already completed.
Qubole Component: Hive
Supported Cloud Provider: AWS
QDS Release Version: R57 Quick Fix

Hive Metastore Server with Java 8
Feature Description: The Thrift Hive Metastore Server (HMS) JVM uses Java 8 with the G1 garbage collector (G1GC). To use this feature, remove any bootstrap code that configures Java 8 for HMS. You do not need to restart the HMS JVM for Java 8 to take effect.
Qubole Component: Hive
Supported Cloud Provider: AWS
QDS Release Version: R57

Handling Spot node loss and Spot blocks using graceful decommissioning
Feature Description: Spark applications handle Spot node loss and Spot blocks by using the YARN graceful-decommission status of nodes. Supported on Spark 2.4.0 and later.
Qubole Component: Spark
Supported Cloud Provider: AWS, GCP
QDS Release Version: R57

Private IP usage
Feature Description: Spark uses private IP addresses for all nodes, which makes the executor logs accessible.
Qubole Component: Spark
Supported Cloud Provider: AWS
QDS Release Version: R56 Quick Fix

Direct writes for dynamic partition overwrite in the data source flow
Feature Description: Support for direct writes, which improves performance for data source tables when the open-source Spark setting spark.sql.sources.partitionOverwriteMode is set to dynamic. Supported on Spark 2.4 and later.
Qubole Component: Spark
Supported Cloud Provider: AWS
QDS Release Version: R57

Distributed writes for better performance
Feature Description: Users can run SQL commands with large result sizes using Spark. Supported on Spark 2.4 and later.
Qubole Component: Spark
Supported Cloud Provider: AWS
QDS Release Version: R57

Improved container packing for efficient cluster utilization
Feature Description: Spark on Qubole improves container packing by restarting idle executors, which allows YARN to place the restarted executors on fewer nodes.
Qubole Component: Spark
Supported Cloud Provider: AWS
QDS Release Version: R56

Direct writes for INSERT OVERWRITE queries with dynamic partitions
Feature Description: Support for direct writes, which improves the performance of INSERT OVERWRITE queries with dynamic partitions. Supported on Spark 2.2 and later.
Qubole Component: Spark
Supported Cloud Provider: AWS
QDS Release Version: R56
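To illustrate the auto-population of instances similar to the primary worker node type, the following sketch contrasts the two suggestion strategies for AWS-style instance type names. It assumes a simplified naming scheme (family letters, a generation number, optional attribute letters, and a size such as xlarge) and a size ladder in which each step doubles the weight. The parse, suggest_other_generations, and suggest_double_weight functions and the sample catalog are hypothetical; this is not how the Qubole UI actually computes its suggestions.

```python
import re
from typing import NamedTuple

# Assumed AWS-style naming: family letters, generation digits, optional
# attribute letters (for example "a", "d", "i"), a dot, then the size.
TYPE_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([0-9]*x?large)$")

# Assumed size ladder in which each step doubles the capacity (weight).
SIZES = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


class InstanceType(NamedTuple):
    family: str
    generation: int
    attributes: str
    size: str


def parse(name: str) -> InstanceType:
    match = TYPE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


def suggest_other_generations(primary: str, catalog: list[str]) -> list[str]:
    """New behavior (sketch): same family and size, different generation."""
    p = parse(primary)
    return [
        name for name in catalog
        if (t := parse(name)).family == p.family
        and t.size == p.size
        and t.generation != p.generation
    ]


def suggest_double_weight(primary: str, catalog: list[str]) -> list[str]:
    """Previous behavior (sketch): same family and generation, next size up."""
    p = parse(primary)
    index = SIZES.index(p.size)
    if index + 1 == len(SIZES):
        return []  # already the largest size in the assumed ladder
    bigger = SIZES[index + 1]
    return [
        name for name in catalog
        if (t := parse(name)).family == p.family
        and t.generation == p.generation
        and t.attributes == p.attributes
        and t.size == bigger
    ]


catalog = ["m4.xlarge", "m5.xlarge", "m5.2xlarge", "m5a.xlarge", "m6i.xlarge", "c5.xlarge"]
print(suggest_other_generations("m5.xlarge", catalog))  # ['m4.xlarge', 'm6i.xlarge']
print(suggest_double_weight("m5.xlarge", catalog))      # ['m5.2xlarge']
```

Suggesting same-size instances from other generations keeps each suggested node at roughly the same weight as the primary type, which is the difference the entry describes.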
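The Spark entry on direct writes for dynamic partition overwrite refers to the open-source setting spark.sql.sources.partitionOverwriteMode. The following sketch models, in plain Python rather than Spark, what its static and dynamic values mean for an INSERT OVERWRITE without a partition specification: static mode deletes every existing partition before writing, while dynamic mode replaces only the partitions that receive new rows. The dictionary-based table model and the insert_overwrite function are hypothetical illustrations; the sketch does not show how Qubole implements direct writes.

```python
# Minimal model (illustrative only): a partitioned table is a dict that maps
# a partition value to the list of rows stored in that partition.
Table = dict[str, list[dict]]


def insert_overwrite(table: Table, new_rows: list[dict], partition_col: str, mode: str) -> Table:
    """Return the table contents after an INSERT OVERWRITE without a partition spec.

    mode="static":  every existing partition is deleted before the write.
    mode="dynamic": only partitions that receive new rows are replaced.
    """
    # Group the incoming rows by their partition value.
    incoming: Table = {}
    for row in new_rows:
        incoming.setdefault(row[partition_col], []).append(row)

    if mode == "static":
        return incoming
    if mode == "dynamic":
        result = dict(table)     # untouched partitions survive
        result.update(incoming)  # touched partitions are overwritten
        return result
    raise ValueError(f"unknown partitionOverwriteMode: {mode!r}")


existing = {
    "2024-01-01": [{"day": "2024-01-01", "amount": 10}],
    "2024-01-02": [{"day": "2024-01-02", "amount": 20}],
}
update = [{"day": "2024-01-02", "amount": 25}]

print(sorted(insert_overwrite(existing, update, "day", "static")))   # ['2024-01-02']
print(sorted(insert_overwrite(existing, update, "day", "dynamic")))  # ['2024-01-01', '2024-01-02']
```

In a PySpark session, dynamic mode is typically selected with spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic") before running the query.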
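The improved container packing entry says that restarting idle executors lets YARN place them on fewer nodes. The following toy model illustrates that effect: each node holds executors up to an assumed per-node capacity, busy executors stay where they are, and each restarted idle executor goes to the non-full node that already hosts the most executors. The function names, the capacity model, and this best-fit placement rule are assumptions for illustration, not Qubole's or YARN's actual scheduling logic.

```python
def restart_idle_executors(
    nodes: dict[str, list[str]], idle: set[str], capacity: int
) -> dict[str, list[str]]:
    """Re-place idle executors so that they are packed onto as few nodes as possible.

    Busy executors stay where they are. Idle executors are treated as restarted
    and are placed, one by one, on the non-full node that already hosts the most
    executors (a best-fit style rule assumed here for illustration).
    """
    # Keep only the busy executors in place; build new lists so the input is not mutated.
    placement = {node: [e for e in execs if e not in idle] for node, execs in nodes.items()}
    restarted = [e for execs in nodes.values() for e in execs if e in idle]
    for executor in restarted:
        # Assumes every node starts within capacity, so a free slot always exists.
        candidates = [n for n, execs in placement.items() if len(execs) < capacity]
        target = max(candidates, key=lambda n: len(placement[n]))
        placement[target].append(executor)
    return placement


def empty_nodes(placement: dict[str, list[str]]) -> list[str]:
    """Nodes with no executors left; these are candidates for downscaling."""
    return [node for node, execs in placement.items() if not execs]


before = {
    "node-1": ["exec-1", "exec-2"],
    "node-2": ["exec-3", "exec-4"],
    "node-3": ["exec-5"],
}
after = restart_idle_executors(before, idle={"exec-2", "exec-4", "exec-5"}, capacity=4)
print(empty_nodes(before))  # []
print(empty_nodes(after))   # ['node-3']
print(after["node-1"])      # ['exec-1', 'exec-2', 'exec-4', 'exec-5']
```

In this example, no node is empty before the restart, while afterwards node-3 hosts no executors and could be released by downscaling.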