Spark
New Features
SPAR-3510: QDS now supports Apache Spark 2.4.3. It is displayed as 2.4 latest (2.4.3) in the Spark Version field of the Create New Cluster page in the QDS UI. All existing 2.4.0 clusters are automatically upgraded to 2.4.3 in accordance with Qubole Spark versioning policy.
SPAR-2937: You can configure Ranger policies for Hive tables, and these are honored by Spark SQL for authorization. Supported on Spark 2.4.0 and later versions. Beta, Via Support.
Enhancements
SPAR-3650: Spark now computes the size of the input table during query planning. This speeds up queries that involve joins using BroadcastHashJoin. Supported on Spark 2.4.0 and later versions. Via Support.
SPAR-3616: Allows Spark applications to run reliably even in Out-of-Memory (OOM) cases. This capability can be enabled in Spark 2.4.3 and later versions. Via Support.
SPAR-3418: Implements ORC metadata caching in Spark. This improves query performance by reducing the time spent on reading ORC metadata from an object store. Supported on Spark 2.4.3 and later versions. Via Support.
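The simplified Python sketch below shows why knowing table sizes at planning time matters for SPAR-3650. It assumes Apache Spark's default size-based rule. A join side is broadcast only when its estimated size is at most `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). A relation whose size is unknown is treated as `spark.sql.defaultSizeInBytes`, which defaults to Long.MaxValue, so it is never broadcast. The function names are hypothetical, and the join choice is reduced to two strategies. This is an illustration of the rule, not Qubole's implementation.

```python
from typing import Optional

# Spark default for spark.sql.autoBroadcastJoinThreshold: 10 MB.
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024

# Spark default for spark.sql.defaultSizeInBytes (Long.MaxValue), used when a
# relation's size is unknown at planning time.
DEFAULT_SIZE_IN_BYTES = 2**63 - 1


def planned_size(known_size: Optional[int]) -> int:
    """Size estimate the planner works with (hypothetical helper).

    An unknown size (None) falls back to the default, which is far too large
    to broadcast.
    """
    return DEFAULT_SIZE_IN_BYTES if known_size is None else known_size


def can_broadcast(known_size: Optional[int],
                  threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD) -> bool:
    """Size check: broadcast only if 0 <= size <= threshold.

    A threshold of -1 disables broadcasting, because no size satisfies it.
    """
    size = planned_size(known_size)
    return 0 <= size <= threshold


def choose_join(left_size: Optional[int], right_size: Optional[int],
                threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD) -> str:
    """Simplified strategy choice: broadcast if either side is small enough.

    Real Spark also considers join type, hints, and other strategies.
    """
    if can_broadcast(left_size, threshold) or can_broadcast(right_size, threshold):
        return "BroadcastHashJoin"
    return "SortMergeJoin"


if __name__ == "__main__":
    # A 1 KB dimension table is broadcast only once its size is known.
    print(choose_join(None, 1024))
    print(choose_join(None, None))
```

Under this rule, computing the input table size during planning turns an unknown size (treated as Long.MaxValue) into a real estimate, which lets a small table qualify for BroadcastHashJoin instead of a shuffle-based join.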
Bug Fixes
SPAR-3701: Fixes a problem that increased the runtimes of some TPC-DS queries because filter pushdown in subqueries prevented subquery reuse.
SPAR-3405: Fixes a problem that prevented Hive configurations such as hive.metastore.uris from reaching the Spark Hive Authorizer plugin when they were passed through Spark defaults or --conf. This caused errors in connecting to the Hive Metastore when Hive Authorization was enabled. Fixed in Spark 2.4.0 and later versions. Via Support.
SPAR-3766: Fixes a problem that caused the owner of a table to be changed to the user running the command during operations such as updating table stats. The original owner of the table is now retained. Fixed in Spark 2.4.0 and later versions.