Spark

New Features

  • SPAR-3510: QDS now supports Apache Spark 2.4.3. It is displayed as 2.4 latest (2.4.3) in the Spark Version field of the Create New Cluster page in the QDS UI. All existing 2.4.0 clusters are automatically upgraded to 2.4.3 in accordance with the Qubole Spark versioning policy.
  • SPAR-2937: You can configure Ranger policies for Hive tables, and these are honored by Spark SQL for authorization. Supported on Spark 2.4.0 and later versions. Beta, Via Support.

Enhancements

  • SPAR-3650: Spark now computes the size of the input table during query planning. This speeds up queries that involve joins using BroadcastHashJoin. Supported on Spark 2.4.0 and later versions. Via Support.
  • SPAR-3616: Allows Spark applications to run reliably even in Out-of-Memory (OOM) cases. This capability can be enabled in Spark 2.4.3 and later versions. Via Support.
  • SPAR-3418: Implements ORC metadata caching in Spark. This improves query performance by reducing the time spent on reading ORC metadata from an object store. Supported on Spark 2.4.3 and later versions. Via Support.
  • SPAR-3555: The appendToTable API now supports Hive tables as well as Spark data sources.
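
To illustrate why the table-size estimate in SPAR-3650 matters, here is a minimal sketch of the size check that decides whether a join can use BroadcastHashJoin. It is a simplified model and not Qubole's or Spark's actual planner code: the function names are hypothetical, join-type restrictions are ignored, and the fallback is assumed to be a sort-merge join. The 10 MB default corresponds to the open-source Spark setting spark.sql.autoBroadcastJoinThreshold, where a value of -1 disables broadcasting.

```python
# Simplified model of the broadcast decision (hypothetical helper names).
# Default of spark.sql.autoBroadcastJoinThreshold in open-source Spark: 10 MB.
DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


def can_broadcast(estimated_size_bytes,
                  threshold_bytes=DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD):
    """Return True if a table of the estimated size qualifies for broadcast."""
    # A negative threshold (for example -1) makes this check always fail,
    # which models broadcast joins being disabled.
    return 0 <= estimated_size_bytes <= threshold_bytes


def choose_join_strategy(left_size_bytes, right_size_bytes,
                         threshold_bytes=DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD):
    """Pick a join strategy from the estimated sizes of both join inputs."""
    if (can_broadcast(left_size_bytes, threshold_bytes)
            or can_broadcast(right_size_bytes, threshold_bytes)):
        return "BroadcastHashJoin"
    # Assumption: sort-merge join is the fallback when neither side is small.
    return "SortMergeJoin"


if __name__ == "__main__":
    small_table = 5 * 1024 * 1024           # 5 MB estimate
    large_table = 200 * 1024 * 1024 * 1024  # 200 GB estimate
    print(choose_join_strategy(small_table, large_table))
    print(choose_join_strategy(large_table, large_table))
```

In this model, a size estimate that is missing or too high at planning time keeps a small table from qualifying, so the planner falls back to the slower strategy. An accurate estimate is what lets the planner select BroadcastHashJoin.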

Bug Fixes

  • SPAR-3701: Fixes a problem that increased the runtime of some TPC-DS queries because filter pushdown in subqueries prevented subquery reuse.
  • SPAR-3405: Fixes a problem that prevented Hive configurations such as hive.metastore.uris from reaching the Spark Hive Authorizer plugin when they were passed through Spark defaults or --conf. This caused errors in connecting to the Hive Metastore when Hive Authorization was enabled. Fixed in Spark 2.4.0 and later versions. Via Support.
  • SPAR-3766: Fixes a problem that caused the owner of a table to be changed to the user running the command during operations such as update Table Stats. The original owner of the table is now retained. Fixed in Spark 2.4.0 and later versions.
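
As an illustration of the scenario in SPAR-3405, the following sketch builds a spark-submit command line that passes a Hive setting through --conf. It assumes the open-source Spark convention of prefixing Hadoop and Hive properties with spark.hadoop.; the exact property form expected on QDS may differ. The helper name, the application file name, and the metastore host are hypothetical.

```python
def build_spark_submit_args(app_path, hive_conf):
    """Build a spark-submit argument list that forwards Hive settings.

    Assumption: each Hive property is passed as spark.hadoop.<property>,
    the convention used by open-source Spark for Hadoop/Hive settings.
    """
    args = ["spark-submit"]
    # Sort the keys so the generated command line is deterministic.
    for key in sorted(hive_conf):
        args += ["--conf", "spark.hadoop.{}={}".format(key, hive_conf[key])]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # Hypothetical metastore host; 9083 is the usual Hive Metastore port.
    hive_conf = {"hive.metastore.uris": "thrift://metastore.example.com:9083"}
    print(" ".join(build_spark_submit_args("my_app.py", hive_conf)))
```

The same key and value pair could instead be placed in the Spark defaults for the cluster, which is the other path named in the fix.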