Spark

Introducing Size-based JOIN Reordering

SPAR-3882: Spark on Qubole now supports automatic reordering of JOINs based on table sizes, and the reordering works even when column-level statistics are absent. In Qubole's benchmarking, the optimization was effective in 38 queries of the TPC-DS benchmark and improved query performance by up to 85%.
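The idea behind size-based reordering can be illustrated with a small sketch. This is not Qubole's implementation: it is a toy greedy heuristic that assumes only per-table sizes and a list of join predicates are known. The function name `order_joins_by_size`, the table names, and the sizes are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def order_joins_by_size(
    sizes: Dict[str, int],
    join_edges: List[Tuple[str, str]],
) -> List[str]:
    """Return a left-deep join order chosen greedily by table size.

    Illustrative only: start from the smallest table, then repeatedly add
    the smallest remaining table that shares a join predicate with the
    tables already joined, so that cross joins are avoided when possible.
    """
    # Build an undirected adjacency map from the join predicates.
    neighbors: Dict[str, Set[str]] = {table: set() for table in sizes}
    for left, right in join_edges:
        neighbors[left].add(right)
        neighbors[right].add(left)

    remaining = set(sizes)
    # Ties on size are broken by table name to keep the result deterministic.
    first = min(remaining, key=lambda t: (sizes[t], t))
    order = [first]
    remaining.remove(first)

    while remaining:
        joined = set(order)
        # Prefer tables connected to the current join result.
        connected = [t for t in remaining if neighbors[t] & joined]
        candidates = connected or list(remaining)
        nxt = min(candidates, key=lambda t: (sizes[t], t))
        order.append(nxt)
        remaining.remove(nxt)

    return order


# Hypothetical table sizes (in bytes) for a star-schema style query.
example_sizes = {
    "store_sales": 500_000_000,
    "date_dim": 73_000,
    "item": 300_000,
    "store": 1_000,
}
example_edges = [
    ("store_sales", "date_dim"),
    ("store_sales", "item"),
    ("store_sales", "store"),
]

print(order_joins_by_size(example_sizes, example_edges))
# ['store', 'store_sales', 'date_dim', 'item']
```

The sketch uses only table sizes, which mirrors the point that no column-level statistics are required. A production optimizer would also need to estimate intermediate result sizes, which this sketch does not attempt.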

Introducing Apache Spark 3.0

Significant features include Adaptive Query Execution, Dynamic Partition Pruning, and disk-persisted RDD blocks served by the shuffle service, along with substantial improvements in the Pandas API and up to a 40X speed improvement in R user-defined functions. Scala 2.12 is generally available. Spark 3.0 does not support Scala 2.11.
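As a sketch of how two of these features are controlled, the snippet below collects the open-source Spark 3.0 SQL properties for Adaptive Query Execution and Dynamic Partition Pruning and renders them as `spark-defaults.conf` lines. It assumes the open-source property names apply unchanged on Qubole, which these notes do not confirm. The helper `to_spark_defaults` and the constant `SPARK3_SQL_SETTINGS` are hypothetical names used only for illustration.

```python
from typing import Dict

# Open-source Spark 3.0 property names. Whether Qubole uses the same names
# and defaults is an assumption, not something stated in these notes.
SPARK3_SQL_SETTINGS: Dict[str, str] = {
    # Adaptive Query Execution (disabled by default in open-source Spark 3.0).
    "spark.sql.adaptive.enabled": "true",
    # Dynamic Partition Pruning (enabled by default in open-source Spark 3.0).
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def to_spark_defaults(settings: Dict[str, str]) -> str:
    """Render settings as 'key value' lines, the spark-defaults.conf format."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


print(to_spark_defaults(SPARK3_SQL_SETTINGS))
# spark.sql.adaptive.enabled true
# spark.sql.optimizer.dynamicPartitionPruning.enabled true
```

The same key/value pairs could instead be applied per session, for example through `SparkSession.builder.config(key, value)` in PySpark.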