Spark

Enhancements

  • SPAR-2805: Spark on Qubole supports daemon parallelization using systemd. This reduces the time taken to bring up Spark and its required daemons on a Spark cluster after the nodes are provisioned.
  • SPAR-3317: Secure per-user access to data stores is supported. Only users with the appropriate privileges can access the data stores. This feature is supported on Spark 2.3.2, 2.4.0, and later versions.
  • SPAR-2164: Handling of skew in join keys is supported. Users can now specify the hint `/*+ SKEW ('<table_name>') */` on a join to describe the columns and values on which skew is expected.
  • SPAR-3480: Handling of limit 0 queries is improved; unnecessary jobs are detected and pruned.
  • SPAR-3300: When the `spark.add_sparklens_jar` flag is enabled, you can use Sparklens with QDS Spark commands without passing the Sparklens jar option externally.
  • SPAR-3005: Fixes a problem in the Spark History Server that was causing high CPU utilization in the web application.
  • SPAR-3032: Cost-Based Optimization (CBO) is enabled by default in Spark 2.4.0, and statistics are collected automatically whenever the data in the underlying table changes. Statistics are also collected by default for various DDL statements.
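For SPAR-2164, the sketch below shows roughly where the SKEW hint sits in a Spark SQL join. It assumes the hint follows Spark's usual convention of appearing right after SELECT. The table and column names and the helper `skew_hinted_join` are hypothetical. The column and value arguments of the hint are omitted because this note does not spell out their syntax.

```python
def skew_hinted_join(left: str, right: str, key: str, skewed_table: str) -> str:
    """Return a Spark SQL join string carrying the SKEW hint for `skewed_table`.

    Hypothetical helper: hint placement after SELECT is assumed from Spark's
    general hint convention; column/value arguments of SKEW are not shown.
    """
    return (
        f"SELECT /*+ SKEW ('{skewed_table}') */ * "
        f"FROM {left} JOIN {right} ON {left}.{key} = {right}.{key}"
    )


# Hypothetical tables: 'orders' is the side expected to have skewed keys.
query = skew_hinted_join("orders", "customers", "customer_id", "orders")
print(query)
# On a Spark cluster the string would then be run with spark.sql(query).
```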
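For SPAR-3300, the sketch below contrasts the extra spark-submit arguments with and without the flag. Open-source Sparklens is normally attached by supplying its package plus `spark.extraListeners=com.qubole.sparklens.QuboleJobListener`. This sketch assumes that, with `spark.add_sparklens_jar` enabled, only the listener setting is still needed. That assumption is not confirmed by this note. The helper name is hypothetical, and the package coordinate is left as a caller-supplied placeholder.

```python
from typing import List

# Listener class used by open-source Sparklens.
SPARKLENS_LISTENER = "com.qubole.sparklens.QuboleJobListener"


def sparklens_args(add_sparklens_jar_enabled: bool, package_coordinate: str = "") -> List[str]:
    """Return extra spark-submit arguments for running a job under Sparklens.

    Assumption: when spark.add_sparklens_jar is enabled, the jar is already
    available, so only the listener has to be registered.
    """
    args = ["--conf", f"spark.extraListeners={SPARKLENS_LISTENER}"]
    if not add_sparklens_jar_enabled:
        if not package_coordinate:
            raise ValueError("a Sparklens package coordinate is required when the flag is off")
        # Without the flag, the Sparklens jar must be supplied explicitly.
        args = ["--packages", package_coordinate] + args
    return args


print(sparklens_args(True))
```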
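For SPAR-3032, the sketch below uses the open-source Spark SQL settings that correspond to this behavior. `spark.sql.cbo.enabled` turns on the cost-based optimizer. `spark.sql.statistics.size.autoUpdate.enabled` refreshes table-size statistics after data changes. Column-level statistics come from `ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS`. Whether the QDS default maps onto exactly these settings is an assumption. The helpers and the `sales` table are hypothetical.

```python
from typing import Dict, List

# Open-source Spark SQL settings; assumed (not confirmed) to back the QDS default.
CBO_SETTINGS: Dict[str, str] = {
    "spark.sql.cbo.enabled": "true",                         # cost-based optimizer
    "spark.sql.statistics.size.autoUpdate.enabled": "true",  # refresh table size when data changes
}


def submit_conf_args(settings: Dict[str, str]) -> List[str]:
    """Turn a settings dict into repeated `--conf key=value` spark-submit arguments."""
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


def analyze_columns_sql(table: str, columns: List[str]) -> str:
    """Build an ANALYZE TABLE statement that computes column-level statistics."""
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {', '.join(columns)}"


# Opting a single job out of CBO, and collecting column statistics explicitly.
print(submit_conf_args({"spark.sql.cbo.enabled": "false"}))
print(analyze_columns_sql("sales", ["region", "amount"]))
```

The automatic size update refreshes only table-size statistics. Column statistics used for join reordering still come from an explicit or DDL-triggered ANALYZE.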