Spark

Note

Spark 2.3-latest is now set to Spark 2.3.2 in the QDS UI. Spark clusters running 2.3-latest will run 2.3.2 after a cluster restart.

Qubole Job History Server Upgrade

SPAR-3053: The multi-tenant Qubole Job History Server has been upgraded to Spark 2.3 (2.3.1 by default). This server provides access to the logs and history of Spark jobs that ran on clusters that have since been terminated.

Support for Hive Authorization Admin Commands

SPAR-2786: Spark on Qubole now supports Hive Admin commands that allow users to grant privileges such as SELECT, UPDATE, INSERT, and DELETE to other users or roles. This feature is available via Support and is disabled by default.

The following commands are supported:

  • Set role
  • Grant privilege (SELECT, INSERT, DELETE, UPDATE or ALL)
  • Revoke privilege (SELECT, INSERT, DELETE, UPDATE or ALL)
  • Grant role
  • Revoke role
  • Show Grant
  • Show current roles
  • Show roles
  • Show role grant
  • Show principals for role

Support for these commands is available in Spark 2.4 and later versions.
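
As a rough illustration, the commands listed above could be issued from a PySpark application along the following lines. This is a minimal sketch, not Qubole's documented usage: the statement syntax follows Apache Hive's SQL-standard authorization grammar and is assumed to be accepted unchanged by Spark on Qubole, and the table, user, and role names (sales.orders, alice, analyst) are hypothetical.

```python
# Sketch: build Hive authorization statements and run them through a Spark session.
# Assumption: Spark on Qubole accepts Hive's SQL-standard authorization syntax as-is.

ALLOWED_PRIVILEGES = {"SELECT", "INSERT", "DELETE", "UPDATE", "ALL"}


def _check_privilege(privilege):
    """Normalize a privilege name and reject anything outside the supported set."""
    priv = privilege.upper()
    if priv not in ALLOWED_PRIVILEGES:
        raise ValueError("Unsupported privilege: {}".format(privilege))
    return priv


def grant_privilege(privilege, table, principal_type, principal):
    """Return a GRANT statement for a USER or ROLE principal."""
    return "GRANT {} ON TABLE {} TO {} {}".format(
        _check_privilege(privilege), table, principal_type.upper(), principal)


def revoke_privilege(privilege, table, principal_type, principal):
    """Return a REVOKE statement for a USER or ROLE principal."""
    return "REVOKE {} ON TABLE {} FROM {} {}".format(
        _check_privilege(privilege), table, principal_type.upper(), principal)


def example_statements():
    """One statement per supported command, using hypothetical names."""
    return [
        "SET ROLE admin",
        grant_privilege("select", "sales.orders", "user", "alice"),
        revoke_privilege("select", "sales.orders", "user", "alice"),
        "GRANT analyst TO USER alice",
        "REVOKE analyst FROM USER alice",
        "SHOW GRANT USER alice ON TABLE sales.orders",
        "SHOW CURRENT ROLES",
        "SHOW ROLES",
        "SHOW ROLE GRANT USER alice",
        "SHOW PRINCIPALS analyst",
    ]


def run_statements(spark, statements):
    """Run each statement with spark.sql and return the resulting DataFrames."""
    return [spark.sql(statement) for statement in statements]
```

With an active SparkSession named spark on a cluster where the feature has been enabled, run_statements(spark, example_statements()) would submit each command in turn.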

Improvements

  • SPAR-3003: Cluster images now include the PyArrow package to support Pandas UDFs, enabling performance improvements in Spark 2.3.1. This enhancement is available via Support and is disabled by default for Spark 2.3.1. It is enabled by default for Spark 2.4 and later versions.
  • SPAR-2649: You can now dynamically change min executors and max executors for a running Spark application from the Executors tab of the Spark Application UI. This capability is supported in Spark 2.3.1 and later versions.
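
To illustrate what the PyArrow package in SPAR-3003 makes possible, here is a minimal sketch of a scalar Pandas UDF. It assumes PySpark 2.3 or later with PyArrow available on the cluster; the function name, the column names temp_c and temp_f, and the DataFrame df are hypothetical.

```python
# Sketch: a scalar Pandas UDF, which Spark evaluates on batches of rows
# transferred through Arrow instead of one row at a time.


def celsius_to_fahrenheit(celsius):
    """Element-wise conversion; Spark passes a pandas.Series per batch."""
    return celsius * 9.0 / 5.0 + 32.0


def make_fahrenheit_udf():
    """Wrap the function as a scalar Pandas UDF that returns doubles."""
    # Imported here so the plain function above stays usable without PySpark.
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    return pandas_udf(
        celsius_to_fahrenheit,
        returnType="double",
        functionType=PandasUDFType.SCALAR,
    )
```

Given a DataFrame df with a numeric temp_c column, df.withColumn("temp_f", make_fahrenheit_udf()(df["temp_c"])) would apply the conversion one batch at a time.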

Bug Fix

  • SPAR-3059: Fixes a problem with native Optimized Row Columnar (ORC) when used with DirectFileOutputCommitter: if a task failed after writing partial files, the re-attempt also failed with FileAlreadyExistsException, which caused the job to fail. The fix is available in Spark 2.4.