Spark

In this release, Spark on Qubole provides various new features, enhancements, and bug fixes.

New Features

  • SPAR-4020: Users can run Spark SQL UPDATE and DELETE statements on Hive ACID tables. Contact Qubole Support to enable this feature.
  • SPAR-4021: Users can write the results of a streaming query to a Hive ACID table. Both InsertOnly and FullAcid tables are supported, and Append is the only supported output mode. Contact Qubole Support to enable this feature.
  • SPAR-4030: Adaptive Query Execution is now supported on Spark 2.4.3 and later versions. It optimizes query execution at runtime based on runtime statistics. Gradual Rollout.
  • SPAR-3979: Dynamic Partition Pruning uses dynamic filtering values to select, at runtime, only the partitions of a table that need to be read. For queries whose join condition is on a partitioned column, this significantly reduces the amount of data read and processed, which improves job performance. This feature is available in Spark 2.4.3 and later versions. Gradual Rollout.
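
The pruning idea behind SPAR-3979 can be illustrated with a small pure-Python sketch. It does not use Spark or Qubole code: the sales and dates tables, their columns, and the join_with_dpp function are hypothetical. It models a join of a fact table partitioned on date with a dimension table filtered on is_holiday. The filter is evaluated on the dimension side first, the surviving join keys act as the dynamic filtering values, and only the fact partitions matching those values are read.

```python
from collections import defaultdict

# Hypothetical fact table "sales", partitioned on its date column.
# Maps each partition value to the rows stored in that partition.
# Each row is (sale_id, date, amount).
sales_partitions = {
    "2020-01-01": [(1, "2020-01-01", 10.0), (2, "2020-01-01", 5.0)],
    "2020-01-02": [(3, "2020-01-02", 7.5)],
    "2020-01-03": [(4, "2020-01-03", 2.0), (5, "2020-01-03", 4.0)],
}

# Hypothetical dimension table "dates": (date, is_holiday).
dates = [("2020-01-01", True), ("2020-01-02", False), ("2020-01-03", True)]


def join_with_dpp(fact_partitions, dim_rows, dim_filter):
    """Join fact rows to dimension rows on the partition column,
    reading only the fact partitions that can produce matches."""
    # 1. Apply the filter to the dimension side first.
    filtered_dim = [row for row in dim_rows if dim_filter(row)]

    # 2. The surviving join keys are the dynamic filtering values.
    keys = {row[0] for row in filtered_dim}

    # 3. Prune: skip every fact partition whose value is not a key.
    partitions_read = sorted(p for p in fact_partitions if p in keys)

    # 4. Join only the rows of the partitions that were actually read.
    dim_by_key = defaultdict(list)
    for row in filtered_dim:
        dim_by_key[row[0]].append(row)
    joined = [
        fact_row + dim_row[1:]
        for p in partitions_read
        for fact_row in fact_partitions[p]
        for dim_row in dim_by_key[fact_row[1]]
    ]
    return partitions_read, joined


# Equivalent to joining sales with dates on date, keeping only holidays.
partitions_read, joined = join_with_dpp(sales_partitions, dates, lambda row: row[1])
print(partitions_read)  # ['2020-01-01', '2020-01-03']
```

The partition for 2020-01-02 is never read. Because the set of matching keys is known only after the dimension side is evaluated, this pruning has to happen at runtime rather than at query planning time.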

Enhancements

  • SPAR-3519: The Spark Redshift Connector is improved with the following enhancements:

    • Null and empty column values are handled for all supported data types.
    • Escape characters are handled in the WHERE clause of SELECT and COUNT(*) queries.
    • The Amazon Redshift JDBC jar version is upgraded from com.amazon.redshift.jdbc42-1.2.16.1023 to com.amazon.redshift.jdbc42-1.2.36.1060.

    This enhancement is supported on Spark 2.4.3 and later versions.

  • SPAR-4259: For Spark clusters, the Snowflake JDBC jar version is upgraded from 3.8.5 to 3.9.2. This upgrade is supported on Spark 2.3.2, 2.4.3, and later versions.
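
The following pure-Python sketch illustrates why escaping matters when a filter value is placed in the WHERE clause of a SELECT or COUNT(*) query sent to Redshift. It is not the connector's code: the helper names, the users table, and the name column are hypothetical. It applies only SQL-standard escaping, doubling embedded single quotes in string literals and embedded double quotes in identifiers. Whether backslashes also need escaping depends on how the database parses string literals, and this sketch does not handle that case.

```python
def quote_identifier(name: str) -> str:
    """Quote a column name, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_string_literal(value: str) -> str:
    """Quote a string value, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_count_query(table: str, column: str, value: str) -> str:
    """Build a COUNT(*) query with an equality filter on one column.

    The table name is assumed to be trusted and already qualified;
    only the column and the value are quoted here.
    """
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE {quote_identifier(column)} = {quote_string_literal(value)}"
    )


# Without escaping, the quote inside O'Brien would end the literal early
# and produce invalid SQL.
print(build_count_query("users", "name", "O'Brien"))
# SELECT COUNT(*) FROM users WHERE "name" = 'O''Brien'
```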

Bug Fixes

  • SPAR-4208: Spark SQL queries failed with the error message "Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch;" when an internet proxy was configured. This issue is fixed in Spark 2.4.3 and later versions.
  • SPAR-3942: Metadata commands, such as SHOW TABLES, SHOW DATABASES, and USE, did not honor Ranger policies. This issue is fixed in Spark 2.4.3 and later versions.
  • SPAR-4125: Executors lost because of Spot instance loss were incorrectly reported as lost because of out-of-memory (OOM) errors. This issue is fixed in Spark 2.4.3 and later versions.

For a list of bug fixes between versions R58 and R59, see the Changelog for api.qubole.com.