Spark Structured Streaming

Qubole introduces Spark Structured Streaming in this release.

New Features

Comprehensive Support for Kinesis Connector in Structured Streaming

  • SPAR-2189 and SPAR-2802: You can add Kinesis as a data source provider in your streaming queries, in both micro-batch and continuous streaming modes.
  • SPAR-2754: You can create a streaming application that writes its output to a Kinesis stream by specifying kinesis as the sink format.
  • SPAR-2729: The Kinesis connector for Structured Streaming now supports IAM roles, so you do not have to specify credentials when writing streaming applications. Your account must be enabled for IAM roles, and the attached IAM role must be granted permission to read the Kinesis stream.
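The source and sink usage described above can be sketched in PySpark roughly as follows. This is a minimal illustration, not the connector's documented interface: the option names streamName, endpointUrl, and startingposition are assumed from the open-source Qubole Kinesis connector, and the helper names are hypothetical. No credentials are passed, on the assumption that the cluster's IAM role grants access to the stream.

```python
def kinesis_options(stream_name, region, starting_position=None):
    """Build connector options (hypothetical helper).

    Option names are assumptions taken from the open-source Qubole
    Kinesis connector. No access keys are set, because the attached
    IAM role is assumed to supply the permissions.
    """
    opts = {
        "streamName": stream_name,
        "endpointUrl": f"https://kinesis.{region}.amazonaws.com",
    }
    if starting_position is not None:
        opts["startingposition"] = starting_position
    return opts


def read_from_kinesis(spark, stream_name, region):
    """Return a streaming DataFrame that uses Kinesis as the source."""
    return (
        spark.readStream
        .format("kinesis")
        .options(**kinesis_options(stream_name, region, "TRIM_HORIZON"))
        .load()
    )


def write_to_kinesis(stream_df, stream_name, region, checkpoint_location):
    """Start a streaming query that uses kinesis as the sink format."""
    return (
        stream_df.writeStream
        .format("kinesis")
        .options(**kinesis_options(stream_name, region))
        .option("checkpointLocation", checkpoint_location)
        .start()
    )
```

Both functions take an existing SparkSession or streaming DataFrame, so the same option builder serves the read path and the write path.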

Structured Streaming Query in a Spark Data Source Table

SPAR-3001: You can write a Structured Streaming query that appends data to a table, and you can read the updated table in real time.
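These notes do not say how the target table is specified. As one way to picture the append-then-read pattern, the sketch below uses the stock Spark file sink and assumes the table is backed by Parquet files at a known storage location. The helper names and the location arguments are hypothetical.

```python
def table_sink_options(table_location, checkpoint_location):
    """Options for the stock file sink (hypothetical helper)."""
    return {
        "path": table_location,                     # where the table's files live (assumed)
        "checkpointLocation": checkpoint_location,  # required by the file sink
    }


def append_stream_to_table(stream_df, table_location, checkpoint_location):
    """Start a query that appends each micro-batch to the table's files."""
    return (
        stream_df.writeStream
        .format("parquet")
        .outputMode("append")
        .options(**table_sink_options(table_location, checkpoint_location))
        .start()
    )


def read_table_snapshot(spark, table_location):
    """Batch-read whatever the streaming query has committed so far."""
    return spark.read.parquet(table_location)
```

Calling read_table_snapshot repeatedly while the query runs returns progressively more rows, which is the "read the updated table" half of the feature under these assumptions.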

Streaming Query Progress Graphs in Notebooks

ZEP-2650: You can monitor streaming query progress through progress graphs in notebooks. When you start a streaming query in a notebook paragraph, the monitoring graph is displayed in that same paragraph. Disabled by default; enabled via Support.
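The notes do not state which metrics the graphs plot. For comparison, the progress report that stock Spark exposes through StreamingQuery.lastProgress can be read programmatically. The sketch below assumes the PySpark form of that report, a dict or None, and the function name is hypothetical.

```python
def summarize_progress(progress):
    """Pick a few headline fields out of a StreamingQuery progress dict.

    `progress` is what PySpark's `query.lastProgress` returns: a dict,
    or None before the first batch completes. Rate fields can be
    absent, so they are read with .get().
    """
    if progress is None:
        return None
    return {
        "batchId": progress.get("batchId"),
        "numInputRows": progress.get("numInputRows", 0),
        "inputRowsPerSecond": progress.get("inputRowsPerSecond"),
        "processedRowsPerSecond": progress.get("processedRowsPerSecond"),
    }
```

In a notebook this might be called as summarize_progress(query.lastProgress), where query is the handle returned by start().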

Direct Writes for Checkpointing

SPAR-2871: A direct-write-based output stream is used for S3, which prevents the eventual consistency (EC) issues that might occur when an S3 path is used as the checkpoint location. Disabled by default; enabled via Support.