What is Spark Structured Streaming?

Spark Structured Streaming is a scalable, fault-tolerant stream processing engine built on the Spark SQL engine. It provides fast, end-to-end exactly-once stream processing without requiring you to reason about the mechanics of streaming: you express a streaming computation the same way you would express a batch computation on static data. You can use the Dataset/DataFrame API in Scala, Java, Python, or R to express streaming aggregations, event-time windows, stream-to-batch joins, and so on. The Spark SQL engine runs the computation incrementally and continuously on the same optimized engine used for batch queries, and updates the final result as streaming data continues to arrive.

Note

Kinesis and Kafka client jars are available in Qubole Spark as part of the basic package.

Running Spark Structured Streaming on QDS

You can run Spark Structured Streaming jobs on a Qubole Spark cluster from the Analyze and Notebooks pages, just as you would any other Spark application. You can also run them through the API. For more information, see Submit a Spark Command.

Note

QDS enforces a 36-hour time limit on every command run. This limit can be removed for streaming applications. For more information, contact Qubole Support.

Running the Job from the Analyze Page

  1. Navigate to the Analyze page.
  2. Click +Compose.
  3. Select Spark Command from the Command Type drop-down list.
  4. Select the required Spark language from the drop-down list. By default, Scala is selected.
  5. Select Query Statement or Query Path.
  6. Compose the code and click Run to execute.

For more information on composing a Spark command from the Analyze page, see Composing Spark Commands in Different Spark Languages through the UI.

Running the Job from the Notebooks Page

  1. Navigate to the Notebooks page.
  2. Start your Spark cluster.
  3. Compose your paragraphs and click the Run icon for each paragraph, in the order in which they depend on one another.

(Figure: Sample program on the Notebooks page.)

You can monitor the progress of a streaming query through the streaming query progress graphs in notebooks. When you start a streaming query in a notebook paragraph, its monitoring graph is displayed in the same paragraph.

Note

The streaming query progress graphs feature is not enabled by default for all users. To enable it on your QDS account, create a ticket with Qubole Support.