Managing Streaming Pipelines¶
After building streaming pipelines, you can efficiently manage your pipelines from the Pipelines UI.
You must ensure that the Pipelines resource is allowed for your role.
Checkpoint Management in S3¶
The Spark on Qubole Structured Streaming solution provides the Direct Write Checkpointing feature to manage checkpoints in object stores. With this feature, checkpoint data is written directly to the final file. After the data write is complete, the associated output stream is closed. S3 ensures that a file is visible only when the output stream is properly closed. As a result, the rename operation is avoided and there are no consistency issues. This method is also efficient because it avoids copying the entire content from one file to another.
For more information, see Reliable Structured Streaming on the Cloud with Direct Write Checkpointing.
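The visibility rule described above can be illustrated with a small sketch. This is not Qubole's implementation: `InMemoryObjectStore`, `_ObjectWriter`, and `write_checkpoint_direct` are hypothetical names, and the toy store only mimics the behaviour that an object becomes visible once its output stream is closed.

```python
import io


class InMemoryObjectStore:
    """Toy stand-in for an object store (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}

    def open_for_write(self, key):
        return _ObjectWriter(self, key)

    def exists(self, key):
        return key in self._objects

    def read(self, key):
        return self._objects[key]


class _ObjectWriter:
    """Buffers writes and publishes the object only when close() is called."""

    def __init__(self, store, key):
        self._store = store
        self._key = key
        self._buffer = io.BytesIO()

    def write(self, data):
        self._buffer.write(data)

    def close(self):
        # The object becomes visible here, in one step, so readers
        # never observe a partially written checkpoint file.
        self._store._objects[self._key] = self._buffer.getvalue()


def write_checkpoint_direct(store, key, payload):
    """Write checkpoint bytes straight to the final key: no temp file, no rename."""
    writer = store.open_for_write(key)
    writer.write(payload)
    visible_before_close = store.exists(key)  # still False at this point
    writer.close()
    return visible_before_close
```

Because the data is written once to its final location, there is no second copy step, which is the efficiency gain the section refers to.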
Retry on Failure¶
Spark streaming applications support automatic retries. If a pipeline fails due to intermittent errors, it is rerun automatically with exponential backoff.
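As a rough illustration of retrying with exponential backoff, the following minimal sketch doubles the wait after each failed attempt. All names and defaults here (`TransientError`, `backoff_delays`, `run_with_retries`, the base delay, the factor, and the cap) are assumptions made for the example; the actual retry policy that Qubole uses is not described in this document.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for intermittent, retryable failures."""


def backoff_delays(max_retries, base_delay=1.0, factor=2.0, max_delay=60.0):
    """Return the wait before each retry: base, base*factor, ... capped at max_delay."""
    return [min(base_delay * factor ** attempt, max_delay)
            for attempt in range(max_retries)]


def run_with_retries(task, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run task(); on TransientError wait with exponential backoff and retry."""
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the failure
            sleep(delays[attempt])
```

The `sleep` parameter is injectable so that the behaviour can be checked without waiting in real time.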
Rolling and Aggregation of Spark Logs¶
The logs of running applications are rolled and aggregated periodically into remote storage (S3). This prevents disk space issues and retains the logs for future reference in case the cluster goes down.
To debug the streaming pipelines, you can access the logs of the streaming pipeline from the Analyze page.
- From the Pipelines UI, click Instance & Logs.
- Click on the required command id. The Analyze page with the streaming pipeline code opens in a separate tab.
- In the Analyze page, click the Logs or Resources tab.
- Click Application UI or Spark Application UI to view the logs.
After creating and running the streaming pipelines, you can manage these pipelines by performing various tasks on them in the Pipelines view.
For better management, the Pipelines view breaks down a large monolithic task into concepts such as Pipelines, Instances of a Pipeline, and the State of each instance.
The Pipelines view has the following visual components:
- History or list view of all the pipelines in the left pane.
- Filter option to view the pipelines in the draft and archive state in the left pane.
- Edit, Clone, Archive, and Delete options on the … menu of a pipeline in the left pane.
- Start, Pause, Edit, and Monitor buttons on the top right corner to manage the instances of a pipeline.
- State of each instance. For details, see the table describing states of an instance.
- Health of the Pipeline: The health status of the pipeline is shown under the name of the pipeline in the left pane. Status values are: Good, Started, No new batches, Needs attention, Stopped, and At risk.
- Graphical representation of the pipeline.
- Panes: Events, Instances & Logs, and Summary.
- Events: Displays the live time series view. It shows information such as the number of records that are processed at a particular point in time.
- Instances & Logs: Displays the command id for each instance, the corresponding state, start and end time, and link to the logs.
- Summary: Displays information related to the code and the properties that were set.
Actions to Manage Pipelines from the Pipelines View¶
From the Pipelines view, you can perform certain actions on the pipelines from the … menu of the pipeline in the left navigation pane and from the top right corner.
You can perform the following actions from the … menu of the pipeline in the left navigation pane:
Edit: When a pipeline is in a draft or running state, you can edit all the details of the pipeline including the name of the pipeline.
If you edit a running pipeline that was created in assisted mode, the running pipeline is opened for editing in the BYOC mode. You can discard all the changes made after the pipeline was started by using the Discard option from the … menu on the top right corner. After editing a running pipeline, you must re-deploy the pipeline for the changes to take effect.
Clone: When you clone a pipeline, a clone of the pipeline is created in the draft state with a new pipeline id and a name in the following format:
n is the siblings count of the parent pipeline. You must use a new checkpoint location in the cloned pipeline. You can rename the cloned pipeline.
Archive: When you archive a pipeline, the pipeline is placed in a cold state and cannot be run. You can clone archived pipelines. To view the archived pipelines, use the filter in the left pane and select Archive as the pipeline status from the drop-down list.
Delete: You can delete only the pipelines that are in the Draft state. To delete a pipeline, you must be either an admin or a user with the delete permission for the Pipelines resource. With the delete permission, you can delete only the pipelines that you have created. When you delete a pipeline, the pipeline details and the metadata are deleted.
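The delete rules above can be summarized as a small predicate. This is only a sketch of the rules as stated in this section: the `Pipeline` and `User` classes and the `can_delete` function are hypothetical and do not correspond to a Qubole API, and the sketch assumes that an admin can delete any Draft pipeline regardless of who created it.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    name: str
    state: str        # "Draft", "Active", or "Archive"
    created_by: str


@dataclass
class User:
    name: str
    is_admin: bool = False
    has_delete_permission: bool = False  # delete permission on the Pipelines resource


def can_delete(user, pipeline):
    """Return True if the user may delete the pipeline under the rules above."""
    # Only pipelines in the Draft state can be deleted.
    if pipeline.state != "Draft":
        return False
    # Assumption: admins are not restricted to pipelines they created.
    if user.is_admin:
        return True
    # Other users need the delete permission and must own the pipeline.
    return user.has_delete_permission and pipeline.created_by == user.name
```

For example, a user with the delete permission can delete their own Draft pipeline, but not a Draft pipeline created by someone else, and nobody can delete an Active pipeline.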
You can perform the following actions from the top right corner:
- Edit: When a pipeline is in a draft state, you can edit all the details of the pipeline including the name of the pipeline.
- Pause: When you pause a running pipeline, the pipeline is paused at a checkpoint and an alert is sent.
- Start: When you restart a paused pipeline, a new instance of the pipeline is started from the last checkpoint with a new command id.
- Monitor: You can monitor the status of the pipeline from the Grafana dashboard.
States of Pipelines¶
A pipeline can be in one of the following states:
Draft: When the pipeline is being created or edited.
Active: When the pipeline is started or running.
An active pipeline can have instances in the following states:
- Error: When the pipeline is cancelled due to an error.
- Paused: When the pipeline is paused.
- Waiting: When the command is waiting (for example, during cluster start). The Waiting state is displayed in the Instances & Logs tab and the state of the pipeline is shown as Active in the Pipelines view.
- Stopping: When the pause action is initiated and the pipeline has not yet stopped. It is an intermediate state between the Active and Paused states. The Stopping state is displayed in the Instances & Logs tab and the state of the pipeline is shown as Active in the Pipelines view.
Archive: When the pipeline is archived.
Delete: When the pipeline is deleted.
By default, the Pipelines view displays the pipelines in the Draft and Active state. To view the pipelines in the Archive state, you can use the Filter icon in the left pane with the pipeline status as Archive from the drop-down list. The deleted pipelines cannot be viewed.
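The default visibility rules can be sketched as a simple filter over pipeline states. The `PipelineState` enum and the `visible_pipelines` function are hypothetical names used only to restate the behaviour described above; they are not part of the Pipelines UI or any Qubole API.

```python
from enum import Enum


class PipelineState(Enum):
    DRAFT = "Draft"
    ACTIVE = "Active"
    ARCHIVE = "Archive"
    DELETE = "Delete"


def visible_pipelines(pipelines, status_filter=None):
    """Return the names of pipelines shown for the given filter.

    pipelines: iterable of (name, PipelineState) pairs.
    status_filter: None for the default view, or a single PipelineState.
    """
    if status_filter is None:
        # Default view: Draft and Active pipelines only.
        allowed = {PipelineState.DRAFT, PipelineState.ACTIVE}
    else:
        allowed = {status_filter}
    # Deleted pipelines cannot be viewed under any filter.
    allowed.discard(PipelineState.DELETE)
    return [name for name, state in pipelines if state in allowed]
```

With no filter, an archived pipeline is hidden; passing `PipelineState.ARCHIVE` as the filter shows only archived pipelines.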
The following table lists the various states of the pipelines and pipeline instances, and the relevant tasks that you can perform.
|If the state of a pipeline is…||Icon||Then you can perform the following tasks|
|If a pipeline is not built completely, then it is in the draft mode.||
|Active (instance of the pipeline is in the Running state)||
|Active (instance of the pipeline is in the Paused state)||
|Active (instance of the pipeline is in the Error state). A pipeline is in the error state if it stops running due to an error.||