Using Different Types of Notebook¶
You can use a notebook only when its associated cluster is up. The following figure shows a sample notebook with its cluster up.
Click the notebook in the left panel to view its details. The notebook’s ID, its type, and associated cluster are displayed.
A pin is available at the bottom-left of the Notebooks UI. Use it to show or hide the left sidebar, which contains the notebooks list.
The following figure shows a notebook with its details displayed.
Using Folders in Notebooks explains how to use folders in the notebook.
You can run paragraphs in a notebook. After running paragraphs, you can export results that are in table format to CSV (comma-separated values), TSV (tab-separated values), or raw format. These options are available from the gear icon at the top-right corner of each paragraph. To download results:
- Click Download Results as CSV to get paragraph results in a CSV format.
- Click Download Results as TSV to get paragraph results in a TSV format.
- Click Download Raw Results to get paragraph results in a raw format.
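The exported CSV and TSV files are plain delimited text, so they can be processed with standard tooling. The following sketch, using Python's standard csv module, shows how a downloaded TSV result could be parsed; the column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical sample of a paragraph result exported as TSV;
# a real file would be opened with open("results.tsv") instead.
raw_tsv = "id\tname\n1\talice\n2\tbob\n"

# DictReader with delimiter="\t" handles the TSV export;
# for the CSV export, drop the delimiter argument.
rows = list(csv.DictReader(io.StringIO(raw_tsv), delimiter="\t"))
print(rows[0]["name"])  # first data row's "name" column
```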
Qubole provides code auto-completion in the paragraphs and the ability to stream outputs and query results. Notebooks also provide improved dependency management. The currently supported types of notebooks are described in the following sections.
Using a Spark Notebook¶
Select a Spark notebook from the list of notebooks and ensure that its assigned cluster is up to use it for running queries. See Running Spark Applications in Notebooks and Understanding Spark Notebooks and its Interpreters for more information.
See Using the Angular Interpreter for more information.
When you run paragraphs in a notebook, you can watch the progress of the job or jobs generated by each paragraph. The following figure shows a sample paragraph with progress of the jobs.
For more details about a job, click the info icon adjacent to the job status in the paragraph; the Spark Application UI is displayed as shown below.
You can see the Spark Application UI in a specific notebook paragraph even when the associated cluster is down.
Using a Presto Notebook (AWS and Azure)¶
Select a Presto notebook from the list of notebooks and ensure that its assigned cluster is up to use it for running queries. The process for running a paragraph is the same as for a Spark paragraph. You can also publish a dashboard for a Presto notebook. For more information, see Dashboards.
For more information on configuration, see Configuring a Presto Notebook.
In Presto notebooks, the source field is set as notebook_<notebook-name>_<notebook-id>, and in the corresponding dashboards the source field is set as dashboard_<dashboard-name>_<dashboard-id>_<source-note-id>. The source field is directly searchable in the Presto UI. For example, in the Presto UI, you can search by a notebook's name or ID to quickly filter the queries run from that specific notebook.
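As an illustration of the naming convention described above, the following sketch builds the source strings you would search for in the Presto UI; the helper names, notebook name, and IDs are hypothetical.

```python
# Hypothetical helpers mirroring the source-field convention:
#   notebook_<notebook-name>_<notebook-id>
#   dashboard_<dashboard-name>_<dashboard-id>_<source-note-id>
def notebook_source(name: str, notebook_id: int) -> str:
    return f"notebook_{name}_{notebook_id}"

def dashboard_source(name: str, dashboard_id: int, source_note_id: int) -> str:
    return f"dashboard_{name}_{dashboard_id}_{source_note_id}"

# Search for this string in the Presto UI to filter the queries
# run from that specific notebook (illustrative values).
print(notebook_source("sales_report", 1234))
```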
Presto notebooks do not currently support interactive dashboards with the native interpreter. Qubole plans to support interactive dashboards with Presto notebooks in the near future. If you want to use interactive dashboards with the older jdbc interpreter, create a ticket with Qubole Support.
Here is an illustration that shows a notebook’s source field (with a successful query) on the Presto UI.
Using a Hive Notebook (AWS)¶
Select a Hive notebook from the list of notebooks and ensure that its assigned cluster is up to use it for running queries. The process for running a paragraph is the same as for a Spark paragraph.
As a prerequisite, HiveServer2 (HS2) must be enabled on the Hadoop2 (Hive) cluster for adding a Hive notebook. For more information on enabling HS2, see Configuring a HiveServer2 Cluster. Hive notebooks are in the beta phase.
Qubole plans to deprecate Hive notebooks in the near future.
Configuring a Hive Notebook¶
Hive notebooks are in the beta phase. Because of potential security concerns, you can experiment with a Hive notebook but should not use it in production.
Before using a Hive notebook, you must add dependencies and configure overrides on the HS2 cluster.
Add these dependencies in the Hive notebooks’ Interpreters page.
Add Hive session timeout settings (in milliseconds) as Hive overrides in the HS2 cluster UI under Advanced Settings > HIVE SETTINGS. For more information, see Configuring a HiveServer2 Cluster. Here is an example of Hive session timeout configuration overrides.
hive.server2.session.check.interval = 60000
hive.server2.idle.operation.timeout = 1800000
hive.server2.idle.session.timeout = 2400000
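The timeout values above are in milliseconds. As a quick sanity check on what the example configures, this illustrative Python sketch converts the example values to minutes:

```python
# Example Hive override values from the documentation, in milliseconds.
overrides_ms = {
    "hive.server2.session.check.interval": 60000,
    "hive.server2.idle.operation.timeout": 1800000,
    "hive.server2.idle.session.timeout": 2400000,
}

# 60000 ms per minute; prints 1, 30, and 40 minutes respectively.
for key, ms in overrides_ms.items():
    print(f"{key} = {ms // 60000} min")
```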
You can also increase the number of parallel connections to HS2 based on the maximum number of paragraphs that you want to run on a Hive notebook. Add hive.server2.thrift.max.worker.threads as a Hive override in the HS2 Cluster UI > Advanced Configuration > HIVE SETTINGS. Qubole recommends a value of 100 to begin with.
Running Hive Paragraphs¶
After adding the additional configurations, you can run paragraphs as shown in this example.
%jdbc (hive) show tables
Uploading and Downloading a File to or from a Cloud Location¶
From the Notebooks page of the QDS UI, you can use the filesystem tab (S3, Blob, etc.) to upload a downloaded file (in CSV or another format) to a Cloud location.
The QDS role assigned to you must have upload and download permissions for the Object Storage resource; see Resources, Actions, and What they Mean.
To upload a file to a Cloud location, follow these steps:
Choose the filesystem tab (S3, Blob, etc.) from the leftmost pane of the Notebooks page and select the Cloud location that you want to upload the file to. Click the gear icon next to that location and choose Upload, as illustrated in this S3 example.
The Upload a file dialog appears:
Click Choose File, browse to the location of the file, and select it.
Click Upload to upload the file to the Cloud location.
Similarly, you can download a file from a Cloud location. Select the file that you want to download, click the gear icon next to it, and choose Download, as illustrated in this S3 example.