Using Different Types of Notebook

You can use a notebook only when its associated cluster is up. A sample notebook with its cluster up is shown in the following figure.

../../../_images/EditableNotebook1.png

Click the notebook in the left panel to view its details. The notebook’s ID, its type, and associated cluster are displayed.

Note

A pin is available at the bottom-left of the Notebooks UI. You can use it to show or hide the left sidebar that contains the notebooks list.

The following figure shows a notebook with its details displayed.

../../../_images/NotebookDetails.png

Using Folders in Notebooks explains how to use folders in notebooks.

You can run paragraphs in a notebook. After running paragraphs, you can export results that are in table format to CSV (comma-separated values), TSV (tab-separated values), or raw format. These options are available from the gear icon at the top-right corner of each paragraph. To download results:

  • Click Download Results as CSV to get paragraph results in a CSV format.
  • Click Download Results as TSV to get paragraph results in a TSV format.
  • Click Download Raw Results to get paragraph results in a raw format.
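The exported CSV and TSV files differ only in their delimiter, so both parse the same way. Here is a minimal sketch using Python's csv module; the inline sample text stands in for a real downloaded results file (the data shown is hypothetical):

```python
import csv
import io

# Sample exported paragraph results; a real file would come from the
# "Download Results as CSV/TSV" options (the data here is hypothetical).
csv_text = "id,name\n1,alice\n2,bob\n"
tsv_text = "id\tname\n1\talice\n2\tbob\n"

def parse_results(text, delimiter=","):
    """Parse exported results into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

csv_rows = parse_results(csv_text, delimiter=",")
tsv_rows = parse_results(tsv_text, delimiter="\t")

# Both formats yield the same rows once parsed with the right delimiter.
assert csv_rows == tsv_rows
```

For real files, replace the inline strings with `open(path)` and pass the matching delimiter.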

Qubole provides code auto-completion in paragraphs and the ability to stream outputs/query results. Notebooks also provide improved dependency management. The currently supported types of notebooks are described in the following sections:

Using a Spark Notebook

Select a Spark notebook from the list of notebooks and ensure that its assigned cluster is up before using the notebook to run queries. See Running Spark Applications in Notebooks and Understanding Spark Notebooks and its Interpreters for more information.

See Using the Angular Interpreter for more information.

When you run paragraphs in a notebook, you can watch the progress of the job or jobs generated by each paragraph. The following figure shows a sample paragraph with the progress of its jobs.

../../../_images/job-ui-para.png

For more details about a job, click the info icon adjacent to the job status in the paragraph; the Spark Application UI is displayed as shown below.

../../../_images/spark-ui-para.png

You can see the Spark Application UI in a specific notebook paragraph even when the associated cluster is down.

Using a Presto Notebook (AWS and Azure)

Select a Presto notebook from the list of notebooks and ensure that its assigned cluster is up before using the notebook to run queries. The process for running a paragraph is the same as for a Spark paragraph. You can also publish a dashboard for a Presto notebook. For more information, see Dashboards.

For more information on configuration, see Configuring a Presto Notebook.

In Presto notebooks, the source field is set to notebook_<notebook-name>_<notebook-id>, and in the corresponding dashboards the source field is set to dashboard_<dashboard-name>_<dashboard-id>_<source-note-id>. The source field is directly searchable in the Presto UI. For example, you can search by a notebook's name or ID to quickly filter the queries that were run from that specific notebook.
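The naming convention can be sketched as follows. The helper functions, notebook names, and query list below are purely illustrative and not part of any Qubole API; they only show how the documented source-field pattern supports substring filtering:

```python
def notebook_source(name, note_id):
    # Matches the documented pattern: notebook_<notebook-name>_<notebook-id>
    return f"notebook_{name}_{note_id}"

def dashboard_source(name, dash_id, source_note_id):
    # Matches: dashboard_<dashboard-name>_<dashboard-id>_<source-note-id>
    return f"dashboard_{name}_{dash_id}_{source_note_id}"

# Hypothetical queries tagged with their source field, as they would
# appear in the Presto UI's query list.
queries = [
    {"query": "SELECT 1", "source": notebook_source("sales", 1234)},
    {"query": "SELECT 2", "source": dashboard_source("weekly", 9, 1234)},
]

# Searching by notebook name in the Presto UI effectively does this:
from_notebook = [q for q in queries
                 if q["source"].startswith("notebook_sales")]
# Only the query run directly from the notebook remains.
```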

Note

Presto notebooks do not currently support interactive dashboards with the native interpreter. Qubole plans to support interactive dashboards with Presto notebooks in the near future. If you want to use interactive dashboards with the older jdbc interpreter, create a ticket with Qubole Support.

Here is an illustration that shows a notebook’s source field (with a successful query) on the Presto UI.

../../../_images/PrestoUINoteID.png

Using a Hive Notebook (AWS)

Select a Hive notebook from the list of notebooks and ensure that its assigned cluster is up before using the notebook to run queries. The process for running a paragraph is the same as for a Spark paragraph.

Note

As a prerequisite, HiveServer2 (HS2) must be enabled on the Hadoop2 (Hive) cluster for adding a Hive notebook. For more information on enabling HS2, see Configuring a HiveServer2 Cluster. Hive notebooks are in the beta phase.

Qubole plans to deprecate Hive notebooks in the near future.

Configuring a Hive Notebook

Warning

Hive notebooks are in the beta phase. Because of potential security concerns, you can experiment with a Hive notebook but should not use it in production.

Before using a Hive notebook, you must add dependencies and add overrides in the HS2 cluster.

  1. Add these dependencies on the Hive notebook's Interpreters page.

    org.apache.hive:hive-jdbc:0.14.0
    org.apache.hadoop:hadoop-common:2.6.0
    
  2. Add Hive session timeout settings (in milliseconds) as Hive overrides in the HS2 cluster UI under Advanced Settings > HIVE SETTINGS. For more information, see Configuring a HiveServer2 Cluster. Here is an example of Hive session timeout configuration overrides.

    hive.server2.session.check.interval = 60000
    hive.server2.idle.operation.timeout = 1800000
    hive.server2.idle.session.timeout = 2400000
    
  3. You can also increase the number of parallel connections to HS2 based on the maximum number of paragraphs that you want to run on a Hive notebook. Add hive.server2.thrift.max.worker.threads as a Hive override in the HS2 Cluster UI > Advanced Configuration > HIVE SETTINGS. Qubole recommends setting the value of hive.server2.thrift.max.worker.threads to 100 to begin with.
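The millisecond values in the example overrides of step 2 correspond to 1, 30, and 40 minutes respectively. A quick sanity check of that conversion (the dictionary below simply restates the example values, it is not a configuration API):

```python
def minutes_to_ms(minutes):
    """Convert minutes to the millisecond values Hive overrides expect."""
    return minutes * 60 * 1000

# The example override values from step 2, expressed in minutes.
overrides = {
    "hive.server2.session.check.interval": minutes_to_ms(1),   # 60000
    "hive.server2.idle.operation.timeout": minutes_to_ms(30),  # 1800000
    "hive.server2.idle.session.timeout": minutes_to_ms(40),    # 2400000
}

for key, value in overrides.items():
    print(f"{key} = {value}")
```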

Running Hive Paragraphs

After adding the additional configurations, you can run paragraphs as shown in this example.

%jdbc (hive)
show tables

Uploading and Downloading a File to or from a Cloud Location

From the Notebooks page of the QDS UI, you can use the filesystem tab (S3, Blob, and so on) to upload a downloaded file (in CSV or another format) to a Cloud location.

Note

The QDS role assigned to you must have upload and download permissions for the Object Storage resource; see Resources, Actions, and What they Mean.

To upload a file to a Cloud location, follow these steps:

  1. Choose the filesystem tab (S3, Blob, and so on) from the leftmost pane of the Notebooks page and select the Cloud location that you want to upload the file to. Click the gear icon next to that location and choose Upload, as illustrated in this S3 example.

    ../../../_images/UploadButtoninS3.png
  2. The Upload a file dialog appears:

    ../../../_images/UploadtoS31.png

    Click Choose File, browse to the location of the file, and select it.

  3. Click Upload to upload the file to the Cloud location.

Similarly, you can download a file from a Cloud location. Select the file that you want to download, click the gear icon next to it, and choose Download, as illustrated in this S3 example.

../../../_images/DownloadfromS3.png