Using Different Types of Notebook

You can use a notebook only when its associated cluster is up. The following figure shows a sample notebook with its cluster up.

../../../_images/EditableNotebook1.png

Click the notebook in the left panel to view its details. The notebook’s ID, type, and associated cluster are displayed.

Note

A pin is available at the bottom-left of the Notebooks UI. Use it to show or hide the left sidebar that contains the notebooks list.

The following figure shows a notebook with its details displayed.

../../../_images/NotebookDetails.png

Using Folders in Notebooks explains how to use folders in notebooks.

You can run paragraphs in a notebook. After running paragraphs, you can export results that are in table format to CSV (comma-separated values), TSV (tab-separated values), or raw format. To use these options, click the gear icon at the top-right corner of each paragraph. To download results:

  • Click Download Results as CSV to get paragraph results in a CSV format.
  • Click Download Results as TSV to get paragraph results in a TSV format.
  • Click Download Raw Results to get paragraph results in a raw format.

Qubole provides code auto-completion in paragraphs and the ability to stream outputs/query results. Notebooks also provide improved dependency management. The currently supported types of notebooks are described below:

Using a Spark Notebook

Select a Spark notebook from the list of notebooks and ensure that its assigned cluster is up before using it to run queries. See Running Spark Applications in Notebooks and Understanding Spark Notebooks and its Interpreters for more information.

See Using the Angular Interpreter for more information.
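
For example, a minimal Spark paragraph (a sketch that assumes the Zeppelin default %spark interpreter binding, with the SparkContext available as sc) looks like this:

%spark
// Count words in a small in-memory collection using the SparkContext (sc)
val words = sc.parallelize(Seq("qubole", "spark", "notebook", "spark"))
words.map(w => (w, 1)).reduceByKey(_ + _).collect().foreach(println)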

When you run paragraphs in a notebook, you can watch the progress of the job or jobs generated by each paragraph. To do this, click the job’s Spark Application UI link in the QDS Notebooks UI; this takes you to the Spark job UI. If you do this after the job has run, you see the job’s status.

You can see the Spark Application UI in a specific notebook paragraph even when the associated cluster is down.

Using a Presto Notebook (AWS and Azure)

Select a Presto notebook from the list of notebooks and ensure that its assigned cluster is up before using it to run queries. The process for running a paragraph is the same as for a Spark paragraph.
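
For example, a minimal Presto paragraph (a sketch assuming the Presto interpreter is bound as %presto in the notebook) looks like this:

%presto
show tables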

Using a Hive Notebook (AWS)

Select a Hive notebook from the list of notebooks and ensure that its assigned cluster is up before using it to run queries. The process for running a paragraph is the same as for a Spark paragraph.

Note

As a prerequisite, HiveServer2 (HS2) must be enabled on the Hadoop2 (Hive) cluster for adding a Hive notebook. For more information on enabling HS2, see Configuring a HiveServer2 Cluster (AWS). Hive notebooks are in the beta phase.

Configuring a Hive Notebook

Warning

Hive notebooks are in the beta phase. Because of potential security concerns with using them in production, you can experiment with a Hive notebook but should not use it for production workloads.

Before using a Hive notebook, you must add dependencies and add configuration overrides on the HS2 cluster.

  1. Add these dependencies on the Hive notebook’s Interpreters page.

    org.apache.hive:hive-jdbc:0.14.0
    org.apache.hadoop:hadoop-common:2.6.0
    
  2. Add Hive session timeout settings (in milliseconds) as Hive overrides in the HS2 cluster UI > Advanced Settings > HIVE SETTINGS. For more information, see Configuring a HiveServer2 Cluster (AWS). Here is an example of Hive session timeout configuration overrides.

    hive.server2.session.check.interval = 60000
    hive.server2.idle.operation.timeout = 1800000
    hive.server2.idle.session.timeout = 2400000
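
    With the values shown, HS2 checks for idle sessions and operations every minute (60000 ms), times out operations that remain idle for more than 30 minutes (1800000 ms), and closes sessions that remain idle for more than 40 minutes (2400000 ms).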
    
  3. You can also increase the number of parallel connections to HS2 based on the maximum number of paragraphs that you want to run on a Hive notebook. Add hive.server2.thrift.max.worker.threads as a Hive override in HS2 Cluster UI > Advanced Configuration > HIVE SETTINGS. Qubole recommends setting the value of hive.server2.thrift.max.worker.threads to 100 to begin with.
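
    For example:

    hive.server2.thrift.max.worker.threads = 100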

Running Hive Paragraphs

After adding these configurations, you can run Hive paragraphs as shown in this example.

%jdbc (hive)
show tables
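
You can also run queries through the same interpreter; for example (the table name here is hypothetical):

%jdbc (hive)
-- default.sample_table is a hypothetical table
select * from default.sample_table limit 10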

Using a Deep Learning Notebook (AWS)

Using Deep Learning Notebooks describes how to use a Deep Learning Notebook.

Note

Deep Learning clusters and notebooks are available for beta access. Create a ticket with Qubole Support to enable this feature on a QDS account.

Uploading and Downloading a File to/from an S3 Location

Using the S3 tab in the Notebooks UI, you can upload a file (for example, downloaded paragraph results in CSV or another format) to an S3 location.

Note

You must have Object Storage resource permissions in the role assigned to you to upload and download data. For more information, see :ref:`manage-roles-user-resources-actions`.

To upload a file to an S3 location, follow these steps:

  1. Go to the S3 tab and select the S3 location that you want to use as the destination. Click the gear icon against the S3 bucket/folder as illustrated here.

    ../../../_images/UploadButtoninS3.png
  2. The Upload to S3 dialog appears as shown here.

    ../../../_images/UploadtoS31.png

    Click Choose File to browse to the location of the file and select it.

  3. Click Upload to upload the file to the S3 location, or click Cancel if you do not want to upload.

Similarly, you can download a file from an S3 location. Select the file that you want to download, click the gear icon against the file, and then click Download, as shown here.

../../../_images/DownloadfromS3.png