Getting Started with Qubole on Oracle OCI Classic

Prerequisites:

  • You must have an Oracle OCI Classic account.

To use QDS on Oracle OCI Classic, complete the tasks in the following sections.

Configuring OCI Classic Resources

In the OCI Classic console:

  1. Create a new user. QDS will use this user to bring up and shut down OPC instances. This user must have the Compute Operations role, which allows it to launch instances.
  2. Create a container to store data generated by QDS.

Configuring QDS to use OCI Classic

Complete the following tasks in the QDS UI.

Creating a Qubole Account

  1. Go to https://oraclecloud-opc.qubole.com.

  2. Click Sign Up.

  3. Provide the information you are prompted for and click Next.

  4. Enter your email address and full name, and click CREATE MY FREE ACCOUNT. You will receive an email message at that address containing an activation code. Confirm your account either by clicking the link in the message or by copying and pasting the activation code into the signup window.

    Alternatively, you can use your Google or SAML credentials to create a Qubole account.

  5. Click Save to save your changes.

After you create and confirm your account, you can log in and start configuring it. QDS initially displays the Analyze page.

Configuring Qubole Account Settings for OCI Classic

Proceed as follows to configure your QDS account.

  1. In the QDS UI, choose Control Panel from the drop-down list at the top left of the page, and then choose Account Settings.

  2. Fill in the fields in the Account Details section as follows:

    • Account Name: Provide a name for this account.
    • Domain Name Allowed to Sign In/Up: Enter a domain, or a comma-separated list of domains, whose users are allowed to sign in to or sign up for this account; for example, qubole.net or qubole.net,example.com.
    • Idle Session Timeout: Optionally specify how long (in minutes) QDS should wait to terminate an idle QDS UI session. The default is 1440 minutes (24 hours). To change it, enter a number from 1 to 10080 (10080 minutes is a week).
    • Idle Cluster Timeout: Optionally specify how long (in hours) QDS should wait to terminate an idle cluster. The default is two hours. See also step 4 under Updating the default QDS clusters below.
    • Allow Qubole Access: Check this to allow Qubole Support to log in to this account (helpful if you run into problems).
    • Email List for Account Updates: Enter a list of email addresses to which notifications will be sent about changes to this account or the cluster configuration.
    • Command Timeout: (Optional) Enter the number of seconds to wait before triggering an alert that a query you ran (from the Compose tab of the Analyze page) is still running.

    Click Save to save your changes.

  3. Fill in the fields in the Storage Settings section:

    • Identity Domain: The identity domain in which you configured your OCI Classic resources.

    • Username: The email address of the user you created in step 1 of the OCI Classic resource configuration above.

    • Password: The password of the OCI Classic user you entered under Username.

      Note

      These credentials must provide write access to the default OCI Classic containers compute_images and compute_images_segments.

    • REST API endpoint: To see the REST API endpoint for your account in the OCI Classic UI, click on the drop-down menu in the Storage pane on the dashboard, and choose View Details.

      Note

      If the Storage pane is not showing, click on Customize Dashboard and set Storage to Show in the resulting dialog.

      The REST endpoint looks something like this: https://<identity-domain>.storage.oraclecloud.com/v1/Storage-<identity-domain>, where <identity-domain> should match the Identity Domain name you just entered in QDS. In the REST API Endpoint text box in the QDS UI, enter only the part of the string that precedes /v1: https://<identity-domain>.storage.oraclecloud.com.

    • Default location: The default container in OPC object storage where QDS will store any generated data.

    • Directory: (Optional) A directory within the default container. If you specify one, QDS uses that directory as the default location.

      Note

      The container (and directory, if any) must already exist; see step 2 of the OCI Classic resource configuration above.

    Click Save to save your changes.

  4. Fill in the fields in the Compute Settings section as follows:

    • Identity Domain: The identity domain in which you configured your OCI Classic resources.

    • Username: The email address of the user you created in step 1 of the OCI Classic resource configuration above.

    • REST API endpoint: To see the REST API endpoints for your account in the OCI Classic UI, click on the drop-down menu in the Compute pane on the dashboard, and choose View Details.

      Note

      If the Compute pane is not showing, click on Customize Dashboard and set Compute to Show in the resulting dialog.

      You can choose either of the two compute sites (identified by their REST API endpoints) that Oracle provides for your account; QDS launches clusters in that site by default. To launch a particular cluster in a different site, override the REST API endpoint on the QDS Clusters page. (See also Updating the default QDS clusters.)

    • Select Push Compute Settings to all clusters if you want to use the same settings for all QDS clusters.

    Click Save to save your changes.
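The endpoint trimming described in step 3 can be checked mechanically. The following Python sketch is a hypothetical helper, not part of QDS or any Oracle tool. It assumes the endpoint follows the https://<identity-domain>.storage.oraclecloud.com/v1/Storage-<identity-domain> pattern shown in step 3, confirms that the last path segment matches the identity domain, and returns only the scheme and host for the QDS REST API Endpoint box. The identity domain acme is made up for illustration.

```python
from urllib.parse import urlsplit


def storage_endpoint_base(rest_endpoint: str, identity_domain: str) -> str:
    """Return the scheme and host of an OCI Classic storage REST endpoint.

    Hypothetical helper; assumes the documented pattern
    https://<identity-domain>.storage.oraclecloud.com/v1/Storage-<identity-domain>
    """
    parts = urlsplit(rest_endpoint.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an https URL, got {rest_endpoint!r}")

    # The last path segment should be Storage-<identity-domain>, matching
    # the Identity Domain entered in QDS.
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment != f"Storage-{identity_domain}":
        raise ValueError(
            f"path segment {last_segment!r} does not match "
            f"identity domain {identity_domain!r}"
        )

    # QDS wants only the part before /v1/...: scheme plus host.
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    print(storage_endpoint_base(
        "https://acme.storage.oraclecloud.com/v1/Storage-acme", "acme"))
    # https://acme.storage.oraclecloud.com
```

Checking the path segment, rather than only cutting the string at /v1, catches the common mistake of copying an endpoint that belongs to a different identity domain.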

Characteristics of the Shared Network (default)

When QDS launches a cluster for you, it uses a shared network by default, and creates a vNICset and an ACL before launching the cluster.

The rules set for this shared network are as follows:

  • Allow incoming traffic (for SSH only) to the cluster from the NAT Gateway used by QDS.
  • Allow outgoing traffic (all protocols) from the cluster.

The same security rules are set for each node of the cluster, ensuring that all the nodes can talk to each other.
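To make the shared-network rules concrete, here is a small Python model of them as described above. It is only an illustration under stated assumptions: the Flow type and the peer labels nat-gateway, cluster, and internet are hypothetical, and this is not how QDS or OCI Classic actually represents ACLs or security rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A hypothetical traffic flow, seen from one cluster node."""
    direction: str  # "ingress" or "egress"
    protocol: str   # e.g. "ssh", "https"
    peer: str       # hypothetical labels: "nat-gateway", "cluster", "internet"


def shared_network_allows(flow: Flow) -> bool:
    # Every node carries the same rules, so node-to-node traffic
    # inside the cluster is allowed.
    if flow.peer == "cluster":
        return True
    # Outgoing traffic from the cluster is allowed for all protocols.
    if flow.direction == "egress":
        return True
    # Incoming traffic is allowed only for SSH, and only from the
    # NAT gateway used by QDS.
    return (
        flow.direction == "ingress"
        and flow.protocol == "ssh"
        and flow.peer == "nat-gateway"
    )


if __name__ == "__main__":
    print(shared_network_allows(Flow("ingress", "ssh", "nat-gateway")))  # True
```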

If you want your QDS cluster to use the shared network, you don’t need to do any additional network configuration, but if you want to use a private network instead, follow the instructions in the next section.

Configuring a Private IP Network for your QDS Clusters (optional)

If you want the QDS cluster to run in a private IP network instead of the shared network, complete step 1 below (and step 2 if you need it), and then select the IP network and ACL when you configure the QDS cluster.

To configure a private network in OCI Classic, proceed as follows:

  1. Define an IP network for your OCI Classic private addresses.

  2. If you need to communicate with the QDS cluster from outside the cluster (for example, from non-QDS instances or from on-premises systems), create an ACL that allows your instances to communicate with instances in the QDS cluster. This ACL needs an ingress security rule whose source (a vNICset or an IP Address Prefix Set) is set to the IP network that needs to communicate with QDS. Do not set the destination: it will be the QDS cluster’s vNICset, which does not exist yet (QDS creates it just before launching the cluster).

    See the Oracle documentation for information about setting IP address prefix sets and security protocols.
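The two constraints on the ingress rule in step 2 (a source is required, and the destination stays empty) can be captured in a short Python sketch. The IngressRule class, its field names, and the example names are hypothetical and do not mirror the OCI Classic REST API schema; create the real rule in the OCI Classic console as the Oracle documentation describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngressRule:
    """Hypothetical model of the ingress security rule from step 2."""
    acl: str                                    # the ACL you created for QDS access
    source_vnicset: Optional[str] = None        # a source vNICset, or ...
    source_ip_prefix_set: Optional[str] = None  # ... an IP Address Prefix Set
    destination_vnicset: Optional[str] = None   # must stay unset for now

    def validate(self) -> None:
        # A source is required: the network that needs to reach QDS.
        if self.source_vnicset is None and self.source_ip_prefix_set is None:
            raise ValueError("set a source vNICset or IP Address Prefix Set")
        # The destination would be the QDS cluster's vNICset, which QDS
        # creates only just before launching the cluster.
        if self.destination_vnicset is not None:
            raise ValueError("leave the destination unset")


if __name__ == "__main__":
    rule = IngressRule(acl="qds-access-acl", source_ip_prefix_set="on-prem-prefixes")
    rule.validate()  # no exception: source set, destination empty
    print(rule)
```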

Updating the default QDS clusters

In addition to pushing the compute settings, you also need to update the default QDS clusters.

Navigate to the Clusters page in the QDS UI and do the following for each cluster you intend to use:

  1. Choose the edit (pencil) icon.
  2. Accept the default Master Node and Slave Node types, or choose different types from the drop-down lists.
  3. If you are using a node bootstrap file, enter its pathname, or accept the default.
  4. Check Disable Automatic Cluster Termination only if you always want to terminate your Qubole clusters manually. Qubole recommends you leave this box unchecked, allowing QDS to shut down idle clusters.
  5. Click Next and proceed as follows in the Advanced Configuration tab.
  6. In the OPC SETTINGS section:
    • Check Same as Default Compute to use the compute settings you configured earlier. Otherwise, enter a different set of credentials (the Username, Password, and Compute REST API endpoint) to be used by this particular cluster.
    • If you are using a private IP network, choose an ACL (optional) and IP Network from the drop-down lists. Otherwise QDS will launch the cluster in a shared network.
  7. In the HADOOP CLUSTER SETTINGS section, you can modify:
    • Hadoop Configuration Variables: Enter Hadoop variables here if you want to override the defaults that Qubole uses.
    • Fair Scheduler Configuration: Enter Hadoop Fair Scheduler values if you want to override the defaults that Qubole uses.
    • Default Fair Scheduler Queue: Specify the default Fair Scheduler queue (used if no queue is specified when the job is submitted).
  8. In the MONITORING section, check the Enable Ganglia Monitoring box if you want to use Ganglia; see Performance Monitoring with Ganglia.
  9. In the Security Settings section, enter a Customer SSH Public Key if you want to log in to QDS cluster nodes. This is the public key from an SSH public-private key pair.

When you are satisfied with your changes, click Save.
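Before pasting a key into the Customer SSH Public Key field (step 9 above), you may want to confirm that you are pasting the public half of the key pair. This Python sketch checks the standard OpenSSH public-key line format, <key-type> <base64-blob> [comment], in which the decoded blob begins with a length-prefixed copy of the key type. It is a local sanity check only and says nothing about how QDS itself validates keys.

```python
import base64
import binascii
import struct


def looks_like_openssh_public_key(line: str) -> bool:
    """Return True if `line` has the shape of an OpenSSH public key line."""
    fields = line.strip().split()
    if len(fields) < 2:
        return False
    key_type, blob_b64 = fields[0], fields[1]

    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error:
        # Not valid base64, e.g. the PEM header of a private key file.
        return False

    # The blob starts with a 4-byte big-endian length, then the key type.
    if len(blob) < 4:
        return False
    (name_len,) = struct.unpack(">I", blob[:4])
    name = blob[4:4 + name_len]
    return len(name) == name_len and name == key_type.encode("ascii", errors="replace")
```

A line copied from a .pub file (for example one beginning ssh-ed25519 or ssh-rsa) should pass; the contents of the private key file should not.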

About the Analyze Page

Take some time to familiarize yourself with the Analyze page. It has the following tabs:

  • The History tab shows previous commands; you can re-run them, with or without modification (use the Re-Run and Edit buttons).
  • The Workspace tab allows you to save, edit, and re-run queries and commands.
  • The Tables tab shows Hive tables, including Qubole demo tables. Click the arrow at the left to see a table’s columns and their types.
  • The Object Store tab allows you to browse the OPC object storage you identified earlier.

Click the Create button at the top left to clear the fields in the right frame and compose a new command, query, job, or other task, and click the Run button to run it.

For more information, see About the Analyze User Interface.

About Job Logs

Job logs are written out under the Logs tab while a job is running, and are also saved for later access. To see saved logs, click on the History tab, and then in the left pane, click on the job you are interested in. Its results and logs are saved under their respective tabs.

The job log provides a link to the Application UI; click on the link to see detailed information about the job, including information about the Map and Reduce tasks.

Useful OPC Documentation

The following OPC documents provide help in performing the OPC configuration tasks on this page: