Getting Started with Qubole on Oracle OCI

Prerequisites:

  • You must have an Oracle OCI account.

    Note

    For Qubole Data Services (QDS), you can use two separate accounts, one for compute and one for storage, if you prefer. In this case you will need to repeat some steps, as indicated below.

  • Your Oracle OCI service limits must be sufficient to allow QDS to bring up cluster nodes as needed.

To use QDS on Oracle OCI, do the following:

Configuring Oracle OCI Resources

In the Oracle OCI console:

  1. Create a new user. QDS will use this user to bring up and shut down Oracle OCI instances.

  2. Create a public-private key pair and upload the public key to the user you created in step 1. (Click on the user name in the OCI console to bring up the Add Public Key dialog.) Save the Fingerprint that Oracle provides when you upload the public key.

  3. Qubole recommends that you create a new group for the user you created in step 1.

  4. Add the user you created in step 1 to the group. (Click on the user name in the OCI console, then choose Groups to bring up the Add User to Group dialog.)

  5. Create a new compartment. (You can use one of your existing compartments if you prefer.) This is where all your QDS instances and images will be stored and brought up, and (by default) where query output and logs will be stored as well.

    Note

    If you want to use separate compartments for storage and compute, repeat steps 5 and 6 for the second compartment. If you want to associate each compartment with a different user, repeat steps 1 through 6 for the second compartment.

    Create at least one bucket in the storage compartment; this is for QDS output and logs. (See step 3 under Configuring Qubole Account Settings for Oracle OCI below.)

  6. Add or edit the policy of the compartment to give the necessary permissions to the group you created in step 3. These must allow requesting and terminating instances. Qubole recommends you use a statement in the form:

    allow group <group_name> to manage all-resources in compartment <compartment_name>

    where group_name is the name of the group you created. See the Oracle documentation for more information.

    Note

    To set a more restrictive policy, see Configuring a More Restrictive Policy.

  7. Configure a Virtual Cloud Network (VCN) with the following characteristics. You can create one or use an existing one.

    Note

    When you create a VCN in the Oracle OCI UI and choose CREATE VIRTUAL CLOUD NETWORK PLUS RELATED RESOURCES, the resulting VCN will have most of these characteristics by default, but will not have the ingress rules for each subnet’s CIDR, as noted below.

    • Has an internet gateway.

    • Has a route table with a rule that specifies the internet gateway as the target of CIDR block 0.0.0.0/0 (allowing traffic between the VCN and internet).

    • Has subnets for each OCI Availability Domain in which you intend to launch QDS clusters.

    • The security list for the subnets must have the following rules at a minimum:

      • Stateful ingress rules, specifying each subnet’s CIDR as the source CIDR, allowing all protocols (and hence all ports).

        Note

        These rules are not created by default when you choose CREATE VIRTUAL CLOUD NETWORK PLUS RELATED RESOURCES.

      • A stateful ingress rule, specifying 0.0.0.0/0 as the source CIDR, allowing ssh access (TCP protocol, port 22).

      • A stateful egress rule, specifying 0.0.0.0/0 as the destination CIDR, allowing all protocols (and hence all ports).

    This is the network in which QDS will bring up the instances for your clusters. See the Oracle documentation for more information.
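
Before you enter the security-list rules from step 7 in the console, it can help to write them down as data. The following is a minimal Python sketch, not an Oracle or Qubole API: the function name required_security_rules and the dictionary keys are invented for illustration, and the subnet CIDRs are example values.

```python
import ipaddress

ANYWHERE = "0.0.0.0/0"


def required_security_rules(subnet_cidrs):
    """Return the minimum stateful rules from step 7 as plain dicts.

    The dict layout is hypothetical; it is not an OCI API payload.
    """
    rules = []
    for cidr in subnet_cidrs:
        # Validate the CIDR; raises ValueError if it is malformed.
        network = ipaddress.ip_network(cidr)
        # One ingress rule per subnet CIDR, all protocols (hence all ports).
        rules.append({"direction": "ingress", "stateful": True,
                      "cidr": str(network), "protocol": "all", "port": None})
    # SSH access from anywhere (TCP, port 22).
    rules.append({"direction": "ingress", "stateful": True,
                  "cidr": ANYWHERE, "protocol": "tcp", "port": 22})
    # Unrestricted egress, all protocols.
    rules.append({"direction": "egress", "stateful": True,
                  "cidr": ANYWHERE, "protocol": "all", "port": None})
    return rules


if __name__ == "__main__":
    for rule in required_security_rules(["10.0.0.0/24", "10.0.1.0/24"]):
        print(rule)
```

Comparing this list with the security list shown in the console is a quick way to spot the per-subnet ingress rules, which are the ones not created by default.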

Configuring a More Restrictive Policy

Step 6 above provides a sample of a broad policy for the compartment you have created for instances and (optionally) storage. If you want to set more restrictive rules, Qubole recommends you create policies as follows:

  • For the instances and images:

    ALLOW GROUP <group_name> to manage volumes in compartment <compartment_name>
    ALLOW GROUP <group_name> to manage instances in compartment <compartment_name>
    ALLOW GROUP <group_name> to manage volume-attachments in compartment <compartment_name>
    ALLOW GROUP <group_name> to use virtual-network-family in compartment <compartment_name>
    
  • For the VCN and subnets (step 7 above):

    ALLOW GROUP <group_name> to use virtual-network-family in compartment <compartment_name>
    
  • For storage:

    ALLOW GROUP <group_name> to manage object-family in compartment <compartment_name>
    

See the Oracle documentation for more information.
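
If you maintain several groups or compartments, you may prefer to generate the statements above rather than type them. The following Python sketch only formats text; it does not call Oracle OCI. The function name restrictive_policy_statements and the example names qds-group, qds-compute, and qds-storage are assumptions made for illustration. The virtual-network-family statement appears under two headings above, so the sketch emits it once.

```python
def restrictive_policy_statements(group_name, compute_compartment,
                                  storage_compartment=None):
    """Format the restrictive policy statements listed above.

    If storage_compartment is omitted, the compute compartment is
    used for storage as well.
    """
    if storage_compartment is None:
        storage_compartment = compute_compartment

    # (verb, resource type, compartment) for each statement.
    grants = [
        ("manage", "volumes", compute_compartment),
        ("manage", "instances", compute_compartment),
        ("manage", "volume-attachments", compute_compartment),
        ("use", "virtual-network-family", compute_compartment),
        ("manage", "object-family", storage_compartment),
    ]
    return [
        f"ALLOW GROUP {group_name} to {verb} {resource} "
        f"in compartment {compartment}"
        for verb, resource, compartment in grants
    ]


if __name__ == "__main__":
    for line in restrictive_policy_statements(
            "qds-group", "qds-compute", "qds-storage"):
        print(line)
```

Paste the printed lines into the policy editor for the relevant compartment, after checking them against the Oracle policy documentation.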

Configuring QDS to use Oracle OCI

In the QDS User Interface:

Creating a Qubole Account

  1. Go to https://oraclecloud.qubole.com.

  2. Click Sign Up.

  3. Provide the information you are prompted for and click SIGN UP WITH EMAIL, Sign up with Google, or Sign up with SAML.

    • If you sign up with Google or SAML, you will be logged in immediately and you can start configuring your QDS account.

    • If you sign up with email, you’ll be prompted to provide your email address and answer a question to prove you are not a robot. Enter your email address, answer the question as prompted, and click Sign Up. You will receive an email message at the email address you provided, with an activation code. You can confirm your account either by clicking on the link in the message, or by copying and pasting the activation code into the signup window. After creating and confirming your account, you can log in and start configuring your QDS account; you’ll see the Analyze page initially.

Configuring Qubole Account Settings for Oracle OCI

Proceed as follows to configure your QDS account.

  1. In the QDS UI, choose Control Panel from the drop-down list at the top left of the page, and then choose Account Settings.

  2. Fill in the fields in the Account Details section as follows:

    Account Name: Provide a name for this account.

    Domain Name Allowed to Sign In/Up: Enter a domain, or a comma-separated list of domains, from which this account can be used; for example, qubole.net or qubole.net,example.com.

    Idle Session Timeout: Optionally specify how long (in minutes) QDS should wait to terminate an idle QDS UI session. The default is 1440 minutes (24 hours). To change it, enter a number from 1 to 10080 (10080 minutes is a week).

    Allow Qubole Access: Check this to allow Qubole Support to log in to this account (helpful if you run into problems).

    Email List for Account Updates: Enter a list of email addresses to which notifications will be sent about changes to this account or the cluster configuration.

    Command Timeout: (Optional) Enter the number of seconds to wait before triggering an alert that a query you ran (from the Compose tab of the Analyze page) is still running.

    Click Save to save your changes.

  3. Fill in the fields in the Storage Settings section:

    Tenant ID: the Tenancy OCID of the account in which you created the user in step 1 above. The Tenancy OCID appears at the bottom of the screen in the Oracle OCI Console.

    User ID: The ID of the user you created in step 1 above.

    Key finger print: the fingerprint Oracle provided when you uploaded the public key in step 2 above.

    API private RSA key: The private key from the key pair you created in step 2 above.

    Default location: The default location in Oracle OCI object storage where QDS will store any generated data. This is in the form:

    <bucket>@<namespace>/<path>

    Note

    • <bucket> must exist; see step 5 above. <namespace> is the same as the Tenant Name you provide when you log in to Oracle OCI.

    Click Save to save your changes.

  4. Fill in the fields in the Compute Settings section. If you are using the same compartment for storage and compute, enter the same values as you used in step 3; otherwise enter the values for a separate compute compartment.
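
Two of the values above are easy to mistype: the key fingerprint and the default location. The following Python sketch checks both locally before you paste them into the form. It is not part of QDS or of any Oracle tool. It assumes the fingerprint is the MD5 digest of the DER-encoded public key written as colon-separated hex pairs, so compare its output with the value the Oracle console displays. It also assumes that the path part of the location may be empty. The function names and the sample bucket and namespace are hypothetical.

```python
import base64
import hashlib


def key_fingerprint(public_key_pem):
    """Return the colon-separated MD5 fingerprint of a PEM public key.

    Assumption: the fingerprint is MD5 over the DER bytes, i.e. the
    base64-decoded body between the BEGIN/END lines.
    """
    lines = [line.strip() for line in public_key_pem.strip().splitlines()]
    body = "".join(l for l in lines if l and not l.startswith("-----"))
    der = base64.b64decode(body)
    digest = hashlib.md5(der).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def parse_default_location(location):
    """Split '<bucket>@<namespace>/<path>' into (bucket, namespace, path)."""
    bucket, at, rest = location.partition("@")
    if not at or not bucket:
        raise ValueError("expected <bucket>@<namespace>/<path>")
    namespace, _, path = rest.partition("/")
    if not namespace:
        raise ValueError("namespace is missing")
    return bucket, namespace, path


if __name__ == "__main__":
    # Example values only; substitute your own bucket and namespace.
    print(parse_default_location("qds-logs@mytenancy/qubole/output"))
```

To check a real key, read the public key file you uploaded in step 2 of Configuring Oracle OCI Resources and pass its contents to key_fingerprint.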

Providing Settings for your Qubole Custom OS Image

Before you can launch QDS cluster nodes for the first time, Qubole must create a custom operating system image for you. This set-up process can take a few hours. After building the image, Qubole uses it as a template to configure every Oracle OCI instance QDS launches for you as a cluster node. Proceed as follows.

  1. Choose the following from the drop-down lists in the Image Creation section:

    • Availability Domain

    • Compartment ID

    • VCN

    • Subnet

    The compartment and VCN must meet the requirements described above.

  2. Click Save.

When the image has been built, you’ll see a message telling you that the image creation was successful; then you can update your clusters.

Note

Qubole deploys one image per compartment, using the Standard shape. In the process you incur charges for the compute time, for the image itself, and for the data ingest. Oracle charges these costs to you directly. These are startup costs: you incur them once when you activate your QDS account (as described on this page) and once for each subsequent QDS release deployed in your account.

Updating the default QDS clusters

In addition to pushing the compute settings, you also need to update the default QDS clusters.

Navigate to the Clusters page in the QDS UI and do the following for each cluster you intend to use:

  1. Choose the edit (pencil) icon.

  2. Choose the Region and Availability Domain from the drop-down lists.

  3. Complete the name of the path to the Node Bootstrap File if you are using one.

  4. Check Disable Automatic Cluster Termination only if you always want to terminate your Qubole clusters manually. Qubole recommends you leave this box unchecked, allowing QDS to shut down idle clusters.

  5. Click Next and proceed as follows in the Advanced Configuration tab.

  6. In the ORACLE SETTINGS section:

    • Check Same as Default Compute to use the compute settings you configured earlier. Otherwise, enter a different set of credentials to be used by this particular cluster.

    • From the drop-down lists choose a Compartment ID, VCN, and Subnet. These must meet the requirements described above.

  7. In the HADOOP CLUSTER SETTINGS section, you can modify:

    • Hadoop Configuration Variables: Enter Hadoop variables here if you want to override the defaults that Qubole uses.

    • Fair Scheduler Configuration: Enter Hadoop Fair Scheduler values if you want to override the defaults that Qubole uses.

    • Default Fair Scheduler Pool: Specify the default Fair Scheduler pool (used if no pool is specified when the job is submitted).

    Note

    In the Hadoop 2 implementation, pools are referred to as “queues”.

  8. In the MONITORING section, check the Enable Ganglia Monitoring box if you want to use Ganglia; see Performance Monitoring with Ganglia.

  9. In the SECURITY SETTINGS section:

    • Enter a Customer SSH Public Key if you want to log in to QDS cluster nodes. This is the public key from an SSH public-private key pair.

    • Check the box to Enable Encryption if you want QDS to encrypt data at rest in local storage. In this case, intermediate output generated by Hadoop, and HDFS itself, are encrypted on the underlying storage device. Block device encryption is set up for ephemeral drives before the node joins the cluster. As a side effect, the cluster could take longer to come up (depending on the instance type selected) before it becomes operational. Upscaling the cluster may also take longer.

When you are satisfied with your changes, click Save.

For more information, see Modifying Cluster Settings for Oracle OCI.

If a cluster fails to start, make sure you have completed all the steps on this page, then check Troubleshooting Oracle OCI Cluster Startup Failures.

About the Analyze Page

Take some time to familiarize yourself with the Analyze page. It has the following tabs:

  • The History tab shows previous commands; you can re-run them, with or without modification (use the Re-Run and Edit buttons).

  • The Workspace tab provides sample queries.

  • The Tables tab shows Hive tables, including Qubole demo tables. Click the arrow at the left to see the table’s columns and their types.

  • The Object Store tab allows you to browse the Oracle object storage.

Click the Compose button to clear the fields in the right frame and compose a new command, query, job, or other task, and click the Run button to run it.

For more information, see About the Analyze User Interface.

About Job Logs

Job logs are written out under the Logs tab while a job is running, and are also saved for later access. To see saved logs, click on the History tab, and then in the left pane, click on the job you are interested in. Its results and logs are saved under their respective tabs.

The job log provides a link to the Application UI; click on the link to see detailed information about the job, including information about the Map and Reduce tasks.