Managing Hive Bootstrap

The Hive bootstrap script is run before each query is submitted to QDS. Use it for set-up needed by every Hive query run in the account, for example:

  • Adding jars. For more information, see Adding Custom Jars in Hive.
  • Defining temporary functions
  • Setting Hive parameters
  • Setting MapReduce parameters for Hive queries

For example, to use test.py in all sessions, add a bootstrap command similar to this (for AWS):

add file s3n://prod.qubole.com/ec2-user_hu_6/scripts/test.py;

or (for Azure Blob):

add file wasb://<container_name>@<storage_account_name>.blob.core.windows.net/scripts/test.py;

or (for Oracle OCI):

add file oci://<bucket>@<tenancy_name>/defloc/scripts/hadoop/test.py;
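
As a rough illustration, a bootstrap that combines the set-up types listed above might look like the following sketch. The bucket, jar name, Java class, function name, and parameter values are all hypothetical placeholders, not values taken from any QDS account; the `--` comment lines assume that the bootstrap accepts standard Hive script comments.

```sql
-- Hypothetical bootstrap: every path, class, and value below is a placeholder.

-- Make a custom jar available to every query.
add jar s3://<bucket>/jars/my-udfs.jar;

-- Register a temporary function implemented in that jar.
create temporary function my_lower as 'com.example.hive.udf.MyLower';

-- Set a Hive parameter.
set hive.exec.dynamic.partition.mode=nonstrict;

-- Set a MapReduce parameter for Hive queries.
set mapreduce.map.memory.mb=2048;
```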

Hive bootstrap settings can be defined in two ways:

  • User Bootstrap Settings: As a user of the account, if you want to override the account-level bootstrap settings, enable this option. Enabling this option fetches the bootstrap from your default location. You can override the bootstrap settings for a specific account by using the Bootstrap editor.

    Bootstrap Editor allows you to manually write and edit entries. Settings in the bootstrap editor override the settings in the bootstrap file.

  • Account Bootstrap Settings: Setting account-level bootstrap settings enables all users of that account to use the same Hive bootstrap. The account-level settings can also be set to use a default and custom bootstrap location as described here:

    • Use Default Bootstrap Location: The bootstrap file is read from the default cloud location, DEFLOC/scripts/hive. If you modify the bootstrap file in cloud storage, the change affects all users who use this file.

      Bootstrap Editor allows you to manually write and edit entries. Settings in the bootstrap editor override the settings in the default bootstrap file for the particular account you are logged in to.

    • Use Custom Bootstrap Location: A cloud location other than the default that contains the Hive bootstrap. This custom bootstrap location is useful when you want to use the same bootstrap in multiple accounts.

The user-level Hive bootstrap is loaded after the account-level Hive bootstrap. If the same entry appears in both the user-level and account-level bootstraps, the user-level entry takes effect.
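
The load order can be pictured with a duplicate setting. This is only a sketch: the parameter and its values are chosen for illustration, and the `--` comment lines assume that the bootstrap accepts standard Hive script comments.

```sql
-- Account-level bootstrap (loaded first); the value is illustrative.
set hive.exec.parallel=false;

-- User-level bootstrap (loaded second); the same key is set again.
set hive.exec.parallel=true;

-- In a query, a bare "set <key>;" prints the value currently in effect.
set hive.exec.parallel;
```

Because the user-level bootstrap is loaded last, its value is the one you would expect the final statement to report.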

See Hive Bootstrap for more information. Set and View a Hive Bootstrap in a QDS Account describes the APIs to set and view a Hive bootstrap.

Using the Hive Bootstrap Tab on Control Panel

To configure a Hive bootstrap script, use Hive Bootstrap in the QDS Control Panel.

Clicking Hive Bootstrap displays the Hive Bootstrap tab. A sample default view of the tab is shown here.

../../../_images/HiveBootStrap.png

Configuring Account Bootstrap Settings

You can configure the Hive bootstrap using either the default or a custom bootstrap location. By default, Use Default Bootstrap Location is selected.

Using Default Bootstrap Location

Upload the bootstrap file to the Cloud location if you have not already done so. The default location for a Hive bootstrap is <default location configured in your account>/scripts/hive.

By default, the BootStrap Editor area is blank; use it to create a bootstrap for the current account. To do this, click BootStrap Editor, enter bootstrap scripts, and click Save. The following figure shows an example of overriding a bootstrap configuration.

../../../_images/HiveBootStrapSave.png

Click Save after adding a new bootstrap script.

Using Custom Bootstrap Location

Choose Use Custom Bootstrap Location if you want to use a bootstrap from a non-default location, for example to share the same bootstrap across multiple accounts. Enter the path of the bootstrap location. A sample non-default location for a Hive bootstrap is illustrated here.

../../../_images/HiveBootstrapCustomLocation.png

Click the file icon next to the Base Bootstrap Location text box to see the contents of a bootstrap file.

Click Save to apply the change, or click Cancel to retain the previous bootstrap.

Configuring User Bootstrap Settings

In User Bootstrap Settings, QDS lets a user override the account-wide bootstrap. By default, the user-level Hive bootstrap location for the current account is <S3 default location configured in your account>/scripts/hive/<accountID>/<unique ID for the user>/bootstrap.

By default, the BootStrap Editor area is blank; use it to create a bootstrap that applies only to you and overrides the contents of the bootstrap file. To do this, click BootStrap Editor, enter bootstrap scripts, and click Save.

The following figure shows an example of overriding a user-hive-bootstrap.

../../../_images/HiveUserBootstrap.png

Set and View a Hive Bootstrap in a QDS Account describes the APIs.

Analyzing Hive Bootstrap Failures through Analyze/Workbench UI Logs

On QDS, the order of query execution is:

  1. Account-level Hive bootstrap
  2. User-level Hive bootstrap
  3. Hive query

A Hive query with a bootstrap error fails when it runs on QDS servers or the coordinator node. However, when the same query runs on HiveServer2, the bootstrap error is ignored and the query is executed anyway. This causes problems because a bootstrap may contain JARs or configuration that the associated Hive query depends on. To resolve this, Qubole has added an enhancement that fails a Hive query when its bootstrap has an error, irrespective of the mode in which the query runs (including HiveServer2). The enhancement also reports the exact cause of the query failure.

This enhancement to trace Hive bootstrap error logs is not enabled by default. You can enable it on Account Features in the Control Panel of the QDS UI. For details on how to enable it, see Managing Account Features.

Note

If the existing Hive bootstrap contains an error, queries may start failing once this enhancement is enabled. To avoid breaking workloads, first ensure that the current bootstrap is valid before turning on the enhancement. To check whether the bootstrap is valid, run the statements in the bootstrap separately through the Analyze/Workbench UI page. Also check the validity of the HiveServer2 global init file (.hiverc file), if one is specified.
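
One possible way to do such a check is to run a bootstrap statement on its own and then list the resources that were registered. This is a sketch only: the path is a placeholder, and it assumes that the standard Hive `list files` command is available in the mode in which the query runs.

```sql
-- Run one bootstrap statement by itself; the path is a placeholder.
add file s3://<bucket>/scripts/test.py;

-- Standard Hive command that lists the files added to the session.
list files;
```

If the add file statement reports an error here, the same statement would be expected to fail inside the bootstrap.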

Example Scenario

Suppose that you want to add the following two dependent files to each Hive query through the Hive bootstrap:

  1. add file s3://bucket/xyz.jar
  2. add file s3://bucket/softy.py

Assume that xyz.jar is not present at its S3 location and that the Hive query depends on softy.py. The bootstrap execution fails on line 1, so line 2 does not execute. The Hive query then fails with an error saying that s3://bucket/softy.py does not exist, which is misleading because the actual cause is the missing xyz.jar.

In such cases, enabling the enhancement surfaces the Hive bootstrap error, which makes it easier to debug and trace the exact cause of the Hive query failure.
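
The scenario above can be sketched as a bootstrap followed by a dependent query. The two add file lines come from the scenario; the query, including the table and column names, is a hypothetical example of a query that depends on softy.py, and the `--` comment lines assume that standard Hive script comments are accepted.

```sql
-- Bootstrap, line 1: fails in this scenario because xyz.jar is missing.
add file s3://bucket/xyz.jar;
-- Bootstrap, line 2: never runs once line 1 has failed.
add file s3://bucket/softy.py;

-- Hypothetical query that depends on softy.py.
-- sample_table, line, and result are placeholder names.
select transform (line) using 'python softy.py' as (result)
from sample_table;
```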