Composing a Hive Query through the UI¶
Use the command composer on the Analyze page to compose a Hive query. See Hive in Qubole for more information.
Hadoop 2 clusters support Hive queries. See Mapping of Cluster and Command Types for more information.
You can configure the Pig version on an Hadoop 2 (Hive) cluster. Pig 0.11 is the default version. Pig 0.15 and Pig 0.17 (beta) are the other supported versions. You can also choose between MapReduce and Tez as the execution engine when you set the Pig 0.17 (beta) version.
Run only Hive DDL commands on a Presto cluster. Running Hive DML commands on a Presto cluster is not supported.
Perform the following steps to compose a Hive query:
Using the Supported Keyboard Shortcuts in Analyze describes the supported keyboard shortcuts.
Navigate to the Analyze page and click Compose. Select Hive Query from the Command Type drop-down list.
Query Statement is selected by default from the drop-down list. Continue with step 3b if you choose this option
- To use a stored query, select Query Path, then specify the Cloud storage path that contains the Hive query file. Now skip to step 4.
- Enter the Hive query in the text field. Press Ctrl + Space in the command editor to get a list of suggestions. The following figure shows a sample query.
Select a Run Type. The default is Normal. The other options are Test Run and Constrained Run
The primary use of Test Run and Constrained Run options is during query authoring. Query authoring is often an iterative and error-prone process. When the input datasets are large, it may take minutes or hours for a query to complete and therefore, it takes a long time for you to figure out bugs in expressions and function calls. These options help you run the same query against a small subset of data to get to see some results quickly and this helps to fine-tune Hive queries.
Test Run can be used when you want to read a maximum size (in MB) of data by limiting the dataset size. The default data size to be limited is 100 MB.
Constrained Run can be used when, irrespective of the data, you want the query to run for a time range and this is done by specifying a minimum time (in minutes) and maximum time (in minutes). The default time range is 5 to 10 minutes.
Use the tooltip to know more information on each field.
- Click Run to execute the query. Click Save if you want to run the same query later. (See Workspace for more information on saving queries.)
You can see the result under the Results tab, and the logs under the Logs tab. The Logs tab has the Errors and Warnings filter. For more information on how to download command results and logs and logs, see Download Results and Logs from the Analyze UI.
The metastore consistency check (MSCK) result is displayed in the Logs tab of the Analyze page when
the configuration parameter
hive.qubole.write.msck.result.to.log is enabled at the query or cluster level,
or in a Hive bootstrap. A cluster restart is required for the cluster-level setting.
Viewing a Detailed Hive Log¶
A detailed log for a specific Hive query that is executed using HiveServer2 or Hive-on-master is uploaded to a subdirectory in the default location on the cloud object storage within a couple of minutes of query completion. The location of the logs is visible in the Logs tab of the Analyze UI page. The detailed log is not available by default. Create a ticket with Qubole Support to see the detailed log. Here is an example of the detailed log location in the Logs tab.
Viewing the Multi-line Column Data in the Query Result¶
On the UI, the query results that had columns with multiple line data did not display correctly. To overcome this,
Qubole supports newline (
\n) and carriage return (
\r) characters in Hive query results by escaping them in the Hive
result set and then un-escaping in the UI. You can get this feature enabled by creating a ticket with Qubole Support.
After this feature is enabled, even a simple SELECT query requires a cluster start.