Hive-JDBC Connector

QDS provides a Hive connector for JDBC, so you can run SQL queries to analyze data that resides in JDBC tables.

Note

Qubole has deprecated its JDBC Storage handler. Use the open-source JDBC Storage handler. THE ADD JAR statement is mandatory only for Hive versions 1.2. and 2.1.1 as Hive versions 2.3 and 3.1.1 (beta) contain the required jars.

Adding Required Jars

AWS Example:

add jar s3://paid-qubole/jars/jdbchandler/oss-hive-JDBC-2.3.4.jar;

Azure Blob Example:

Note

Qubole plans to deprecate its Qubole JDBC Storage Handler shortly.

add jar wasb://default-datasets@paidqubole.blob.core.windows.net/jars/jdbchandler/mysql-connector-java-5.1.34-bin.jar;
add jar wasb://default-datasets@paidqubole.blob.core.windows.net/paid-qubole/jars/jdbchandler/qubole-hive-JDBC-0.0.7.jar;

Creating a Table

An external Hive table connecting a JDBC table can be created as follows, allowing read and write to an underlying JDBC table.

Example

The table can be created in two ways:

  • You can explicitly give column mappings along with the table creation statement.

    CREATE EXTERNAL TABLE student_jdbc
    (
    name string,
    age int,
    gpa double
    )
    STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
    TBLPROPERTIES (
    "hive.sql.database.type" = "MYSQL",
    "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
    "hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
    "hive.sql.dbcp.username" = "hive",
    "hive.sql.dbcp.password" = "hive",
    "hive.sql.table" = "STUDENT",
    "hive.sql.dbcp.maxActive" = "1"
    );
    
  • You can specify no table mappings; the SerDe class automatically generates the mappings.

    CREATE EXTERNAL TABLE HiveTable
    row format serde 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
    STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
    TBLPROPERTIES (
      "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
      "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
      "mapred.jdbc.username"="root",
      "mapred.jdbc.input.table.name"="JDBCTable",
      "mapred.jdbc.output.table.name" = "JDBCTable",
      "mapred.jdbc.password"=""
    );
    

Usage

The Hive-JDBC connector supports almost all types of SQL queries. Some examples of supported queries are:

Reading Data

> select * from HiveTable;
> select count(*) from HiveTable;
> select id from HiveTable where id > 50000;
> select names from HiveTable;
> select * from HiveTable where names like ‘D%’;
> select * FROM HiveTable ORDER BY id DESC;

Joining Tables

> select HiveTable_1.*, HiveTable_2.* from HiveTable_1 a join HiveTable_2 b
   on (a.id = b.id) where a.id > 90000 and b.id > 97000;

Writing Data

Note

Writing data holds good to Qubole-on-Azure/OCI/GCP until Qubole Hive JDBC Storage Handler is deprecated.

> Insert Into Table HiveTable_1 select * from HiveTable_2;
> Insert Into Table HiveTable_1 select * from HiveTable_2 where id > 50;

Group By Queries

> select id, sum(id_double) as sum_double from HiveTable group by id;

Support for PredicatePushDown

To enable/disable PredicatePushDown, add the following configuration.

set hive.optimize.ppd = true/false

Handling Unsuccessful Tez Queries While Querying JDBC Tables

Note

This holds good to Qubole-on-Azure/OCI/GCP until Qubole Hive JDBC Storage Handler is deprecated.

In the Hive JDBC connector, the JDBC Storage handler does not work when Input Splits Grouping is enabled in Hive-on-Tez.

As a result, the following exception message is displayed.

java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects.

HiveInputFormat is enabled by default in Tez to support Splits Grouping.

You can avoid the issue by setting the input format as CombineHiveInputFormat using this command that disables the Splits Grouping.

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

Column/Type Mapping

As Qubole Hive uses open-source JDBC Storage Handler, refer to the OSS Column/Type Mapping for more information.

Partitioning

As Qubole Hive uses open-source JDBC Storage Handler, refer to the OSS Partitioning for more information.

Computation Pushdown

As Qubole Hive uses open-source JDBC Storage Handler, refer to the OSS Column/Type Mapping for more information.