DynamoDB - Hive Connector (AWS)

AWS users can read from and write to a DynamoDB table using Qubole Hive.

Note

DynamoDB is currently supported only on AWS.

The following sections explain how to connect a DynamoDB table to a Hive table.

Adding Required Jars

Add this jar on a Hadoop 1 cluster.

add jar s3://paid-qubole/dynamoDB/jars/qubole-hive-connectors-hadoop1-final-0.0.5.jar;

Add this jar on a Hadoop 2 cluster.

add jar s3://paid-qubole/dynamoDB/jars/qubole-hive-connectors-hadoop2-0.0.7.jar;

Setting Credentials

Set the AWS access key and secret key as shown here, replacing the sample values with your own credentials.

set mapreduce.dynamodb.access.key=FSDFDSFDS;
set mapreduce.dynamodb.secret.key=dfgdsjkhfdfhfdsbfdsk;

If the DynamoDB table is in an AWS Region other than us-east-1, set an explicit endpoint for it. For example, to access a DynamoDB table in us-west-2, set this endpoint:

set mapreduce.dynamodb.endpoint=dynamodb.us-west-2.amazonaws.com;
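Put together, the session setup from the steps above might look like the following sketch. It assumes a Hadoop 2 cluster and a table in eu-west-1; the key values are placeholders, not real credentials.

```sql
-- Sketch of a complete session setup (Hadoop 2 cluster assumed).
add jar s3://paid-qubole/dynamoDB/jars/qubole-hive-connectors-hadoop2-0.0.7.jar;

-- Placeholder credentials: substitute your own AWS keys.
set mapreduce.dynamodb.access.key=YOUR_ACCESS_KEY;
set mapreduce.dynamodb.secret.key=YOUR_SECRET_KEY;

-- Needed only because the table is assumed to be outside us-east-1.
set mapreduce.dynamodb.endpoint=dynamodb.eu-west-1.amazonaws.com;
```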

Creating a Hive Table

Creating the Hive table is the step in which the DynamoDB table properties are passed to Hive so that it can access the data. The table definition requires two properties, dynamodb.table.name and dynamodb.column.mapping, which are passed in TBLPROPERTIES as follows:

Example

drop table dynamoDBTest;
CREATE EXTERNAL TABLE dynamoDBTest (
    string_eg string,
    number_eg bigint,
    binary_eg binary,
    strings_eg array<string>,
    numbers_eg array<bigint>,
    binarys_eg array<binary>)
STORED BY 'com.willetinc.hive.mapreduce.dynamodb.HiveDynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "dynamoDBTestTable_donot_modify",
               "dynamodb.column.mapping" = "string_eg:s_eg,number_eg:n_eg,
                                            binary_eg:b_eg,
                                            strings_eg:ss_eg,
                                            numbers_eg:ns_eg, binarys_eg:bs_eg"
               );
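In the example above, each entry in dynamodb.column.mapping pairs a Hive column with a DynamoDB attribute, in the form hive_column:dynamodb_attribute. A smaller sketch may make the pattern easier to see. The DynamoDB table name "Orders" and its attributes "oid" and "amt" are hypothetical.

```sql
-- Hypothetical DynamoDB table "Orders" with a string attribute "oid"
-- and a number attribute "amt".
CREATE EXTERNAL TABLE orders_ddb (
    order_id string,
    amount bigint)
STORED BY 'com.willetinc.hive.mapreduce.dynamodb.HiveDynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Orders",
               "dynamodb.column.mapping" = "order_id:oid,amount:amt");
```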

Query Data

Example

select * from dynamoDBTest;
select * from dynamoDBTest where 1==1 limit 10;

insert overwrite table dynamoDBTest select * from hive_table limit 2;
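If the same DynamoDB data is queried repeatedly, one option is to copy it once into a native Hive table and run later queries against that copy. The sketch below uses a standard Hive CREATE TABLE AS SELECT statement; the target table name dynamo_snapshot is hypothetical.

```sql
-- Copy selected columns from the DynamoDB-backed table into a native
-- Hive table (hypothetical name), so later queries do not read DynamoDB.
CREATE TABLE dynamo_snapshot AS
SELECT string_eg, number_eg
FROM dynamoDBTest;

-- Subsequent analysis runs against the snapshot.
SELECT count(*) FROM dynamo_snapshot;
```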

Throughput

To achieve optimal performance, you can tune the following parameters.

Parameter | Description
dynamodb.throughput.read.percent | Set the rate of read operations to keep the DynamoDB provisioned throughput rate in the allocated range for the table. The value is between 0.1 and 1.5, inclusive. The default value is 0.5.
dynamodb.max.map.tasks | Specify the maximum number of map tasks when reading data from DynamoDB. This value must be equal to or greater than 1.
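As a sketch, and assuming these parameters are set per session with the set command in the same way as the credentials (this page does not state how they are set), a read-heavy job might be configured like this. The values are illustrative only.

```sql
-- Assumption: throughput parameters are session-level settings.
-- Use the full provisioned read throughput (allowed range is 0.1 to 1.5).
set dynamodb.throughput.read.percent=1.0;

-- Cap the number of map tasks that read from DynamoDB (must be >= 1).
set dynamodb.max.map.tasks=4;

select * from dynamoDBTest limit 10;
```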