Submit a DB Export Command

POST /api/v1.2/commands/

The export command allows you to push data from QDS to a relational database. Using mode 1, you can push either a particular partition of a table (for a partitioned table) or a complete unpartitioned table. Use the hive_table and partition_spec options to specify the data to push.

You can also use the export command to export data from an HDFS directory or an S3 location using mode 2; specify the location with the export_dir option. In this case, you must also specify the separator used in the data, using the fields_terminated_by option.

On the database side, the exported data is appended to the table by default. However, if you want to update existing rows, or update existing rows and insert non-existing ones, you must specify this through the db_update_mode option and provide a comma-separated list of columns through db_update_keys; these columns determine the uniqueness of a row. This uses the upsert functionality provided by the underlying database and is supported only for MySQL and Oracle.
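For example, here is a minimal sketch of an upsert-style export; the hive_table, dbtap_id, db_table, and db_update_keys values are placeholders:

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbExportCommand", "mode":"1", "hive_table":"my_hive_table",
"dbtap_id":"1", "db_table":"my_target_table",
"db_update_mode":"allowinsert", "db_update_keys":"id"}' \
"https://api.qubole.com/api/v1.2/commands"

With allowinsert, rows whose id already exists in my_target_table are updated in place, and all other rows are inserted.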

Required Role

The following users can make this API call:

  • Users who belong to the system-user or system-admin group.

  • Users who belong to a group associated with a role that allows submitting a command. See Managing Groups and Managing Roles for more information.

Exporting Data from Hive

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

| Parameter | Description |
|---|---|
| **command_type** | The command type; use DbExportCommand. |
| **mode** | 1 (exports data from a Hive table). |
| **hive_table** | The name of the Hive table from which data is exported. |
| **dbtap_id** | The data store ID of the target database in Qubole. |
| **db_table** | The target database table to export to. |
| additional_options | Adds any additional parameters to the command that you want to execute. For example, to restrict the export to specific columns, pass --columns id,data through additional_options. |
| retry | Denotes the number of retries for a job. Valid values are 1, 2, and 3. |
| retry_delay | Denotes the time interval between retries when a job fails, in minutes. |
| partition_spec | The partition specification for the Hive table. |
| db_update_mode | Either allowinsert or updateonly. If updateonly is specified, only existing rows are updated. If allowinsert is specified, existing rows are updated and non-existing rows are inserted. If this option is not specified, the data is appended to the table. |
| db_update_keys | The columns used to determine the uniqueness of rows. Valid only with db_update_mode. |
| name | Adds a name to the command, which is useful when filtering commands in the command history. It does not accept the special characters & (ampersand), < (less than), > (greater than), " (double quote), and ' (single quote), or HTML tags. It can contain a maximum of 255 characters. |
| tags | Adds tags to a command so that it is easily identifiable and searchable in the Commands History. A tag can be used as a filter value while searching commands. Each tag can contain a maximum of 255 characters, and a comma-separated list of tags can be associated with a single command. Enclose tag values in square brackets, for example {"tags":["<tag-value>"]}. |
| use_customer_cluster | Set this option to true to run the query on a specific Hadoop/Hive cluster. |
| customer_cluster_label | Denotes the label of the Hadoop/Hive cluster on which you want to run the command. |
| macros | Denotes macros that are valid assignment statements containing variables and their expressions, as macros: [{"<variable>":<variable-expression>}, {..}]. You can add more than one variable. For more information, see Macros. |
| timeout | The command execution timeout in seconds. The default value is 129600 seconds (36 hours). QDS checks the timeout for a command every 60 seconds; if the timeout is set to 80 seconds, the command is killed at the next check, that is, after 120 seconds. Set this parameter to prevent a command from running for the full 36 hours. |
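The optional parameters can be combined in a single request. Below is a sketch that names and tags a command, retries it up to two times with a five-minute delay between attempts, and caps execution at one hour; the table names and tag values are placeholders:

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbExportCommand", "mode":"1", "hive_table":"my_hive_table",
"dbtap_id":"1", "db_table":"my_target_table",
"name":"nightly_airline_export", "tags":["export", "nightly"],
"retry":2, "retry_delay":5, "timeout":3600}' \
"https://api.qubole.com/api/v1.2/commands"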

Example: Exporting an Unpartitioned Hive Table

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbExportCommand", "mode":"1", "hive_table":"default_qubole_airline_origin_destination",
"dbtap_id":"2", "db_table":"exported_airline_origin_destination","use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"}' \
 "https://api.qubole.com/api/v1.2/commands"

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Assuming a table exported_airline_origin_destination with the following schema already exists:

CREATE TABLE `exported_airline_origin_destination` (
  `ItinID` varchar(1000) DEFAULT NULL, `MktID` varchar(1000) DEFAULT NULL,
  `SeqNum` varchar(1000) DEFAULT NULL, `Coupons` varchar(1000) DEFAULT NULL,
  `Year` varchar(1000) DEFAULT NULL, `Quarter` varchar(1000) DEFAULT NULL,
  `Origin` varchar(1000) DEFAULT NULL, `OriginAptInd` varchar(1000) DEFAULT NULL,
  `OriginCityNum` varchar(1000) DEFAULT NULL, `OriginCountry` varchar(1000) DEFAULT NULL,
  `OriginStateFips` varchar(1000) DEFAULT NULL, `OriginState` varchar(1000) DEFAULT NULL,
  `OriginStateName` varchar(1000) DEFAULT NULL, `OriginWac` varchar(1000) DEFAULT NULL,
  `Dest` varchar(1000) DEFAULT NULL, `DestAptInd` varchar(1000) DEFAULT NULL,
  `DestCityNum` varchar(1000) DEFAULT NULL, `DestCountry` varchar(1000) DEFAULT NULL,
  `DestStateFips` varchar(1000) DEFAULT NULL, `DestState` varchar(1000) DEFAULT NULL,
  `DestStateName` varchar(1000) DEFAULT NULL, `DestWac` varchar(1000) DEFAULT NULL,
  `Break` varchar(1000) DEFAULT NULL, `CouponType` varchar(1000) DEFAULT NULL,
  `TkCarrier` varchar(1000) DEFAULT NULL, `OpCarrier` varchar(1000) DEFAULT NULL,
  `RPCarrier` varchar(1000) DEFAULT NULL, `Passengers` varchar(1000) DEFAULT NULL,
  `FareClass` varchar(1000) DEFAULT NULL, `Distance` varchar(1000) DEFAULT NULL,
  `DistanceGroup` varchar(1000) DEFAULT NULL, `Gateway` varchar(1000) DEFAULT NULL,
  `ItinGeoType` varchar(1000) DEFAULT NULL, `CouponGeoType` varchar(1000) DEFAULT NULL
)
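A successful submission returns a JSON object describing the command, including its id. Assuming the standard command status endpoint, you can poll that ID until the command reaches a terminal status such as done or error:

curl -i -X GET -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" \
"https://api.qubole.com/api/v1.2/commands/<command-id>"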

Example: Exporting a Single Hive Partition

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbExportCommand", "mode":"1", "hive_table":"miniwikistats", "partition_spec":"dt=20110104-02",
"dbtap_id":"2", "db_table":"exported_minitest", "use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"}' \
"https://api.qubole.com/api/v1.2/commands"

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Assuming a table exported_minitest with the following schema already exists:

CREATE TABLE `exported_minitest` (
  `projcode` varchar(100) DEFAULT NULL, `pagename` varchar(1000) DEFAULT NULL,
  `pageviews` int(11) DEFAULT NULL, `bytes` bigint(20) DEFAULT NULL,
  `dt` varchar(100) DEFAULT NULL
)

Exporting Data from HDFS/S3

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

| Parameter | Description |
|---|---|
| **command_type** | The command type; use DbExportCommand. |
| **mode** | 2 (exports data from an HDFS or S3 location). |
| **export_dir** | The HDFS/S3 location from which data is exported. |
| **dbtap_id** | The data store ID of the target database in Qubole. |
| **db_table** | The target database table to export to. |
| db_update_mode | Either allowinsert or updateonly. If updateonly is specified, only existing rows are updated. If allowinsert is specified, existing rows are updated and non-existing rows are inserted. If this option is not specified, the data is appended to the table. |
| db_update_keys | The columns used to determine the uniqueness of rows. Valid only with db_update_mode. |
| **fields_terminated_by** | The hex value of the character used as the column separator in the dataset, for example \0x20 for space. |
| additional_options | Adds any additional parameters to the command that you want to execute. For example, to restrict the export to specific columns, pass --columns id,data through additional_options. |
| use_customer_cluster | Set this option to true to run the query on a specific Hadoop/Hive cluster. |
| customer_cluster_label | Denotes the label of the Hadoop/Hive cluster on which you want to run the command. |
| name | Adds a name to the command, which is useful when filtering commands in the command history. It does not accept the special characters & (ampersand), < (less than), > (greater than), " (double quote), and ' (single quote), or HTML tags. It can contain a maximum of 255 characters. |
| tags | Adds tags to a command so that it is easily identifiable and searchable in the Commands History. A tag can be used as a filter value while searching commands. Each tag can contain a maximum of 255 characters, and a comma-separated list of tags can be associated with a single command. Enclose tag values in square brackets, for example {"tags":["<tag-value>"]}. |
| macros | Denotes macros that are valid assignment statements containing variables and their expressions, as macros: [{"<variable>":<variable-expression>}, {..}]. You can add more than one variable. For more information, see Macros. |
| timeout | The command execution timeout in seconds. The default value is 129600 seconds (36 hours). QDS checks the timeout for a command every 60 seconds; if the timeout is set to 80 seconds, the command is killed at the next check, that is, after 120 seconds. Set this parameter to prevent a command from running for the full 36 hours. |

Example: Exporting from an HDFS/S3 Location

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbExportCommand", "mode":"2", "export_dir":"s3n://paid-qubole/default-datasets/miniwikistats/20110104-02/",
"fields_terminated_by":"\\0x20", "dbtap_id":"2", "db_table":"exported_minitest2", "use_customer_cluster" : "true",
"customer_cluster_label":"hadoop2"}' "https://api.qubole.com/api/v1.2/commands"

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Assuming a table exported_minitest2 with the following schema already exists:

CREATE TABLE `exported_minitest2` (
  `projcode` varchar(100) DEFAULT NULL, `pagename` varchar(1000) DEFAULT NULL,
  `pageviews` int(11) DEFAULT NULL, `bytes` bigint(20) DEFAULT NULL
)
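Mode 2 works the same way for HDFS paths. The sketch below assumes a hypothetical hdfs:///datasets/miniwikistats/ directory containing the same space-separated data:

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbExportCommand", "mode":"2", "export_dir":"hdfs:///datasets/miniwikistats/20110104-02/",
"fields_terminated_by":"\\0x20", "dbtap_id":"2", "db_table":"exported_minitest2"}' \
"https://api.qubole.com/api/v1.2/commands"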

Known Issue: The export does not work if a database column name contains special characters such as a blank space or ' (quote).