Submit a DB Import Command

POST /api/v1.2/commands/

Import commands allows you to pull data from a relational database to QDS in a Hive table. You can either pull a complete table or a subset of it and this can be done using mode 1. In mode 1, to selectively pull the rows, you can specify the condition using the db\_where option. If only a subset of column must be pulled, then you can specify a comma-separated list in the db\_columns option.

To speed up the process of pulling data, you can set db\_parallelism in which case, Qubole opens multiple connections to the database to pull the data. In case of mode 1, Qubole automatically detects and uses primary key column of the table to split the work. Bear in mind that setting a very high value for db\_parallelism adds pressure on the database.

Sophisticated users can specify a custom query to transform the data before pulling it using mode 2. In case of mode 2, if db\_parallelism > 1, Qubole needs more input to split the work. So, you must specify:

db_split_column: the column used to split the work
db_boundary_query: the query used to get a range on the column typically it would be as the following example: select min(db_split_column), max(db_split_column) from db_table
db_extract_query: must have a WHERE clause with $CONDITIONS in it. At execution time, $CONDITIONS is replaced by clause like: db_split_column > lower_val AND db_split_column <= upper_val This is used to determine boundary for splitting work.

Required Role

The following users can make this API call:

Users who belong to the system-user or system-admin group.
Users who belong to a group associated with a role that allows submitting a command. See Managing Groups and Managing Roles for more information.

Simple Mode

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter	Description
command_type	DbImportCommand
schema	It is the source database to import into. THe default schema is `default`.
hive_table	Specify Hive table to import into.
mode	1
dbtap_id	DbTap ID of the source database in Qubole.
db_table	Table to import from.
additional_options	You can use this parameter to add any additional parameters to the command that you want to execute. For example, if you want to add column mapping to the command, then add `--map-column-hive id=int,data=string` through `additional_options`.
db_columns	Use this option when you want only a subset of all columns. It can contain a list of a comma-separated columns.
hive_serde	Specify the serde type of the Hive table to import into. Supported values are `avro`, `orc`, and `parquet`. By default, Hive table is assumed to be of TXT format.
part_spec	Denotes the partition specification for a Hive table.
retry	Denotes the number of retries for a job. Valid values of `retry` are 1, 2, and 3.
retry_delay	Denotes the time interval between the retries when a job fails. The unit of measurement is minutes.
db_where	where clause to be applied to the table before extracting rows to be imported
db_parallelism	Number of parallel threads to use for extracting data.
name	Add a name to the command that is useful while filtering commands from the command history. It does not accept & (ampersand), < (lesser than), > (greater than), “ (double quotes), and ‘ (single quote) special characters, and HTML tags as well. It can contain a maximum of 255 characters.
tags	Add a tag to a command so that it is easily identifiable and searchable from the commands list in the Commands History. Add a tag as a filter value while searching commands. It can contain a maximum of 255 characters. A comma-separated list of tags can be associated with a single command. While adding a tag value, enclose it in square brackets. For example, `{"tags":["<tag-value>"]}`.
use_customer_cluster	To run the query on a specific Hadoop/Hive cluster, set this option to `true`.
customer_cluster_label	Denotes the label of the Hadoop/Hive cluster on which you want to run the command.
macros	Denotes the macros that are valid assignment statements containing the variables and its expression as: `macros: [{"<variable>":<variable-expression>}, {..}]`. You can add more than one variable. For more information, see Macros.
timeout	It is a timeout for command execution that you can set in seconds. Its default value is 129600 seconds (36 hours). QDS checks the timeout for a command every 60 seconds. If the timeout is set for 80 seconds, the command gets killed in the next minute that is after 120 seconds. By setting this parameter, you can avoid the command from running for 36 hours.

Examples

Running a Data Import Command on a Hadoop/Hive cluster

Sample 1

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbImportCommand", "mode":"1", "hive_table":"sqltest", "dbtap_id":"2", "db_table": "data",
"db_where":"id < 10", "db_parallelism":2, "db_columns":"col_1,col_2, "use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"}' \
'https://api.qubole.com/api/v1.2/commands'

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Sample 2

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d { "command_type": "DbImportCommand", "mode":"1", "hive_table": "test", "dbtap_id":"286",
     "db_table" : "test-table", "db_parallelism":1, "db_columns": "a",
     "use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"
    }

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Advanced Mode

Required Role

The following users can make this API call:

Users who belong to the system-user or system-admin group.
Users who belong to a group associated with a role that allows submitting a command. See Managing Groups and Managing Roles for more information.

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter	Description
command_type	DbImportCommand
schema	It is the source database to import into. THe default schema is `default`.
hive_table	Specify Hive table to import into
mode	2
dbtap_id	DbTap ID of the source database in Qubole
db_extract_query	SQL query to be applied at the source database for extracting data. $CONDITIONS must be part of the where clause.
db_columns	Use this option when you want only a subset of all columns. It can contain a list of a comma-separated columns.
db_boundary_query	Query to be used get range of row IDs to be extracted
db_split_column	Column used as row ID to split data into ranges
db_parallelism	Number of parallel threads to use for extracting data
hive_serde	Specify the serde type of the Hive table to import into. Supported values are `avro`, `orc`, and `parquet`. By default, Hive table is assumed to be of TXT format.
additional_options	You can use this parameter to add any additional parameters to the command that you want to execute. For example, if you want to add column mapping to the command, then add `--map-column-hive id=int,data=string` through `additional_options`.
use_customer_cluster	To run the query on a specific Hadoop/Hive cluster, set this option to `true`.
customer_cluster_label	Denotes the label of the Hadoop/Hive cluster on which you want to run the command.
name	Add a name to the command that is useful while filtering commands from the command history. It does not accept & (ampersand), < (lesser than), > (greater than), “ (double quotes), and ‘ (single quote) special characters, and HTML tags as well. It can contain a maximum of 255 characters.
tags	Add a tag to a command so that it is easily identifiable and searchable from the commands list in the Commands History. Add a tag as a filter value while searching commands. It can contain a maximum of 255 characters. A comma-separated list of tags can be associated with a single command. While adding a tag value, enclose it in square brackets. For example, `{"tags":["<tag-value>"]}`.
macros	Denotes the macros that are valid assignment statements containing the variables and its expression as: `macros: [{"<variable>":<variable-expression>}, {..}]`. You can add more than one variable. For more information, see Macros.
timeout	It is a timeout for command execution that you can set in seconds. Its default value is 129600 seconds (36 hours). QDS checks the timeout for a command every 60 seconds. If the timeout is set for 80 seconds, the command gets killed in the next minute that is after 120 seconds. By setting this parameter, you can avoid the command from running for 36 hours.

Examples

Note

The examples use https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Example 1

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbImportCommand", "mode":"2", "hive_table":"HiveData", "dbtap_id":"2",
     "db_extract_query":"select id, dt from mydata where $CONDITIONS and id < 10",
     "db_boundary_query":"select min(id), max(id) from mydata", "db_split_column":"id", "db_parallelism":2,
     "db_columns":"col_1,col_2","use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"}' "https://api.qubole.com/api/v1.2/commands"

Example 2

curl -i -X POST -H "X-AUTH-TOKEN: <MYTOKEN>" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"command_type": "DbImportCommand", "mode":"2", "hive_table":"HiveTableImportData", "dbtap_id":"1924",
     "db_extract_query":"select id, task_id, event, owner, extra, SUBSTRING(execution_date, 1, 10)
     as dt1 from table2  WHERE $CONDITIONS", "part_spec": "dt=2017-01-01" ,"db_parallelism" : "1","use_customer_cluster" : "true",
     "customer_cluster_label":"hadoop2"}' "https://api.qubole.com/api/v1.2/commands"

Example 3

curl -i -X POST -H "X-AUTH-TOKEN: <MYTOKEN>" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"mode": "2", "hive_table": "HiveTableData", "dbtap_id": None, "db_parallelism": 2,
     "db_extract_query" : "select * from table1 where $CONDITIONS",
     "db_boundary_query" : "select min(id), max(id) from table1", "db_split_column" : "id","use_customer_cluster" : "true",
     "customer_cluster_label":"hadoop2"}' "https://api.qubole.com/api/v1.2/commands"

Example 4

curl -i -X POST -H "X-AUTH-TOKEN: <MYTOKEN>" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"mode": "2", "hive_table": "RecentHiveData", "dbtap_id": None, "db_parallelism": 2,
     "db_extract_query" : "select * from table1 where $CONDITIONS",
     "db_boundary_query" : "select min(id), max(id) from table1", "db_split_column" : "id","use_customer_cluster" : "true",
     "customer_cluster_label":"hadoop2"}' "https://api.qubole.com/api/v1.2/commands"

Example 5

curl -i -X POST -H "X-AUTH-TOKEN: <MYTOKEN>" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"mode": "2", "hive_table": "HiveAvroData", "dbtap_id": None, "db_parallelism": 2,
     "db_extract_query" : "select * from table4 where $CONDITIONS",
     "db_boundary_query" : "select min(id), max(id) from table4", "db_split_column" : "id",
     "part_spec" : "dt=2013/country=india", "hive_serde": "avro","use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"}'
      "https://api.qubole.com/api/v1.2/commands"

Example 6

curl -i -X POST -H "X-AUTH-TOKEN: <MYTOKEN>" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"mode": "2", "hive_table": "HiveORCData", "dbtap_id": None, "db_parallelism": 1,
     "db_extract_query" : "select id, data from table11 where $CONDITIONS",
     "part_spec" : "dt=2013/country=us", "hive_serde": "orc", "use_customer_cluster" : "true", "customer_cluster_label":"hadoop2"}'
      "https://api.qubole.com/api/v1.2/commands"