Understanding Data Import
You can import data from relational databases such as MySql, Postgres, Vertica and Redshift as well as non-relational databases such as MongoDB. QDS uses Sqoop to import data.
Note
You can see the list of supported features in the Sqoop User Guide. Though it is mentioned in the Sqoop User Guide, Data Import job doesn’t support Incremental Imports.
Procedure
Create a Data Store if you haven’t already done so.
Navigate to the Analyze page in the QDS UI and choose Data Import from the drop-down menu at the top of the right pane.
Select your data store, and choose the table, extraction mode, and other options. More details below.
Choose Run from the menu bar at the top right of the page.
Details
There are two extraction modes:
Simple
Advanced
Simple
In this mode, you can choose the input and output tables, the columns to extract, a where
clause,
the output format, a partition specification for the Hive table, and other options.
Advanced
In this mode, you can specify a free form query. For example, you can join multiple tables and extract the
(denormalized) data. But if you want parallel extracts, a ‘$CONDITIONS’ macro
must be part of a where
clause in the query (see below).
Controlling Parallelism
Sqoop can parallelize database extracts. By default, it automatically figures out the level of parallelism and performs up to 4 extracts in parallel. But you can specify the level of parallelism explicitly. Most importantly, the level of parallelism can be set to 1; this disables parallel extracts.
In Simple mode, Sqoop tries to parallelize extracts using an integer primary key (if one is available).
In Advanced mode, you have more control over how to parallelize the extracts.
You can specify a column to parallelize on (this is the column that
must be visible in the scope of the ‘$CONDITIONS’ macro embedded inside
the free form query). You must also specify a boundary query (--boundary-query
) that returns the minimum and
maximum values of the column to parallelize on. Qubole strongly encourages you to read the Sqoop
documentation on free-form imports (for example, the relevant chapter in the Apache Sqoop Cookbook) to understand
these controls.