Understanding Data Export
You can export data from Cloud storage and Hive tables to relational databases such as MySQL, Postgres, Vertica and others. QDS uses Sqoop to export data.
Export Data
Create a Data Store if you haven’t already done so.
Navigate to the Analyze page and choose Data Export from the drop-down menu at the top of the right pane.
Select your data store, and choose the export mode, export directory, and other options. More details below.
Choose Run from the menu bar at the top right of the page.
Details
Export Modes:
Directory Export: Specify the Cloud storage directory to export to the data store destination. Note that this is a non-recursive export. You can specify the Field Delimiter (the default is ^A).
Hive Table Export: Specify which Hive table to export to the data store destination. QDS supports exporting all Hive table formats: text serde, ORC (Optimized Row Columnar File), and RC (Record Columnar File).
If the Hive table has no partitions, all the data in the Hive table is exported.
If the Hive table has partitions, specify a value for each partition in the Hive Partition Spec field. For example, if the Hive table has partitions date and country with values 2012-01-01 and USA respectively, the partition spec must be date=2012-01-01/country=USA. You will see an error message if you specify only date=2012-01-01 or country=USA.
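For reference, a minimal sketch of a partitioned Hive table that such a spec would apply to (the table and column names are hypothetical, not anything QDS creates for you):

    -- Hypothetical partitioned table; the spec date=2012-01-01/country=USA
    -- selects exactly one partition of it for export.
    CREATE TABLE sales (
      order_id  BIGINT,
      amount    DOUBLE
    )
    PARTITIONED BY (`date` STRING, country STRING);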
Database Insertion Modes
QDS supports append and update modes for all the supported databases, as well as upsert mode for MySQL. In update mode, you must also specify the update keys, which are used by the update query.
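As an illustration of how update mode behaves on the target database (table and column names are hypothetical), an export with update key id effectively issues statements roughly of this shape rather than plain inserts:

    -- Effective statement shape for an update-mode export keyed on "id";
    -- the "?" placeholders are filled from the exported rows.
    UPDATE sales_summary
    SET    amount = ?, country = ?
    WHERE  id = ?;

Append mode, by contrast, only inserts rows; on MySQL, upsert mode typically maps to INSERT ... ON DUPLICATE KEY UPDATE, so rows that do not match an existing key are inserted instead of dropped.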
Using Redshift with AWS S3
For Redshift data stores, use the Copy command (as recommended by AWS) to export data from S3 to Redshift. This does not work if the Redshift database and the S3 location are in different regions. Qubole automatically falls back to normal insert queries if the Copy command fails, but this can be significantly slower. Under these circumstances, Hive Table Export fails because the partition columns cannot be exported via Copy (they are not part of the data), and the fallback to insert-based exports fails because of the resulting schema mismatch.
Note
For an IAM-Roles-based account, attach to the Redshift instance the same Role ARN that has been granted to Qubole.
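For reference, a minimal sketch of the kind of Copy load involved (the table, bucket, path, and Role ARN below are placeholders, not values generated by QDS):

    -- Shape of a Redshift Copy load from S3, authorized via the
    -- same Role ARN that has been granted to Qubole.
    COPY public.sales_summary
    FROM 's3://your-bucket/export-path/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/your-qubole-role'
    DELIMITER ',';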
Extending the Data Export Command
QDS allows you to build more complex systems on top of the existing functionality. For example, you can export custom columns, use custom serdes, or export multiple partitions. Consider an AWS example in which you want to export only a few columns from a Hive table, or export many partitions in a single command:
Select a Workflow in the Compose Tab on the Analyze page.
Select Hive command as the first sub-command of the workflow, and write the custom Hive query whose result you want to export. You can write this query either as insert overwrite directory s3://… or as create table as … (see the sketch after this list).
Select Data Export as the second sub-command of the workflow. Use Hive Table Export or Directory Export depending on whether you used create table or insert overwrite directory in the previous step.
Optionally, select Hive command as the last sub-command of the workflow to clean up S3 using a drop table command.
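A minimal sketch of the Hive sub-commands in such a workflow (the table, column, and staging names are hypothetical); the Data Export sub-command in between would point at the staging table created here:

    -- Sub-command 1: stage only the columns and partitions you want to export
    CREATE TABLE export_staging AS
    SELECT order_id, amount
    FROM   sales
    WHERE  `date` >= '2012-01-01';

    -- (Sub-command 2 is the Data Export of export_staging, configured as above.)

    -- Sub-command 3 (optional): clean up the staging data afterwards
    DROP TABLE export_staging;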