You can export data from Cloud storage and Hive tables to relational databases such as MySQL, Postgres, Vertica and others. QDS uses Sqoop to export data. First, create a Data Store and set it as the destination.
Controls are grouped into two extraction modes:
1. Directory Export¶
Specify which Cloud storage directory to export to the data store destination. Note that this is a non recursive export. You can specify the Field Delimiter (default is ^A).
2. Hive Table Export¶
Specify which Hive table to export to the data store destination. QDS supports exporting all Hive table formats: text serde, ORC (Optimized Row Columnar File), and RC (Record Columnar File).
Tables with No Partitions¶
If the Hive table has no partitions, all the data in the Hive table is exported.
Tables with Partitions¶
If the Hive table has partitions, you need to specify a value for each partition in the partition spec field.
Example: If the Hive table has partitions
country with values
USA respectively, the partition spec must be:
You will see an error message if you only specify
Database Insertion Modes¶
QDS supports append and update modes for all the supported databases as well as upsert mode for MySQL. In update mode, you must also specify the update keys as they are used by the update query.
Redshift Support (AWS)¶
For Redshift data stores, use the
Copy command (as recommended by AWS) to export data from S3 to Redshift.
This does not work if the Redshift Database and S3 location are in different regions. Qubole automatically falls back to normal
insert queries if the
Copy command fails, but this can be significantly slower. Under these circumstances
Hive Table Export will fail because it is not possible to export the partition columns via
Copy since they
are not part of the data, and fallback to insert-based exports fails because of the schema mismatch.
In case of an IAM-Roles-based account, attach the same Role ARN to the Redshift instance that has been granted to Qubole.
Extending the Data Export Command¶
QDS allows you to build more complex systems on top of the existing functionality. You can export custom columns, custom serdes, or multiple partitions. Let us take an AWS example in which you want to export only a few columns from a Hive table, or you want to export many partitions in single command:
- Select a Workflow in the Compose Tab on the Analyze page.
- Select Hive command as the first sub-command of the workflow. Write the custom Hive query whose result you want to export. You can write this query either as insert overwrite directory s3://... or as create table as …
- Select Data Export as the second sub-command of the Workflow. Use Hive Table Export or Directory export depending on whether you used create table or insert overwrite directory in the previous step.
- Optionally select Hive command as the last sub-command of the Workflow to cleanup S3 using a drop table command.