Data Export

You can export data from Cloud storage and Hive tables to relational databases such as MySQL, Postgres and Vertica. QDS uses Sqoop to export data. First, create a Data Store and set it as the destination.

The export controls are grouped into two modes:

1. Directory Export

Specify the Cloud storage directory to export to the Data Store destination. Note that the export is not recursive. You can also specify the Field Delimiter (the default is ^A).
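As context for how such a directory is typically populated, Hive's default text output delimiter is also ^A, so data staged with a plain insert-overwrite generally matches the default Field Delimiter. A minimal sketch, in which the bucket path and table names are hypothetical:

    -- Hypothetical staging query: Hive writes ^A-delimited text files to the
    -- target directory by default; that directory can then be chosen as the
    -- source of a Directory Export.
    INSERT OVERWRITE DIRECTORY 's3://example-bucket/staging/orders/'
    SELECT order_id, customer_id, amount
    FROM orders;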

2. Hive Table Export

Specify which Hive table to export to the data store destination. QDS supports exporting all Hive table formats: text SerDe, ORC (Optimized Row Columnar File), and RC (Record Columnar File).
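For example, a table in any of these formats can be selected for export in the same way as a plain text table; a hypothetical ORC table might be created like this:

    -- Hypothetical ORC table; it can be chosen as the source of a
    -- Hive Table Export just like a text-format table.
    CREATE TABLE orders_orc STORED AS ORC AS
    SELECT order_id, customer_id, amount
    FROM orders;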

Tables with No Partitions

If the Hive table has no partitions, all the data in the Hive table is exported.

Tables with Partitions

If the Hive table has partitions, you must specify a value for each partition column in the partition spec field. For example, if the Hive table is partitioned by date and country and you want to export the partition with values 2012-01-01 and USA respectively, the partition spec must be date=2012-01-01/country=USA. You will see an error message if you specify only date=2012-01-01 or only country=USA.
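To make the mapping concrete, here is a hypothetical table definition matching the example above, together with the partition spec you would enter:

    -- Hypothetical partitioned table matching the example above.
    CREATE TABLE page_views (user_id BIGINT, url STRING)
    PARTITIONED BY (`date` STRING, country STRING);

    -- To export the partition `date`='2012-01-01', country='USA',
    -- the partition spec field would be:
    --   date=2012-01-01/country=USA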

Database Insertion Modes

QDS supports append and update modes for all the supported databases, as well as upsert mode for MySQL. In update mode, you must also specify the update keys, which are used to construct the update query.
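To illustrate the difference, the statements issued against the target database are roughly of the following shapes (the table, columns, and values are hypothetical; the actual SQL is generated by Sqoop):

    -- Append mode: rows are added with plain inserts.
    INSERT INTO orders (order_id, amount) VALUES (42, 99.50);

    -- Update mode with update key order_id: existing rows are updated in place.
    UPDATE orders SET amount = 99.50 WHERE order_id = 42;

    -- Upsert mode (MySQL only): insert, or update when the key already exists.
    INSERT INTO orders (order_id, amount) VALUES (42, 99.50)
      ON DUPLICATE KEY UPDATE amount = 99.50;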

Redshift Support (AWS)

For Redshift data stores, QDS uses the Copy command (as recommended by AWS) to export data from S3 to Redshift.

The Copy command does not work if the Redshift database and the S3 location are in different regions. Qubole automatically falls back to normal insert queries if the Copy command fails, but these can be significantly slower. In this situation a Hive Table Export of a partitioned table fails altogether: the partition columns cannot be exported via Copy because they are not part of the data files, and the fallback to insert-based export also fails because of the resulting schema mismatch.
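For reference, a Copy command loading ^A-delimited files from S3 into Redshift looks roughly like the following; the table, bucket, and Role ARN are placeholders (see the note below about IAM Roles):

    -- Hypothetical Copy command of the kind used for a Redshift data store.
    -- '\001' is the octal form of the ^A field delimiter.
    COPY orders
    FROM 's3://example-bucket/staging/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-qubole-role'
    DELIMITER '\001';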

Note

In case of an IAM-Roles-based account, attach the same Role ARN that has been granted to Qubole to the Redshift instance.

Extending the Data Export Command

QDS allows you to build more complex systems on top of the existing functionality. You can export custom columns, custom SerDes, or multiple partitions. Let us take an AWS example in which you want to export only a few columns from a Hive table, or you want to export many partitions in a single command:

  • Select a Workflow in the Compose Tab on the Analyze page.
  • Select Hive command as the first sub-command of the Workflow and write the custom Hive query whose result you want to export. You can write this query either as insert overwrite directory s3://... or as create table as … (a sketch of the Hive sub-commands appears after this list).
  • Select Data Export as the second sub-command of the Workflow. Use Hive Table Export or Directory export depending on whether you used create table or insert overwrite directory in the previous step.
  • Optionally, select Hive command as the last sub-command of the Workflow to clean up S3 using a drop table command.
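A sketch of the first and last Hive sub-commands in such a Workflow might look like this (the table, columns, and filter are hypothetical):

    -- First sub-command: stage only the needed columns (and as many
    -- partitions as required) into a temporary table.
    CREATE TABLE export_tmp AS
    SELECT order_id, amount
    FROM orders
    WHERE dt >= '2012-01-01';

    -- (The second sub-command is the Data Export itself, configured in the
    -- UI with export_tmp as the Hive table.)

    -- Last sub-command: clean up the staging table and its data.
    DROP TABLE IF EXISTS export_tmp;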