Composing a Data Import Command through the UI

You can import data from a supported database into a Hive table using the command composer on the Analyze page. See Data Import for more information on importing data into Hive tables.

Supported databases include Redshift, Vertica, Microsoft SQL Server, Oracle MySQL, Postgres, and MongoDB.

Prerequisites

You must have a data store of the same type as the database from which you want to import data. If one does not exist, create it in Explore.

Note

Hadoop 2, Presto, and Spark clusters support database commands; these commands can also be run without bringing up a cluster. See Mapping of Cluster and Command Types for more information.

Qubole supports importing a non-default schema from Postgres and Redshift data stores into a Hive database.

Compose a Data Import Command

Note

For the keyboard shortcuts available on the Analyze page, see Using the Supported Keyboard Shortcuts in Analyze.

Perform the following steps to compose a data import command:

Navigate to the Analyze page and click Compose. Select Data Import from the Command Type drop-down list.
Select a data store from the Data Store drop-down list.
Choose a table from the DbTable drop-down list.
In the Columns to Extract text field, specify the columns that you want to extract.
In Filters, specify any condition that you want to apply to the table data; for example, id > 5.
Select the parallelism value from the drop-down list. The default value is 1. See Data Import for more information on parallelism.
Select a Mode for importing data: Simple (default) or Advanced. See Data Import for more information on the data import modes.
From the Hive Database drop-down list, select the Hive database to which the data is to be imported. Click the Refresh icon to refresh the list.
Select a Hive table from the Hive Table Name drop-down list. Click the Refresh icon to refresh the list of Hive tables.
Select an output table format from the Choose output table format drop-down list. Qubole supports Avro, Optimized Row Columnar (ORC), and text formats. ORC is the default. See Avro Tables and ORC Tables for more information on the table formats.
In the Hive Table Partition Spec text field, specify the Hive partitions if any. Leave the field blank if the table has no partitions.
To run the command on a Hadoop 2 or Spark cluster, select the Use Hadoop2/Spark Cluster check box and choose the cluster label from the drop-down list.
Click Run to execute the command. Click Save if you want to re-run the same command later (see Workspace for more information on saving commands and queries).
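
The form fields described in the steps above can be pictured as one record. The following Python sketch is only an illustration and is not a Qubole API: the DataImportForm class, its field names, and the sample values (my_postgres_store, orders, orders_import) are hypothetical. It assumes that the Columns to Extract and Filters fields together behave like the column list and WHERE clause of a SELECT on the source table, which is a simplification for explanation rather than a statement of how Qubole runs the import.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataImportForm:
    """Hypothetical model of the Data Import form; names are illustrative only."""

    data_store: str                       # Data Store drop-down list
    db_table: str                         # DbTable drop-down list
    columns: List[str]                    # Columns to Extract
    hive_database: str                    # Hive Database drop-down list
    hive_table: str                       # Hive Table Name drop-down list
    filters: Optional[str] = None         # Filters, for example "id > 5"
    parallelism: int = 1                  # 1 is the default value
    mode: str = "Simple"                  # Simple (default) or Advanced
    output_format: str = "ORC"            # Avro, ORC (default), or Text
    partition_spec: Optional[str] = None  # leave unset if the table has no partitions
    cluster_label: Optional[str] = None   # set only when running on a cluster

    def validate(self) -> None:
        """Check the values against the options listed in the steps."""
        if self.mode not in ("Simple", "Advanced"):
            raise ValueError(f"unknown mode: {self.mode}")
        if self.output_format not in ("Avro", "ORC", "Text"):
            raise ValueError(f"unknown output format: {self.output_format}")
        if self.parallelism < 1:
            raise ValueError("parallelism must be at least 1")

    def conceptual_query(self) -> str:
        """Return a SELECT that illustrates what the columns and filters pick out.

        Assumes at least one column is listed; this is an explanatory
        simplification, not the query Qubole actually issues.
        """
        query = f"SELECT {', '.join(self.columns)} FROM {self.db_table}"
        if self.filters:
            query += f" WHERE {self.filters}"
        return query


# Hypothetical values, chosen only to mirror the id > 5 example in the steps.
form = DataImportForm(
    data_store="my_postgres_store",
    db_table="orders",
    columns=["id", "amount"],
    hive_database="default",
    hive_table="orders_import",
    filters="id > 5",
)
form.validate()
print(form.conceptual_query())
```

With these sample values, the sketch prints SELECT id, amount FROM orders WHERE id > 5, which shows how the extracted columns and the filter narrow the rows and columns that reach the Hive table.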

You can see the result under the Results tab and the logs under the Logs tab. The Logs tab has the Errors and Warnings filter. For more information on how to download command results and logs, see Downloading Results and Logs.