5. How can I create a table in HDFS?

A CREATE TABLE statement in QDS creates a managed table in Cloud storage. To create a table in HDFS to hold intermediate data, use CREATE TMP TABLE or CREATE TEMPORARY TABLE. Remember that HDFS in QDS is ephemeral and the data is destroyed when the cluster is shut down; use HDFS only for intermediate outputs.
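As a minimal sketch of the distinction, the statements below create a managed table and the two kinds of temporary table. The table and column names (`page_views`, `page_views_stage`, `page_views_stage_tmp`) are hypothetical. The TMP form is assumed here to accept the same column list as a regular CREATE TABLE, so check it against your QDS version before relying on it.

```sql
-- Managed table: data is stored in cloud storage and survives cluster shutdown.
CREATE TABLE page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
);

-- Temporary table (open-source Hive syntax): data is stored in HDFS,
-- so use it only for intermediate results.
CREATE TEMPORARY TABLE page_views_stage (
  user_id BIGINT,
  url     STRING
);

-- Qubole-specific variant (assumption: same column-list syntax as CREATE TABLE).
CREATE TMP TABLE page_views_stage_tmp (
  user_id BIGINT,
  url     STRING
);
```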

You can use either TMP or TEMPORARY when creating temporary tables in QDS. A TEMPORARY table is specific to a single command run through the Analyze (or Workbench) UI or the REST API. You cannot use a temporary table created in one command in a different command (that is, one with a different command ID). See "How different is a Qubole Hive Session from the Open Source Hive Session?" for the differences.
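Because a TEMPORARY table is visible only within the command that creates it, the statement that creates the table and the statements that read it need to be submitted together. The sketch below assumes a hypothetical existing table `page_views` with a `url` column; `url_counts` is likewise an illustrative name.

```sql
-- Both statements are submitted together as ONE command.
-- Step 1: stage an intermediate aggregate in a temporary table (stored in HDFS).
CREATE TEMPORARY TABLE url_counts AS
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;

-- Step 2: read the temporary table within the same command.
SELECT url, views
FROM url_counts
ORDER BY views DESC
LIMIT 10;
```

If the second statement were submitted as a separate command, `url_counts` would no longer be available to it.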

CREATE TMP TABLE is Qubole’s custom extension and is not part of Apache Hive. The differences are as follows:

Note

Qubole does not support TMP tables starting with Hive 3.1.1 (beta) and recommends using TEMPORARY tables instead. You can create TMP tables only in the default database.

| Characteristic | CREATE TMP TABLE | CREATE TEMPORARY TABLE |
|---|---|---|
| Implemented by | Qubole (supported only by QDS) | Open-source Hive. See this document and the OSS Hive Wiki for details. |
| Metadata | Stored in Hive metastore | Lives only in memory |
| Table storage | HDFS | HDFS |
| Life of table | QDS user session | Hive user session |
| Table clean-up | When QDS cluster is terminated or QDS user session ends | When Hive user session ends |
| Advantages | Can be shared across clusters, users, and multiple query history-records (because the metadata is in the Hive metastore) | Short-lived, quicker clean-up |
| Disadvantages | Heavy clean-up (traversing the metastore); more disk capacity needed in HDFS because clean-up is less frequent | Available only in the Hive user session; doesn’t support indexes, partitions, etc. |
| Recommended if… | The temporary table is expected to live across multiple QDS query history-records | The temporary table is needed only in one query history-record |

A query history-record is a single entry under the History tab of the QDS Analyze page.