5. How can I create a table in HDFS?

A CREATE TABLE statement in QDS creates a managed table in cloud storage. To create a table in HDFS to hold intermediate data, use CREATE TMP TABLE or CREATE TEMPORARY TABLE. Remember that HDFS in QDS is ephemeral: the data is destroyed when the cluster is shut down, so use HDFS only for intermediate outputs.
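As a minimal sketch, the statements below contrast the two storage targets. The table and column names (`page_views_daily`, `stage_views`, `stage_views_tmp`, `view_date`, `views`) are hypothetical, and the `CREATE TMP TABLE` form is assumed to accept the same column-definition syntax as a regular `CREATE TABLE`; check the QDS documentation for your Hive version before relying on it.

```sql
-- Managed table: data lands in cloud storage and outlives the cluster.
CREATE TABLE page_views_daily (view_date STRING, views BIGINT);

-- Temporary table (open-source Hive syntax): data lands in HDFS.
CREATE TEMPORARY TABLE stage_views (view_date STRING, views BIGINT);

-- Qubole-specific variant; assumed here to mirror CREATE TABLE syntax.
CREATE TMP TABLE stage_views_tmp (view_date STRING, views BIGINT);
```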

You can use either TMP or TEMPORARY when creating temporary tables in QDS. A TEMPORARY table is specific to a single command run through the Analyze (or Workbench) UI or the REST API: a temporary table created in one command cannot be used in a different command (one with a different command ID). See "How different is a Qubole Hive Session from the Open Source Hive Session?" for details.
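Because a TEMPORARY table is scoped to one command, the statement that creates it and the statements that read it would need to be submitted together. The sketch below illustrates this with two semicolon-separated statements sent as one command; `page_views` and `page_views_daily` are hypothetical tables assumed to exist already.

```sql
-- Both statements are submitted together as ONE command.
CREATE TEMPORARY TABLE stage_views AS
SELECT view_date, COUNT(*) AS views
FROM page_views          -- hypothetical source table
GROUP BY view_date;

-- Still inside the same command, so stage_views is visible here.
INSERT OVERWRITE TABLE page_views_daily
SELECT view_date, views FROM stage_views;
```

Running the second statement as a separate command (with its own command ID) would be expected to fail, because `stage_views` would no longer exist.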

CREATE TMP TABLE is Qubole’s custom extension and is not part of Apache Hive. The table below compares it with CREATE TEMPORARY TABLE.

Note

Qubole does not support TMP tables starting with Hive 3.1.1 (beta) and recommends using TEMPORARY tables instead. You can create TMP tables only in the default database.

| Characteristic | CREATE TMP TABLE | CREATE TEMPORARY TABLE |
| --- | --- | --- |
| Implemented by | Qubole (supported only by QDS) | Open-source Hive. See this document and the OSS Hive Wiki for details. |
| Metadata | Stored in Hive metastore | Lives only in memory |
| Table storage | HDFS | HDFS |
| Life of table | QDS user session | Hive user session |
| Table clean-up | When QDS cluster is terminated or QDS user session ends | When Hive user session ends |
| Advantages | Can be shared across clusters, users, and multiple query history-records (because the metadata is in the Hive metastore) | Short-lived, quicker clean-up |
| Disadvantages | Heavy clean-up (traversing metastore); more disk capacity needed in HDFS because clean-up is less frequent | Available only in Hive user session; doesn’t support index, partition, etc. |
| Recommended if… | The temporary table is expected to live across multiple QDS query history-records | The temporary table is needed only in one query history-record |

A query history-record is a single entry under the History tab of the QDS Analyze page.