5. How can I create a table in HDFS?

A CREATE TABLE statement in QDS creates a managed table in Cloud storage. To create a table in HDFS to hold intermediate data, use CREATE TMP TABLE or CREATE TEMPORARY TABLE. Remember that HDFS in QDS is ephemeral and the data is destroyed when the cluster is shut down; use HDFS only for intermediate outputs.
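
As a rough illustration, the statements below create the same scratch table in each of the two forms. The table and column names are hypothetical, and the sketch assumes that CREATE TMP TABLE accepts the same column-definition syntax as a regular CREATE TABLE; check the QDS documentation for the exact grammar.

```sql
-- Hypothetical table and column names, for illustration only.

-- Qubole extension (QDS only).
-- Assumption: TMP tables take the usual CREATE TABLE column list.
CREATE TMP TABLE tmp_page_views (
  user_id STRING,
  views   INT
);

-- Open-source Hive form.
CREATE TEMPORARY TABLE temp_page_views (
  user_id STRING,
  views   INT
);
```

In both cases the data is stored in HDFS, so write any results you need to keep to a regular table before the cluster shuts down.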

You can use either TMP or TEMPORARY when creating temporary tables in QDS. CREATE TMP TABLE is Qubole’s custom extension and is not part of Apache Hive. The differences are as follows:

| Characteristic | CREATE TMP TABLE | CREATE TEMPORARY TABLE |
|---|---|---|
| Implemented by | Qubole (supported only by QDS) | Open-source Hive. See this document and the OSS Hive Wiki for details. |
| Metadata | Stored in Hive metastore | Lives only in memory |
| Table storage | HDFS | HDFS |
| Life of table | QDS user session | Hive user session |
| Table clean-up | When QDS cluster is terminated or QDS user session ends | When Hive user session ends |
| Advantages | Can be shared across clusters and users and multiple query records (because the metadata is in the Hive metastore) | Short-lived, quicker clean-up |
| Disadvantages | Heavy clean-up (traversing metastore); more disk capacity needed in HDFS because clean-up is less frequent | Available only in Hive user session; doesn’t support index, partition, etc. |
| Recommended if… | The temporary table is expected to live across multiple QDS query history-records (a query history-record is the one row a user can see in the History view on the QDS Analyze page) | The temporary table is needed only in one query history-record |