Getting Started

Let us get started with Hive ACID transaction concepts.

Transactional Tables

  • Hive ACID lets you create transactional tables, which are Hive-managed tables that support additional features such as concurrency and snapshot isolation for transactions.
  • Transactional tables are of the following types:
    • Full ACID Table: This table type supports all the DML operations listed under Operations and is ACID-compliant, but it is restricted to the ORC file format.
    • Insert-only (IO) Table: This table type bridges the gap between full ACID tables and Hive-managed tables. Insert-only tables are useful when you do not need update and delete operations but want concurrent operations and snapshot isolation on file formats other than ORC.
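
As a minimal sketch of the two table types, the following HiveQL creates one table of each kind. The table and column names (acid_orders, io_events, and their columns) are hypothetical, and the statements assume that the cluster is already configured for ACID (for example, that concurrency support and the DbTxnManager transaction manager are enabled).

```sql
-- Full ACID table: must be stored as ORC.
-- Table and column names are illustrative only.
CREATE TABLE acid_orders (
  id     INT,
  status STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Insert-only table: transactional, but may use a format other than ORC.
-- UPDATE and DELETE are not available on this table type.
CREATE TABLE io_events (
  id      INT,
  payload STRING
)
STORED AS PARQUET
TBLPROPERTIES (
  'transactional' = 'true',
  'transactional_properties' = 'insert_only'
);
```

The only difference in the declaration is the extra transactional_properties setting, which restricts the table to insert operations in exchange for file format flexibility.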

For more information, see Supported and Unsupported Features in Hive 3.1.1 (beta).

Operations

ACID transactions are supported for single-query operations. The supported operations are INSERT, INSERT OVERWRITE, SELECT, UPDATE, DELETE, and MERGE. These transactions are made possible by the introduction of the transactional table types. You can track and manage transactions by running the SHOW TRANSACTIONS Hive query, which lists all open and aborted transactions. For more information, see Managing Hive Transactions.
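
The operations above might look as follows on a full ACID table. This is an illustrative sketch only: acid_orders and order_updates are hypothetical tables, and the statements assume an ACID-enabled cluster. Each statement runs as its own transaction.

```sql
-- Hypothetical full ACID target table and a staging table with changes.
CREATE TABLE IF NOT EXISTS acid_orders (id INT, status STRING)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

CREATE TABLE IF NOT EXISTS order_updates (id INT, status STRING)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Insert rows.
INSERT INTO acid_orders VALUES (1, 'new'), (2, 'new');
INSERT INTO order_updates VALUES (1, 'shipped'), (3, 'new');

-- Row-level update and delete (full ACID tables only).
UPDATE acid_orders SET status = 'cancelled' WHERE id = 2;
DELETE FROM acid_orders WHERE id = 2;

-- Upsert from the staging table in a single statement.
MERGE INTO acid_orders AS t
USING order_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET status = s.status
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.status);

-- Inspect transactions known to the metastore.
SHOW TRANSACTIONS;
```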

Compaction

Frequent insert, update, and delete operations on a Hive table or partition create many small delta directories and files. These delta directories and files can degrade performance over time, so they require compaction at regular intervals. Compaction is the aggregation of small delta directories and files into a single directory.
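
As a hedged illustration, compaction can be requested manually with ALTER TABLE ... COMPACT. The table name acid_orders is hypothetical and is assumed to be an unpartitioned transactional table; for a partitioned table, a PARTITION clause would be added before COMPACT.

```sql
-- Minor compaction: merges a set of delta files into a single delta.
ALTER TABLE acid_orders COMPACT 'minor';

-- Major compaction: rewrites the deltas and any existing base into a new base.
ALTER TABLE acid_orders COMPACT 'major';

-- Check the state of queued, running, and finished compaction requests.
SHOW COMPACTIONS;
```

The ALTER TABLE statement only enqueues a request; the compaction itself runs in the background, so SHOW COMPACTIONS is the way to confirm that it completed.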

For more details on configuring and using compaction, see Compaction of Hive Transaction Delta Directories.