Required Setup

To use Hive ACID, ensure that you have completed the required setup described in the following sections.

Engine-specific Configuration

In addition to Hive, Presto and Spark also support Hive ACID transactions. The required configuration for each engine is listed in this table.

Hive Presto Spark

Currently, Hive ACID transactions are supported with Hive 3.1.1 (beta). The required setup for ACID transactions is described in these tasks:

  1. Upgrading the Hive Metastore Database
  2. Configuring Hive ACID Properties
  3. Creating a Hive 3.1.1 (beta) Cluster
  4. Creating Hive ACID Tables

A maintenance cluster for compaction is optional for Hive: you can use the regular Hive cluster for compaction, but Qubole recommends that you configure a separate maintenance cluster. For more information, see Compaction of Hive Transaction Delta Directories.

Currently, Hive ACID transactions are supported with Presto 317. The required setup for ACID transactions is described in these tasks:

  1. Upgrading the Hive Metastore Database on the Presto cluster.
  2. Creating a Presto 317 Cluster
  3. Configuring the Hive Maintenance Cluster for compaction.
  4. Creating Hive ACID Tables

Ensure that you have a cluster configured with Presto 317.

Currently, Hive ACID transactions are supported from Spark 2.4.3 onward. The required setup for ACID transactions is described in these tasks:

  1. Upgrading the Hive Metastore Database on the Spark cluster.
  2. Upgrading the Hive Metastore Service to 3.1.1.
  3. Creating a Spark >= 2.4.3 Cluster
  4. Configuring the Hive Maintenance Cluster for compaction.
  5. Creating Hive ACID Tables

Ensure that you have a cluster configured with Spark 2.4.3 or later. Spark 2.4.3 supports only reading Hive ACID tables.

For more information about the setup, see Setting up Hive ACID Data Source for Spark.

Upgrading the Hive Metastore Database

Create a ticket with Qubole Support to upgrade a Qubole-managed Hive metastore database.

To upgrade a custom-managed Hive metastore, follow the steps in Upgrading the Current Hive Metastore.

Creating a Hive 3.1.1 (beta) Cluster

Understanding Cluster Operations describes how to add a new cluster. Create a Hadoop (Hive) cluster by choosing 3.1.1 [BETA] as the Hive version in the cluster’s configuration tab.

Note

For more information about cluster configuration, see Adding or Changing Settings (AWS).

Creating a Presto 317 Cluster

Understanding Cluster Operations describes how to add a new cluster. Create a Presto cluster by choosing 317 as the Presto version in the cluster’s configuration tab.

Note

For more information about cluster configuration, see Adding or Changing Settings (AWS).

Creating a Spark >= 2.4.3 Cluster

Create a new Spark cluster by choosing 2.4.3 (or a later version) as the Spark version in the cluster’s configuration tab.

Configuring Hive ACID Properties

After creating the Hive cluster, edit the cluster’s Advanced Configuration and pass the following Hive ACID properties in HIVE SETTINGS > Override Hive Configuration.

hive.metastore.uris=thrift://localhost:10000
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
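
Before saving the overrides, it can help to confirm that none of the ACID properties above is missing or mistyped. The following is a minimal sketch in Python and is not part of any Qubole or Hive tooling; the helper names `parse_overrides` and `missing_acid_properties` are hypothetical, and the expected values are copied from the list above. `hive.metastore.uris` is left out of the check on the assumption that its value depends on your environment.

```python
# Hypothetical helpers: check a block of Hive overrides for the ACID
# properties listed in this section. Not part of any Qubole or Hive API.

# hive.metastore.uris is intentionally omitted; its value is assumed to be
# environment-specific.
REQUIRED_ACID_PROPERTIES = {
    "hive.support.concurrency": "true",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.compactor.initiator.on": "true",
}


def parse_overrides(text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def missing_acid_properties(props):
    """Return {key: expected_value} for properties that are absent or differ."""
    return {
        key: expected
        for key, expected in REQUIRED_ACID_PROPERTIES.items()
        if props.get(key) != expected
    }


overrides = """
hive.metastore.uris=thrift://localhost:10000
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
"""

problems = missing_acid_properties(parse_overrides(overrides))
print("missing or mismatched:", problems)
```

An empty result means every listed property is present with the expected value.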

Maintenance Cluster for Compaction

Frequent insert/update/delete operations on a Hive table or partition create many small delta directories and files. These delta directories and files can degrade performance over time, so they require compaction at regular intervals. Compaction is the aggregation of small delta directories and files into a single directory.

For more information, see Compaction of Hive Transaction Delta Directories.

Currently, ACID transactions in Presto and Spark require a Hive maintenance cluster. For ACID transactions in Hive, you can use the regular Hadoop (Hive) cluster for compaction, but Qubole recommends that you configure a separate Hive cluster as the maintenance cluster in this case as well.

Configuring the Hive Maintenance Cluster

On the maintenance cluster, Qubole recommends that you configure N workers, where N varies with the cluster instance type. The initiator and cleaner also run in the background on the maintenance cluster, because these two processes do not interfere with cluster termination. Before terminating the cluster, Qubole only checks whether any MR/Tez job is running on it.

These are sample Hive properties that you can override in Advanced Configuration > HIVE SETTINGS > Hive overrides of the maintenance cluster.

hive.metastore.uris=thrift://localhost:10000
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
hive.compactor.worker.threads=10
hive.compactor.delta.num.threshold=10
hive.compactor.delta.pct.threshold=0.1f
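
To make the two threshold properties above more concrete, here is a simplified Python model of how they are generally described in the Apache Hive documentation: the number of delta directories triggers a minor compaction, and the size of the deltas relative to the base triggers a major compaction. This is a sketch under stated assumptions, not the actual initiator logic. The function name `compaction_type` is hypothetical, the strict `>` comparisons are an assumption, and the real initiator handles additional cases (for example, a table that has no base directory yet).

```python
# Simplified, hypothetical model of the two compaction thresholds.
# Assumption: the real Hive initiator covers more cases than this sketch.

DELTA_NUM_THRESHOLD = 10    # hive.compactor.delta.num.threshold
DELTA_PCT_THRESHOLD = 0.1   # hive.compactor.delta.pct.threshold


def compaction_type(num_deltas, delta_bytes, base_bytes):
    """Return "major", "minor", or None for one table or partition."""
    # Major compaction: the deltas have grown large relative to the base.
    if base_bytes > 0 and delta_bytes / base_bytes > DELTA_PCT_THRESHOLD:
        return "major"
    # Minor compaction: too many delta directories have accumulated.
    if num_deltas > DELTA_NUM_THRESHOLD:
        return "minor"
    # Neither threshold is exceeded, so nothing is queued.
    return None


# Many small deltas against a large base.
print(compaction_type(num_deltas=12, delta_bytes=5_000_000, base_bytes=1_000_000_000))
# Few deltas, but they amount to 20% of the base size.
print(compaction_type(num_deltas=3, delta_bytes=200_000_000, base_bytes=1_000_000_000))
# Few, small deltas.
print(compaction_type(num_deltas=3, delta_bytes=5_000_000, base_bytes=1_000_000_000))
```

Under this model, lowering either threshold makes compaction run more often, which keeps reads fast at the cost of more work on the maintenance cluster.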

Creating Hive ACID Tables

You can perform ACID read and write operations only on ACID tables. You must therefore either create a Hive ACID table or convert a non-ACID table into a Hive ACID table, and both can be done only through Hive. For more information, see Using Hive ACID Tables.
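
As a rough illustration of what the two options look like, the Python sketch below builds the corresponding HiveQL statements as strings. It assumes a full ACID table stored as ORC and marked with the `transactional` table property, and, for conversion, an existing managed table that is already stored as ORC. The helper names, table names, and columns are hypothetical; run the generated statements through Hive and treat Using Hive ACID Tables as the authoritative reference.

```python
# Hypothetical helpers that build HiveQL for ACID tables.
# Assumptions: table and column names are illustrative only; the table is
# stored as ORC and marked transactional through TBLPROPERTIES.

def create_acid_table_ddl(table, columns):
    """Build a CREATE TABLE statement for a transactional ORC table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        "STORED AS ORC "
        "TBLPROPERTIES ('transactional'='true')"
    )


def convert_to_acid_ddl(table):
    """Build an ALTER TABLE statement that marks an existing table as transactional."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('transactional'='true')"


create_stmt = create_acid_table_ddl("sales_acid", [("id", "INT"), ("amount", "DOUBLE")])
convert_stmt = convert_to_acid_ddl("sales_legacy")
print(create_stmt)
print(convert_stmt)
```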