Required Setup
To use Hive ACID, ensure that you have the required setup described in the following sections.
Engine-specific Configuration
In addition to Hive, Presto and Spark also support Hive ACID transactions. The required configuration for each engine is listed in the following table.
| Hive | Presto | Spark |
|---|---|---|
| Currently, Hive ACID transactions are supported with Hive 3.1.1 (beta). The required setup is described in the tasks below. The Maintenance Cluster for Compaction task is optional: you can use the regular Hive cluster for compaction, but Qubole recommends configuring a separate maintenance cluster. For more information, see Compaction of Hive Transaction Delta Directories. | Currently, Hive ACID transactions are supported with Presto 317. The required setup is described in the tasks below. Ensure that you have a cluster configured with Presto 317. | Currently, Hive ACID transactions are supported from Spark 2.4.3. The required setup is described in the tasks below. Ensure that you have a cluster configured with Spark 2.4.3 or a later version. Spark 2.4.3 only supports reading Hive ACID tables. For more information about the setup, see Setting up Hive ACID Data Source for Spark. |
Upgrading the Hive Metastore Database
Create a ticket with Qubole Support to upgrade a Qubole-managed Hive metastore database.
To upgrade a custom-managed Hive metastore, follow the steps in Upgrading the Current Hive Metastore.
Creating a Hive 3.1.1 (beta) Cluster
Understanding Cluster Operations describes how to add a new cluster. Create a Hadoop (Hive) cluster by choosing 3.1.1 [BETA] as the Hive version in the cluster’s configuration tab.
Note
For more information on cluster configuration, see Adding or Changing Settings (AWS).
Creating a Presto 317 Cluster
Understanding Cluster Operations describes how to add a new cluster. Create a Presto cluster by choosing 317 as the Presto version in the cluster’s configuration tab.
Note
For more information on cluster configuration, see Adding or Changing Settings (AWS).
Creating a Spark >= 2.4.3 Cluster
Create a new Spark cluster by choosing 2.4.3 or a later version as the Spark version in the cluster’s configuration tab.
Configuring Hive ACID Properties
After creating the Hive cluster, edit the cluster’s Advanced Configuration and pass the following Hive ACID properties in HIVE SETTINGS > Override Hive Configuration.
hive.metastore.uris=thrift://localhost:10000
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
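As an optional sanity check (not part of the required setup), you can confirm that the overrides have taken effect by printing the values from a Hive session:
SET hive.support.concurrency;
SET hive.txn.manager;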
Maintenance Cluster for Compaction
Frequent insert/update/delete operations on a Hive table or partition create many small delta directories and files. These delta directories and files can cause performance degradation over time and require compaction at regular intervals. Compaction is the aggregation of small delta directories and files into a single directory.
For more information, see Compaction of Hive Transaction Delta Directories.
Currently, ACID transactions in Presto and Spark require a Hive maintenance cluster. For ACID transactions in Hive, you can use the regular Hadoop (Hive) cluster for compaction, but Qubole recommends configuring a separate Hive cluster as the maintenance cluster even in this case.
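Compaction normally runs automatically through the initiator and worker threads. As an illustration only, you can also trigger and monitor compaction manually from Hive; the table name acid_tbl and the partition dt='2020-01-01' below are placeholders.
ALTER TABLE acid_tbl PARTITION (dt='2020-01-01') COMPACT 'major';
SHOW COMPACTIONS;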
Configuring the Hive Maintenance Cluster
On the maintenance cluster, Qubole recommends configuring N workers, where N depends on the cluster instance type. The initiator and cleaner also run on the maintenance cluster in the background; these two processes do not interfere with cluster termination, as Qubole only checks whether any MR/Tez job is running on the cluster before terminating it.
These are sample Hive properties that you can override in Advanced Configuration > HIVE SETTINGS > Hive overrides of the maintenance cluster.
hive.metastore.uris=thrift://localhost:10000
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
hive.compactor.worker.threads=10
hive.compactor.delta.num.threshold=10
hive.compactor.delta.pct.threshold=0.1f
Creating Hive ACID Tables
You can perform read and write ACID transactions/operations only on ACID tables. Therefore, you must create a Hive ACID table, or convert a non-ACID table into a Hive ACID table, through Hive only. For more information, see Using Hive ACID Tables. An example is shown below.
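As a minimal sketch (the table names acid_tbl and legacy_tbl are placeholders), the following Hive DDL creates a transactional table and converts an existing ORC table into an ACID table by setting the transactional table property.
CREATE TABLE acid_tbl (id INT, value STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

ALTER TABLE legacy_tbl SET TBLPROPERTIES ('transactional'='true');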