Setting up Hive ACID Data Source for Spark

To use the Hive ACID data source for Spark, set up the Hive Metastore Service, create a compaction cluster, and then read the Hive ACID tables through Scala or SQL.

Perform the following steps:

  1. Upgrade the Hive Metastore database on the Spark cluster. To upgrade a custom-managed metastore, use the metastore upgrade scripts from Hive [3.1.2 release version], because Hive 3.1.2 includes backward-compatibility fixes.

  2. Set up the Spark cluster for ACID.

    1. Create a Spark cluster with Spark 2.4.3 or a later version.
    2. Upgrade the Hive Metastore Service on the Spark cluster to 3.1.1. Contact Qubole Support for the upgrade.
  3. Configure the Hive Maintenance cluster for compaction. For more information, see Configuring the Hive Maintenance Cluster.

  4. Obtain the latest ACID Jars from Qubole Support.

  5. Use the following configurations when you run ACID jobs.

    1. --jars <path-to-latest-ACID-jar>
    2. --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
  6. Verify that the hive.metastore.uris property in the $SPARK_HOME/conf/hive-site.xml file points to the configured Hive Metastore Service (HMS) endpoint.

    Sample code

    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <!-- hostname must point to the Hive metastore URI in your cluster -->
        <value>thrift://hostname:10000</value>
        <description>URI for spark to contact the hive metastore server</description>
      </property>
    </configuration>
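
Steps 5 and 6 above can be checked with a small script before submitting a job. The following is a minimal sketch in Python, not part of the Qubole tooling: the helper names `acid_submit_args` and `metastore_uris` are hypothetical, the `/usr/lib/spark` fallback is only a placeholder for when `SPARK_HOME` is unset, and the sketch assumes that `hive.metastore.uris` may hold a comma-separated list of endpoints.

```python
import os
import xml.etree.ElementTree as ET

# Extension class named in step 5 of this guide.
ACID_EXTENSION = "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension"


def acid_submit_args(jar_path):
    """Return the step-5 options as an argument list (hypothetical helper)."""
    return ["--jars", jar_path, "--conf", "spark.sql.extensions=" + ACID_EXTENSION]


def metastore_uris(hive_site_content):
    """Return the hive.metastore.uris entries from hive-site.xml content (str or bytes)."""
    root = ET.fromstring(hive_site_content)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "hive.metastore.uris":
            value = (prop.findtext("value") or "").strip()
            # Assumption: several endpoints may be listed, separated by commas.
            return [uri.strip() for uri in value.split(",") if uri.strip()]
    return []


if __name__ == "__main__":
    # Assumption: SPARK_HOME is set; /usr/lib/spark is only a placeholder fallback.
    spark_home = os.environ.get("SPARK_HOME", "/usr/lib/spark")
    conf = os.path.join(spark_home, "conf", "hive-site.xml")
    if os.path.exists(conf):
        with open(conf, "rb") as fh:
            uris = metastore_uris(fh.read())
        print("hive.metastore.uris:", uris or "not set")
    else:
        print("hive-site.xml not found at", conf)
    # <path-to-latest-ACID-jar> is the placeholder from step 5; substitute the real path.
    print(" ".join(acid_submit_args("<path-to-latest-ACID-jar>")))
```

An empty result from `metastore_uris` means the property is missing or has no value, in which case Spark has no HMS endpoint to contact and the file should be corrected before running ACID jobs.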