Supported Hive ACID Features for Spark

The ACID features that Spark on Qubole supports include:

Capabilities

Spark on Qubole supports READ, INSERT, UPDATE, DELETE, and STREAMING INSERT capabilities on Hive ACID tables. MERGE will be supported in future releases.

For more information about using these capabilities, see Using the Supported Capabilities.

Tables

  • READ is supported on both partitioned and bucketed tables.
  • Spark on Qubole allows you to read two kinds of ACID tables, each briefly described in the following list:
    • Full ACID Table: This table type supports all of the DML operations mentioned above and is ACID-compliant, but it is restricted to the ORC file format. UPDATE and DELETE operations are supported on both partitioned and non-partitioned tables. Currently, UPDATE and DELETE operations are not supported on bucketed tables.
    • Insert-only (IO) Table: This table type bridges the gap between full ACID tables and Hive-managed tables. These tables are useful when you do not need UPDATE and DELETE operations but want concurrent operations and snapshot isolation on file formats other than ORC.
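
As an illustration of how the two table types are typically declared, the following Python sketch builds the Hive DDL for each. It assumes the standard Apache Hive table properties transactional and transactional_properties. The helper name acid_table_ddl and the table and column names are hypothetical, and any Qubole-specific setup is not covered here.

```python
def acid_table_ddl(table, columns, insert_only=False, stored_as="ORC"):
    """Return a CREATE TABLE statement for a Hive ACID table (hypothetical helper).

    Assumes the standard Hive table properties; nothing here is Qubole-specific.
    """
    if not insert_only and stored_as.upper() != "ORC":
        # Full ACID tables are restricted to the ORC file format.
        raise ValueError("Full ACID tables must be stored as ORC")
    props = ["'transactional'='true'"]
    if insert_only:
        # Insert-only tables add this property on top of 'transactional'.
        props.append("'transactional_properties'='insert_only'")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"STORED AS {stored_as} "
        f"TBLPROPERTIES ({', '.join(props)})"
    )


# Hypothetical table and column names, for illustration only.
full_acid = acid_table_ddl("sales_acid", [("id", "INT"), ("amount", "DOUBLE")])
insert_only = acid_table_ddl(
    "sales_io", [("id", "INT"), ("amount", "DOUBLE")],
    insert_only=True, stored_as="PARQUET",
)
print(full_acid)
print(insert_only)
```

The ORC check is applied only to the full ACID case because insert-only tables exist precisely to allow other file formats.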

Transactions

Transactions support the following operations:

  • INSERT, INSERT OVERWRITE, and STREAMING INSERT for Insert-only (IO) Tables.
  • UPDATE, DELETE, INSERT, INSERT OVERWRITE, and STREAMING INSERT for Full ACID Tables.

Currently, explicit BEGIN, COMMIT, and ROLLBACK statements are not supported.
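
The support rules above can be summarized as a small lookup. The following Python sketch is only a restatement of this section, not a Qubole or Spark API. The names SUPPORTED_OPERATIONS and is_supported, and the table type keys, are hypothetical.

```python
# Operations listed in this section, keyed by table type (hypothetical keys).
SUPPORTED_OPERATIONS = {
    "full_acid": {
        "READ", "INSERT", "INSERT OVERWRITE",
        "UPDATE", "DELETE", "STREAMING INSERT",
    },
    "insert_only": {
        "READ", "INSERT", "INSERT OVERWRITE", "STREAMING INSERT",
    },
}


def is_supported(operation, table_type, bucketed=False):
    """Return True if this section lists the operation as supported."""
    op = operation.strip().upper()
    if op not in SUPPORTED_OPERATIONS[table_type]:
        # Covers MERGE and explicit BEGIN, COMMIT, and ROLLBACK as well.
        return False
    if bucketed and op in {"UPDATE", "DELETE"}:
        # UPDATE and DELETE are not supported on bucketed tables.
        return False
    return True


print(is_supported("UPDATE", "full_acid"))                 # True
print(is_supported("UPDATE", "insert_only"))               # False
print(is_supported("DELETE", "full_acid", bucketed=True))  # False
```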

Transactions provide only snapshot isolation, in which a consistent snapshot of the table is read at the start of the transaction.

Other isolation behaviors, such as dirty read, read committed, repeatable read, or serializable, are not supported in this release.
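
To make the idea of snapshot isolation concrete, here is a toy Python model. It is a conceptual sketch only and does not reflect how Hive or Spark implement ACID reads. The classes VersionedTable and Snapshot are hypothetical.

```python
class VersionedTable:
    """Toy in-memory table that keeps every committed version of its rows."""

    def __init__(self, rows=()):
        self._versions = [tuple(rows)]  # version 0 is the initial state

    def commit(self, rows):
        """Publish a new committed version of the table."""
        self._versions.append(tuple(rows))

    def begin_read(self):
        """Start a read pinned to the latest committed version."""
        return Snapshot(self, len(self._versions) - 1)

    def rows_at(self, version):
        return self._versions[version]


class Snapshot:
    """A reader's fixed view of the table, chosen when the read started."""

    def __init__(self, table, version):
        self._table = table
        self._version = version

    def rows(self):
        return self._table.rows_at(self._version)


table = VersionedTable([("a", 1)])
reader = table.begin_read()          # snapshot taken here
table.commit([("a", 1), ("b", 2)])   # a concurrent write commits afterwards

print(reader.rows())                 # (('a', 1),)
print(table.begin_read().rows())     # (('a', 1), ('b', 2))
```

The first reader keeps seeing the state from the moment it started, while a reader that starts after the commit sees the new row.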

File Formats

Full ACID tables support the Optimized Row Columnar (ORC) file format only. A non-ACID table must already use the ORC file format before it can be converted to a Full ACID table.

Insert-only tables support file formats such as TextFile, SequenceFile, RCFile, ORC, and Parquet. The default file format of Hive-managed tables in Qubole is TextFile, which is also the default format for Insert-only tables.
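
The file format rules above can be expressed as a simple check, for example before attempting a conversion. This Python sketch is a restatement of this section under stated assumptions. The insert-only set contains only the formats named above, which may not be exhaustive, and all names in the code are hypothetical.

```python
# Formats named in this section; the insert-only list may not be exhaustive.
ALLOWED_FORMATS = {
    "full_acid": {"ORC"},
    "insert_only": {"TEXTFILE", "SEQUENCEFILE", "RCFILE", "ORC", "PARQUET"},
}

# Default format for Insert-only tables, as stated above.
DEFAULT_INSERT_ONLY_FORMAT = "TEXTFILE"


def format_allowed(table_type, file_format):
    """Return True if the file format is listed for the given table type."""
    return file_format.upper() in ALLOWED_FORMATS[table_type]


def can_convert_to_full_acid(current_format):
    """A non-ACID table must already be ORC to become a Full ACID table."""
    return current_format.upper() in ALLOWED_FORMATS["full_acid"]


print(format_allowed("insert_only", "Parquet"))   # True
print(format_allowed("full_acid", "Parquet"))     # False
print(can_convert_to_full_acid("TextFile"))       # False
```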