Supported Hive ACID Features for Spark
The ACID features that Spark on Qubole supports include:
Capabilities
Spark on Qubole supports READ, INSERT, UPDATE, DELETE, and STREAMING INSERT capabilities on Hive ACID tables.
MERGE will be supported in future releases.
For more information about using these capabilities, see Using the Supported Capabilities.
Tables
READ is supported on both partitioned and bucketed tables. Spark on Qubole allows you to read two kinds of ACID tables, each briefly described in the following list:
Full ACID Table: This table type supports all of the DML operations mentioned above and is ACID-compliant, but it is restricted to the ORC file format.
UPDATE and DELETE operations are supported on both partitioned and non-partitioned tables in the ORC file format. Currently, UPDATE and DELETE operations are not supported on bucketed tables.
Insert-only (IO) Table: This table type bridges the gap between full ACID tables and Hive-managed tables. These tables are useful when you do not require UPDATE and DELETE operations but want concurrent operations and snapshot isolation on file formats other than ORC.
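The difference between the two table types shows up in the table properties set at creation time. The following is a minimal sketch, not part of the Qubole documentation, that builds the corresponding Hive DDL strings. It assumes the standard Hive properties `transactional` and `transactional_properties`; the helper name `create_table_ddl` and the table and column names in the usage note are hypothetical.

```python
def create_table_ddl(name, columns, table_type, file_format="ORC", partition_cols=None):
    """Build a Hive CREATE TABLE statement for a full ACID or insert-only table.

    columns and partition_cols are lists of (column_name, hive_type) pairs.
    """
    if table_type == "full_acid":
        # Full ACID tables are restricted to the ORC file format.
        if file_format.upper() != "ORC":
            raise ValueError("Full ACID tables support only the ORC file format")
        props = {"transactional": "true"}
    elif table_type == "insert_only":
        # Insert-only tables are transactional but do not allow UPDATE or DELETE.
        props = {"transactional": "true", "transactional_properties": "insert_only"}
    else:
        raise ValueError(f"unknown table type: {table_type}")

    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    ddl = f"CREATE TABLE {name} ({cols})"
    if partition_cols:
        parts = ", ".join(f"{col} {typ}" for col, typ in partition_cols)
        ddl += f" PARTITIONED BY ({parts})"
    ddl += f" STORED AS {file_format.upper()}"
    tbl_props = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    ddl += f" TBLPROPERTIES ({tbl_props})"
    return ddl
```

For example, `create_table_ddl("events_io", [("msg", "STRING")], "insert_only", file_format="Parquet")` yields a statement for an insert-only Parquet table, while requesting `"full_acid"` with any format other than ORC raises an error.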
Transactions
Transactions support the following operations:
INSERT, INSERT OVERWRITE, and STREAMING INSERT for Insert-only (IO) Tables.
UPDATE, DELETE, INSERT, INSERT OVERWRITE, and STREAMING INSERT for Full ACID Tables.
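The operation lists above, together with the bucketed-table restriction from the Tables section, can be encoded as a small lookup. This is an illustrative sketch only; the names `SUPPORTED_OPERATIONS` and `is_supported` are hypothetical and do not correspond to any Qubole or Spark API.

```python
# Write operations per table type, as listed in this section.
SUPPORTED_OPERATIONS = {
    "insert_only": {"INSERT", "INSERT OVERWRITE", "STREAMING INSERT"},
    "full_acid": {"UPDATE", "DELETE", "INSERT", "INSERT OVERWRITE", "STREAMING INSERT"},
}


def is_supported(table_type, operation, bucketed=False):
    """Return True if the operation is listed as supported for the table type."""
    if table_type not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unknown table type: {table_type}")
    # Normalize case and whitespace, e.g. "insert  overwrite" -> "INSERT OVERWRITE".
    normalized = " ".join(operation.upper().split())
    # UPDATE and DELETE are currently not supported on bucketed tables.
    if bucketed and normalized in {"UPDATE", "DELETE"}:
        return False
    return normalized in SUPPORTED_OPERATIONS[table_type]
```

A check of this kind could run before a job submits a statement, so that an unsupported UPDATE on an insert-only or bucketed table fails early with a clear message.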
Currently, BEGIN, COMMIT, and ROLLBACK are not supported explicitly.
Transactions provide only snapshot isolation, in which a consistent snapshot of the table is read at the start of the transaction.
Other transaction isolation levels, such as dirty read (read uncommitted), read committed, repeatable read, and serializable, are not supported in this release.
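To make the snapshot isolation guarantee concrete, here is a toy in-memory model. It is a conceptual sketch under simplifying assumptions (each commit publishes a complete new version of the table), not a description of how Hive or Spark on Qubole implements transactions. The class name `VersionedTable` and its methods are hypothetical.

```python
class VersionedTable:
    """Toy table in which every commit publishes a new immutable version."""

    def __init__(self):
        # Version 0 is the empty table; the list index is the version id.
        self._versions = [()]

    def commit(self, rows):
        """Publish a new committed version containing the given rows."""
        self._versions.append(tuple(rows))

    def begin_read(self):
        """Start a read transaction pinned to the latest committed version."""
        snapshot_id = len(self._versions) - 1

        def read():
            # Always return the version pinned at the start of the transaction.
            return self._versions[snapshot_id]

        return read


table = VersionedTable()
table.commit([1, 2])
reader = table.begin_read()   # snapshot taken here
table.commit([1, 2, 3])       # a later commit by a concurrent writer
rows_seen = reader()          # still the rows from the pinned snapshot
```

The reader keeps seeing the version that was current when its transaction started, regardless of later commits. A new call to `begin_read()` would pick up the most recent version.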
File Formats
Full ACID tables support the Optimized Row Columnar (ORC) file format only. A non-ACID table must already use the ORC file format before it can be converted to a Full ACID table.
Insert-only tables support file formats such as TextFile, SequenceFile, RCFile, ORC, and Parquet.
The default file format of Hive-managed tables in Qubole is TextFile, which is also the default format for Insert-only tables.
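The ORC requirement for converting a non-ACID table can be checked before issuing the conversion statement. The sketch below assumes the conversion is done with the standard Hive `ALTER TABLE ... SET TBLPROPERTIES ('transactional'='true')` statement; the helper name `conversion_ddl` and the table name in the usage note are hypothetical, and how the current file format is discovered is left to the caller.

```python
def conversion_ddl(table_name, current_format):
    """Return the Hive statement that converts a non-ACID ORC table to full ACID.

    Raises ValueError if the table is not stored as ORC, because only
    ORC tables can be converted to Full ACID tables.
    """
    if current_format.strip().upper() != "ORC":
        raise ValueError(
            f"{table_name} is stored as {current_format}; "
            "only ORC tables can be converted to Full ACID tables"
        )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ('transactional'='true')"
```

For a hypothetical table `web_logs` stored as TextFile (the default for Hive-managed tables in Qubole), the helper raises an error instead of producing a statement, which signals that the data must first be rewritten as ORC.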