Supported Hive ACID Features for Spark
The ACID features that Spark on Qubole supports include:
Capabilities
Spark on Qubole supports READ, INSERT, UPDATE, DELETE, and STREAMING INSERT capabilities on Hive ACID tables.
MERGE will be supported in future releases.
For more information about using these capabilities, see Using the Supported Capabilities.
Tables
READ is supported on both partitioned and bucketed tables. Spark on Qubole allows you to read two kinds of ACID tables, each briefly described in the following list:
Full ACID Table: This table type supports all of the DML operations mentioned above and is ACID-compliant, but it is restricted to the ORC file format. UPDATE and DELETE operations are supported on both partitioned and non-partitioned tables in the ORC file format. Currently, UPDATE and DELETE operations are not supported on bucketed tables.
Insert-only (IO) Table: This table type bridges the gap between full ACID tables and Hive-managed tables. These tables are useful when you do not need UPDATE and DELETE operations but want concurrent operations and snapshot isolation on file formats other than ORC.
Transactions
Transactions support the following operations:
INSERT, INSERT OVERWRITE, and STREAMING INSERT for Insert-only (IO) Tables.
UPDATE, DELETE, INSERT, INSERT OVERWRITE, and STREAMING INSERT for Full ACID Tables.
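The operation lists above, together with the bucketed-table restriction noted under Tables, can be summarized as a small lookup. The following is a minimal Python sketch for illustration only: the names SUPPORTED_OPERATIONS and is_supported are hypothetical helpers, not part of any Qubole or Spark API, and READ is included because both table types are readable.

```python
# Capability matrix transcribed from this page. All names are illustrative,
# not part of any Qubole or Spark API.
SUPPORTED_OPERATIONS = {
    "insert_only": {"READ", "INSERT", "INSERT OVERWRITE", "STREAMING INSERT"},
    "full_acid": {
        "READ", "INSERT", "INSERT OVERWRITE", "STREAMING INSERT",
        "UPDATE", "DELETE",
    },
}


def is_supported(table_type, operation, bucketed=False):
    """Return True if this page lists the operation as supported."""
    if table_type not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unknown table type: {table_type!r}")
    op = operation.strip().upper()
    if op not in SUPPORTED_OPERATIONS[table_type]:
        return False
    # UPDATE and DELETE are currently not supported on bucketed tables.
    if bucketed and op in {"UPDATE", "DELETE"}:
        return False
    return True


print(is_supported("full_acid", "UPDATE"))
print(is_supported("full_acid", "UPDATE", bucketed=True))
print(is_supported("insert_only", "DELETE"))
```

MERGE is deliberately absent from the matrix, so the helper reports it as unsupported for both table types.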
Currently, BEGIN, COMMIT, and ROLLBACK are not supported explicitly.
Transactions provide only snapshot isolation, in which a consistent snapshot of the table is read at the start of the transaction.
Other isolation levels, such as dirty read (read uncommitted), read committed, repeatable read, and serializable, are not supported in this release.
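To make the snapshot isolation guarantee concrete, the toy Python model below shows a reader that pins the table state at the start of its transaction and does not observe writes committed afterwards. This is a conceptual sketch only: ToyTable and its methods are hypothetical and do not reflect how Hive or Spark on Qubole implement ACID tables internally.

```python
class ToyTable:
    """Toy versioned table: each committed write creates a new immutable version."""

    def __init__(self):
        self._versions = [()]  # version 0 is the empty table

    def commit_insert(self, *rows):
        # A committed insert appends a new version; older versions are untouched.
        self._versions.append(self._versions[-1] + rows)

    def begin_read(self):
        # A reader pins the latest committed version when its transaction starts.
        return self._versions[-1]


table = ToyTable()
table.commit_insert("row-1", "row-2")

snapshot = table.begin_read()   # reader's transaction starts here
table.commit_insert("row-3")    # concurrent write commits later

print(snapshot)                 # the pinned snapshot, without "row-3"
print(table.begin_read())       # a new reader sees all three rows
```

The first reader keeps working against its pinned snapshot, while a reader that starts after the second commit sees the newer state.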
File Formats
Full ACID tables support the Optimized Row Columnar (ORC) file format only. A non-ACID table that is to be converted to a Full ACID table must already use the ORC file format.
Insert-only tables support file formats such as TextFile, SequenceFile, RCFile, ORC, and Parquet.
The default file format of Hive-managed tables in Qubole is TextFile, which is also the default format for Insert-only tables.
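The file format rules above can be sketched as a small Python helper that builds a HiveQL CREATE TABLE statement for either table type. The helper name create_table_ddl and the example table names are hypothetical. The transactional and transactional_properties table properties are the standard Apache Hive way to declare ACID tables; this page does not show DDL, so treat the generated statements as an assumption to verify on your cluster. Defaulting Full ACID tables to ORC is a convenience of this sketch, not documented behavior.

```python
# File formats taken from this page. The insert-only list is introduced with
# "such as", so it may not be exhaustive.
FULL_ACID_FORMATS = {"ORC"}
INSERT_ONLY_FORMATS = {"TEXTFILE", "SEQUENCEFILE", "RCFILE", "ORC", "PARQUET"}


def create_table_ddl(name, columns, table_type, file_format=None):
    """Build a HiveQL CREATE TABLE statement for an ACID table (illustrative)."""
    if table_type == "full_acid":
        # Sketch-level choice: default to ORC, the only format allowed here.
        allowed, default = FULL_ACID_FORMATS, "ORC"
        props = "'transactional'='true'"
    elif table_type == "insert_only":
        # TextFile is the documented default for insert-only tables.
        allowed, default = INSERT_ONLY_FORMATS, "TEXTFILE"
        props = "'transactional'='true', 'transactional_properties'='insert_only'"
    else:
        raise ValueError(f"unknown table type: {table_type!r}")

    fmt = (file_format or default).upper()
    if fmt not in allowed:
        raise ValueError(f"{fmt} is not supported for {table_type} tables")

    cols = ", ".join(f"{col} {col_type}" for col, col_type in columns)
    return f"CREATE TABLE {name} ({cols}) STORED AS {fmt} TBLPROPERTIES ({props})"


print(create_table_ddl("sales", [("id", "INT"), ("amount", "DOUBLE")], "full_acid"))
print(create_table_ddl("logs", [("line", "STRING")], "insert_only", "parquet"))
```

Requesting a non-ORC format for a Full ACID table raises ValueError, which mirrors the restriction stated in this section.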