RCFile Tables

For best performance, Qubole recommends that you create RCFile tables with binary serialization and use Snappy to compress the data.

Text Serialization of Columns

The following AWS example creates an RCFile table with text serialization of columns.

create external table nation_s3_rcfile
(N_NATIONKEY INT, N_NAME STRING, N_REGIONKEY INT, N_COMMENT STRING)
STORED AS RCFILE
LOCATION 's3://qtest-qubole-com/datasets/presto/functional/nation_s3_rcfile';

Binary Serialization of Columns

The following AWS example creates an RCFile table with binary serialization of columns.

create external table nation_s3_rcfile
(N_NATIONKEY INT, N_NAME STRING, N_REGIONKEY INT, N_COMMENT STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE
LOCATION 's3://qtest-qubole-com/datasets/presto/functional/nation_s3_rcfile';
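
To confirm which serialization a table actually uses, you can inspect its metadata. The following is a minimal sketch that assumes the nation_s3_rcfile table from the example above already exists; DESCRIBE FORMATTED is a standard Hive statement.

```sql
-- Show table metadata, including the SerDe library and the input/output formats.
DESCRIBE FORMATTED nation_s3_rcfile;
```

For a table created with binary serialization, the SerDe Library field in the storage information section of the output should read org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe.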

Compression

To compress data while loading it into an RCFile table, run the following SET statements before inserting the data, as in this AWS example.

SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
insert into nation_s3_rcfile select * from nation;
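
The recommended combination of binary serialization and Snappy compression can be put together in one script. This is a sketch rather than a definitive recipe: the table name nation_rcfile_snappy and the S3 location are hypothetical placeholders, and the source table nation is assumed to have the same four columns in the same order.

```sql
-- Hypothetical table name and S3 location; replace both with your own.
CREATE EXTERNAL TABLE nation_rcfile_snappy
(N_NATIONKEY INT, N_NAME STRING, N_REGIONKEY INT, N_COMMENT STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE
LOCATION 's3://your-bucket/path/nation_rcfile_snappy';

-- Enable Snappy block compression for the output of the insert that follows.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Load the data; assumes nation matches the column layout declared above.
INSERT INTO TABLE nation_rcfile_snappy SELECT * FROM nation;
```

The SET statements apply only to the current session, so they must run in the same session as the INSERT statement.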