RCFile Tables
For best performance, Qubole recommends you create RCFiles with binary serialization, using Snappy to compress data.
Text Serialization of Columns
The following AWS example creates an RCFile table with text serialization of columns.
CREATE EXTERNAL TABLE nation_s3_rcfile
(N_NATIONKEY INT, N_NAME STRING, N_REGIONKEY INT, N_COMMENT STRING)
STORED AS RCFILE
LOCATION 's3://qtest-qubole-com/datasets/presto/functional/nation_s3_rcfile';
Binary Serialization of Columns
The following AWS example creates an RCFile table with binary serialization of columns.
CREATE EXTERNAL TABLE nation_s3_rcfile
(N_NATIONKEY INT, N_NAME STRING, N_REGIONKEY INT, N_COMMENT STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE
LOCATION 's3://qtest-qubole-com/datasets/presto/functional/nation_s3_rcfile';
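To confirm which serialization a table actually uses, one option is to inspect its metadata from Hive. The following is a minimal sketch, assuming the nation_s3_rcfile table from the example above has already been created; the exact output layout can vary by Hive version.

```sql
-- Show detailed table metadata, including storage information.
-- In the output, the "SerDe Library" field names the SerDe class in use;
-- for the binary example it should be
-- org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe.
DESCRIBE FORMATTED nation_s3_rcfile;
```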
Compression
To compress data as it is loaded into an RCFile table, run the following SET statements before inserting the data, as in this AWS example.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
INSERT INTO nation_s3_rcfile SELECT * FROM nation;
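The recommendation at the top of this section (binary serialization plus Snappy compression) can be combined into one sequence. The following is a sketch only: the table name nation_rcfile_snappy and the S3 location are hypothetical placeholders that do not appear in the examples above, and the nation source table is assumed to exist already.

```sql
-- Hypothetical table name and S3 path; replace both with your own.
CREATE EXTERNAL TABLE nation_rcfile_snappy
(N_NATIONKEY INT, N_NAME STRING, N_REGIONKEY INT, N_COMMENT STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE
LOCATION 's3://your-bucket/path/to/nation_rcfile_snappy';

-- Enable Snappy block compression for the output of the insert that follows.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Load the data; the files written under LOCATION are compressed RCFiles.
INSERT INTO nation_rcfile_snappy SELECT * FROM nation;
```

The SET statements apply to the current session, so they need to run in the same session as the INSERT.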