JSON Tables
The JSON serde is useful for parsing data stored as JSON. The JSON implementation has been borrowed from rcongiu.
Data
{"n_nationkey":"5", "n_name":"ETHIOPIA", "n_regionkey":"0", "n_comment":"ven packages wake quickly. regu" }
{"n_nationkey":"6", "n_name":"FRANCE", "n_regionkey":"3", "n_comment":"refully final requests. regular, ironi" }
{"n_nationkey":"7", "n_name":"GERMANY", "n_regionkey":"3", "n_comment":"l platelets. regular accounts x-ray: unusual, regular acco" }
DDL Statement (AWS Example)
CREATE EXTERNAL TABLE nation_s3_json_openx(
n_nationkey string,
n_name string,
n_regionkey string,
n_comment string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://qtest-qubole-com/datasets/presto/functional/nation_s3_json';
;
Features
The features of JSON tables are:
Entire JSON document must fit in a single line of the text file.
Read the data stored in the JSON format.
Convert the data to the JSON format when INSERT INTO table.
Arrays and maps are supported.
Nested data structures are also supported.
Nested JSON Elements
If your data contains nested JSON elements like this:
{"country":"Switzerland","languages":["German","French","Italian"],"religions":{"catholic":[10,20],"protestant":[40,50]}}
You can declare languages as an array<string> and religions as a map<string,array<int> like this (location omitted).
CREATE EXTERNAL TABLE json_nested_test (
country string,
languages array<string>,
religions map<string,array<int>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '<scheme>..'
;
<scheme>
is the Cloud-specific URI and path: for example, s3://
(AWS), wasb://
(Azure blob storage).
You can access a nested element like this
select religions['catholic'][0] from json_nested_test;
Which produces the result
10