JSON Tables

The JSON serde is useful for parsing data stored as JSON. The JSON implementation has been borrowed from rcongiu.

Data

{"n_nationkey":"5", "n_name":"ETHIOPIA", "n_regionkey":"0", "n_comment":"ven packages wake quickly. regu" }
{"n_nationkey":"6", "n_name":"FRANCE", "n_regionkey":"3", "n_comment":"refully final requests. regular, ironi" }
{"n_nationkey":"7", "n_name":"GERMANY", "n_regionkey":"3", "n_comment":"l platelets. regular accounts x-ray: unusual, regular acco" }

DDL Statement (AWS Example)

CREATE EXTERNAL TABLE nation_s3_json_openx(
n_nationkey string,
n_name string,
n_regionkey string,
n_comment string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION  's3://qtest-qubole-com/datasets/presto/functional/nation_s3_json';
;

Features

The features of JSON tables are:

  • Entire JSON document must fit in a single line of the text file.
  • Read the data stored in the JSON format.
  • Convert the data to the JSON format when INSERT INTO table.
  • Arrays and maps are supported.
  • Nested data structures are also supported.

Nested JSON Elements

If your data contains nested JSON elements like this:

{"country":"Switzerland","languages":["German","French","Italian"],"religions":{"catholic":[10,20],"protestant":[40,50]}}

You can declare languages as an array<string> and religions as a map<string,array<int> like this (location omitted).

CREATE EXTERNAL TABLE json_nested_test (
     country string,
     languages array<string>,
     religions map<string,array<int>>)
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 STORED AS TEXTFILE
 LOCATION '<scheme>..'
;

<scheme> is the Cloud-specific URI and path: for example, s3:// (AWS), wasb:// (Azure blob storage).

You can access a nested element like this

select religions['catholic'][0] from json_nested_test;

Which produces the result

10