2. How can I create a Hive table to access data in object storage?
To analyze data in object storage using Hive, define a Hive table over the object store directories. This can be done with a Hive DDL statement.
For example:
AWS:
CREATE EXTERNAL TABLE myTable (key STRING, value INT) LOCATION 's3n://mybucket/myDir';
Azure:
CREATE EXTERNAL TABLE myTable (key STRING, value INT) LOCATION 'wasb://[email protected]/myDir';
Oracle OCI:
CREATE EXTERNAL TABLE myTable (key STRING, value INT) LOCATION 'oci://mybucket@namespace/myDir/';
where myDir is a directory in the bucket mybucket. If myDir has subdirectories, the Hive table must be declared as a partitioned table, with one partition for each subdirectory.
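As a rough sketch of the partitioned case, assume myDir contains two hypothetical subdirectories named 2015-01-01 and 2015-01-02. The partition column name dt is also made up for illustration. Under those assumptions, the table could be declared and its partitions registered as follows (AWS path shown; the same pattern should apply to the other object stores):
-- dt and the subdirectory names are illustrative assumptions, not part of the example above
CREATE EXTERNAL TABLE myTable (key STRING, value INT) PARTITIONED BY (dt STRING) LOCATION 's3n://mybucket/myDir';
ALTER TABLE myTable ADD PARTITION (dt='2015-01-01') LOCATION 's3n://mybucket/myDir/2015-01-01';
ALTER TABLE myTable ADD PARTITION (dt='2015-01-02') LOCATION 's3n://mybucket/myDir/2015-01-02';
Once the partitions are registered, a query that filters on the partition column, such as SELECT key, SUM(value) FROM myTable WHERE dt='2015-01-01' GROUP BY key; reads only the matching subdirectory.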
Use the Explore page to explore data in object storage and define Hive tables over it. See Exploring Data in the Cloud for more information.
For MapReduce jobs, you can specify input directories through command-line options.