Using the Catalog Configuration
A Presto catalog contains schemas and refers to a data source through a connector. On Qubole, you add a catalog by defining its properties in the Presto overrides of the Presto cluster, using the following syntax:
catalog/<catalog-name>.properties:
<catalog property 1>
<catalog property 2>
.
.
.
<catalog property n>
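As an illustration of the syntax above, the following Python sketch renders an override entry for a catalog. The helper name `render_catalog_override` is hypothetical, and the `key=value` form of each property line is an assumption based on the usual Java properties format. The property names and values are taken from the examples in the Hive catalog properties table.

```python
def render_catalog_override(catalog_name, properties):
    """Render a catalog definition in the override layout shown above.

    Hypothetical helper for illustration; it only builds text and does not
    talk to Qubole or Presto.
    """
    if not catalog_name:
        raise ValueError("catalog_name must be non-empty")
    # First line names the catalog properties file.
    lines = [f"catalog/{catalog_name}.properties:"]
    # One property per line; key=value is assumed (standard properties form).
    lines.extend(f"{key}={value}" for key, value in properties.items())
    return "\n".join(lines)


override_text = render_catalog_override(
    "hive",
    {
        "hive.metastore-timeout": "3m",
        "hive.metastore-cache-ttl": "20m",
    },
)
print(override_text)
```

The printed text is what would be pasted into the Presto overrides field, assuming the `key=value` form holds for your cluster.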
Qubole provides table-level security for Hive tables accessed through Presto. See Understanding Qubole Hive Authorization for more information.
If the cluster is on Presto version 0.208, run this command to clear the metastore cache maintained in the coordinator node for a given Hive catalog.
The following table describes the common Hive catalog properties.
|Property||Example values||Default||Description|
|hive.metastore-timeout||3m, 1h||3m||Timeout for Hive metastore calls, that is, how long a request waits to fetch data from the metastore before it times out.|
|hive.metastore-cache-ttl||5m, 20m||20m||How long an entry stays in the metastore cache before it is evicted. The metastore cache holds tables, partitions, databases, and other objects fetched from the Hive metastore. Configuring Thrift Metastore Server Interface for the Custom Metastore (AWS) describes how to configure the Hive Thrift Metastore Interface.|
|hive.metastore-cache-ttl-bulk||20m, 1d||NA||When you have a query that you need to run on
It denotes the time after which a background refresh for an entry in the metastore cache is triggered. If you still see stale results, you can see fresh results when you run the query a second time. Suppose you do not set this parameter, or its value is greater than
To avoid such a scenario, if you set this parameter and the query is run after the refresh interval has expired, the query returns the cache entry quickly and starts a background cache refresh. So, it is useful to set the value of
It is used to skip corrupt records in input formats
The behavior for a corrupted file is non-deterministic: Presto might read part of the file before hitting corrupt data, in which case the QDS record reader returns whatever it has read up to that point and skips the rest of the file.
||It is enabled by default, so the information schema includes only the Presto views and not the Hive views. When it is set to
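The interaction between a cache TTL and a background refresh interval described in the table above can be illustrated with a small Python model. This is a conceptual sketch only, not Presto's implementation: the class `RefreshingCache`, the injected clock, and the 5-minute refresh interval are assumptions made for the example, and the "background" refresh is performed inline so that the run is deterministic.

```python
class RefreshingCache:
    """Toy cache with a hard TTL and an earlier refresh interval (illustrative)."""

    def __init__(self, loader, ttl, refresh_interval, clock):
        self.loader = loader                      # fetches a fresh value for a key
        self.ttl = ttl                            # seconds before an entry is evicted
        self.refresh_interval = refresh_interval  # seconds before a refresh is triggered
        self.clock = clock                        # callable returning current time in seconds
        self.entries = {}                         # key -> (value, loaded_at)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is None or now - entry[1] >= self.ttl:
            # Missing or expired: the caller waits for a fresh load.
            value = self.loader(key)
            self.entries[key] = (value, now)
            return value
        value, loaded_at = entry
        if now - loaded_at >= self.refresh_interval:
            # Refresh is due: return the cached value quickly and refresh the
            # entry. A real system would do this asynchronously.
            self.entries[key] = (self.loader(key), now)
        return value


now = [0]       # simulated clock, in seconds
version = [1]   # simulated metastore state

cache = RefreshingCache(
    loader=lambda key: f"{key}@v{version[0]}",
    ttl=20 * 60,              # 20m, the default TTL listed in the table
    refresh_interval=5 * 60,  # 5m, a hypothetical value for the example
    clock=lambda: now[0],
)

first = cache.get("orders")    # cold cache: loads v1
version[0] = 2                 # the metastore changes
now[0] = 400                   # past the refresh interval, before the TTL
second = cache.get("orders")   # returns the cached v1 and triggers a refresh
third = cache.get("orders")    # sees the refreshed v2
print(first, second, third)
```

In this model the first query after the refresh interval may still see a stale entry, while the next one sees fresh data, which mirrors the behavior described for running the query a second time.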
Hive Catalog Properties associated with AWS
These catalog properties are associated with AWS.
|Property||Example values||Default||Description|
|hive.s3.multipart.min-file-size||18MB, 20MB||16MB||Minimum file size for an S3 multipart upload|
|hive.s3.multipart.min-part-size||8MB, 9MB||6MB||Minimum part size for an S3 multipart upload|
||It is used to configure server-side encryption for data at rest on S3, by setting it to
||NA||It is used to specify the type of server-side
||It is used to secure the communication between Amazon S3 and the Presto cluster using SSL. Set the
||When it is enabled, the S3 bucket owner gets full permissions on the files that other users write to the bucket.|
|hive.s3-secondary-role-arn||Secondary IAM Role's ARN||NA||If Dual IAM Roles are configured on the Qubole account, use this property to add the ARN of the secondary IAM Role (the role that is inaccessible to Qubole) for accessing S3 buckets. For more information, see Creating Dual IAM Roles for your Account.|
|hive.s3-secondary-role-extid||Secondary IAM Role's External ID||NA||If Dual IAM Roles are configured on the Qubole account, use this property to add the external ID of the secondary IAM Role (the role that is inaccessible to Qubole) for accessing S3 buckets. For more information, see Creating Dual IAM Roles for your Account.|
|hive.stale-listing-max-retry-time||10s, 1m, 10m||1m||The maximum duration for which file readers keep retrying the file open operation when it fails with a FileNotFound exception. This provides a safeguard against the S3 eventual consistency issue, where the master node could see the file and created a split for it, but the worker node failed to read the file from S3 with a FileNotFound exception. The session-level property to set the maximum duration is
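The retry behavior described for `hive.stale-listing-max-retry-time` can be sketched as a time-bounded retry loop. This is a minimal Python illustration of the idea, not the QDS file reader: the function `open_with_retry`, its parameters, and the pause between attempts are hypothetical.

```python
import time


def open_with_retry(open_file, max_retry_seconds, pause_seconds=0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Call open_file(), retrying on FileNotFoundError until the time budget ends."""
    deadline = clock() + max_retry_seconds
    while True:
        try:
            return open_file()
        except FileNotFoundError:
            # Give up once the maximum retry time has elapsed.
            if clock() >= deadline:
                raise
            sleep(pause_seconds)


attempts = []


def flaky_open():
    """Simulates a file that is listed but becomes readable only on the third try."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FileNotFoundError("object listed but not yet readable")
    return "file contents"


# 60 seconds corresponds to the 1m default in the table; sleeping is stubbed
# out so the example finishes immediately.
result = open_with_retry(flaky_open, max_retry_seconds=60,
                         pause_seconds=0, sleep=lambda seconds: None)
print(result, len(attempts))
```

A larger retry budget tolerates longer consistency delays at the cost of slower failure when a file is genuinely missing.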