Using the Catalog Configuration
A Presto catalog consists of schemas and refers to a data source through a connector. Qubole allows you to add the catalog through a simplified way by just defining its properties through the Presto overrides on the Presto cluster. You can add the catalog using the syntax below through the Presto override.
catalog/<catalog-name>.properties:
<catalog property 1>
<catalog property 2>
.
.
.
<catalog property n>
catalog/hive.properties
Qubole provides table-level security for Hive tables accessed through Presto. See Understanding Qubole Hive Authorization for more information.
The following table describes the common Hive catalog properties.
Parameter |
Examples |
Default |
Description |
Supported Presto Version |
---|---|---|---|---|
hive.metastore-timeout |
3m, 1h |
3m |
Timeout for Hive metastore calls that is, it denotes how long a request waits to fetch data from the metastore before getting timed out. |
Presto versions 0.208 and 317 |
hive.metastore-cache-ttl |
5m, 20m |
20m |
It denotes a data entry’s life duration in the metastore cache before it is evicted. Metastore caches tables, partitions, databases, and so on that are fetched from the Hive metastore. Configuring Thrift Metastore Server Interface for the Custom Metastore describes how to configure Hive Thrift Metastore Interface. |
Presto versions 0.208 and 317 |
hive.metastore-cache-ttl-bulk |
20m, 1d |
NA |
When you have a query that you need to run on
|
Presto 0.208 version and older |
hive.metastore-refresh-interval |
10m, 20m |
100m |
It is the time interval set for refreshing metastore cache. After each interval expires, metastore cache is refreshed. So, in case if you see stale results for a query, then running the same query would fetch results without the stale data (assuming the time interval is Suppose, assume that you disable this parameter or set it by adding a value that is higher than that of expired).
You can avoid seeing stale results in the executed
query by setting this parameter to a value that is
lesser than |
Presto versions 0.208 and 317 |
hive.security |
|
|
|
Presto versions 0.208 and 317 |
hive.skip-corrupt-records |
|
|
It is used to skip corrupt records in input formats
other than Note The behavior for the corrupted file is non-deterministic that is Presto might read some part of the file before hitting corrupt data and in such a case, the QDS record reader returns whatever it read until this point and skips the rest of the file. |
Presto versions 0.208 and 317 |
hive.information-schema-presto-view-only |
|
|
It is enabled by default and hence, the information
schema only includes the Presto views and not the Hive
views. When it is set to |
Presto versions 0.208 and 317 |
hive.metastore.thrift.impersonation.enabled |
|
|
It adds impersonation support for calls to the Hive metastore when you enable this property. It allows Presto to impersonate the user, who runs the query to access the Hive metastore. |
Presto version 317 |
hive.max-partitions-per-scan |
100000, 150000 |
100000 |
It is the maximum number of partitions for a single table scan. |
Presto versions 0.208 and 317 |
hive.max-execution-partitions-per-scan |
180000, 150000 |
The configured value of
|
You can use this property along with a relaxed limit
on Note Using this runtime limit can cause
Presto to scan data from
|
Presto versions 0.208 and 317 |
Hive Catalog Properties associated with AWS
These catalog properties are associated with AWS.
Parameter |
Examples |
Default |
Description |
Supported Presto Version |
---|---|---|---|---|
hive.s3.multipart.min-file-size |
18MB, 20MB |
16MB |
Minimum file size for an S3 multipart upload |
Presto versions 0.208 and 317 |
hive.s3.multipart.min-part-size |
8MB, 9MB |
6MB |
Minimum part size for an S3 multipart upload |
Presto versions 0.208 and 317 |
hive.s3.sse.enabled |
|
|
It is used to configure server-side encryption for
data at rest on S3, by setting it to |
Presto versions 0.208 and 317 |
hive.s3.sse.type |
|
NA |
It is used to specify the type of server-side
encryption when |
Presto versions 0.208 and 317 |
hive.s3.ssl.enabled |
|
|
It is used to secure the
communication between Amazon S3 and
the Presto cluster using SSL. Set the
property to |
Presto versions 0.208 and 317 |
hive.bucket-owner-full-control |
|
|
When it is enabled, the S3 bucket owner gets complete permissions over the files written into it by other users. |
Presto version 0.208 and older |
hive.s3.upload-acl-type |
|
NULL |
When its value is set to
|
Presto version 317 |
hive.s3-secondary-role-arn |
Secondary IAM Role’s ARN |
NA |
Configure the ARN of the user-overridden (secondary) IAM Role if S3 buckets are not accessible from Qubole account’s IAM Role and require a separate IAM Role to access such S3 buckets. You must configure this user-overridden IAM Role. The cluster assumes this user-overridden IAM Role to only access the S3 bucket. |
Presto versions 0.208 and 317 |
hive.s3-secondary-role-extid |
Secondary IAM Role’s External ID |
NA |
Add the external ID of the user-overridden (secondary) IAM Role that is used to access S3 buckets, which are not accessible from Qubole account’s IAM Role and require a separate IAM Role to access them. |
Presto versions 0.208 and 317 |
hive.stale-listing-max-retry-time |
10s, 1m, 10m |
1m |
It is the duration within which File readers continue
retrying if there is query failure with FileNotFound
exception. File readers retry the file open
operation if there is a failure with FileNotFound
exception. This provides a safeguard against the S3
Eventual Consistency issue where the coordinator node
could see the file and created split for it but the
worker node could not read the file with FileNotFound
exception from S3. The session-level property to set
the maximum duration is
|
Presto versions 0.208 and 317 |
Clearing Cache
If the cluster is on Presto 0.208 or a later version, run this command to clear the metastore cache maintained in the coordinator node for a given Hive catalog. You can call the following commands based on the Presto version:
Presto version 0.208:
catalogName.default.clear_cache();
Presto version 317:
catalogName.system.clear_cache();
Qubole has added a procedure for the Hive connector to clear table cache. You can call the following procedure to clear cache corresponding to that table based on the Presto version:
Presto version 0.208:
catalogName.default.clear_table_cache('schema_name','table_name')
Presto version 317:
catalogName.system.clear_table_cache('schema_name','table_name')