Using the Catalog Configuration

A Presto catalog consists of schemas and refers to a data source through a connector. Qubole lets you add a catalog by defining its properties in the Presto overrides on the Presto cluster, using the following syntax.

catalog/<catalog-name>.properties:
<catalog property 1>
<catalog property 2>
.
.
.
<catalog property n>
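
As an illustration, an override for a Hive catalog might look like the sketch below. It uses the catalog name hive and two properties from the common Hive catalog properties described on this page, written in the same key=value form as the hive.metastore-cache-ttl-bulk=24h example. The values are illustrative only, not recommendations.

```text
catalog/hive.properties:
hive.metastore-timeout=3m
hive.metastore-cache-ttl=20m
```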

catalog/hive.properties

Qubole provides table-level security for Hive tables accessed through Presto. See Understanding Qubole Hive Authorization for more information.

On a cluster running Presto version 0.208, run the following command to clear the metastore cache that the coordinator node maintains for a given Hive catalog.

call catalog.schema.clear_cache();

The following table describes the common Hive catalog properties.

hive.metastore-timeout
Examples: 3m, 1h
Default: 3m
Description: Timeout for Hive metastore calls, that is, how long a request waits to fetch data from the metastore before it times out.

hive.metastore-cache-ttl
Examples: 5m, 20m
Default: 20m
Description: How long an entry stays in the metastore cache before it is evicted. The metastore cache holds tables, partitions, databases, and so on that are fetched from the Hive metastore. Configuring Thrift Metastore Server Interface for the Custom Metastore (AWS) describes how to configure the Hive Thrift Metastore Interface.

hive.metastore-cache-ttl-bulk
Examples: 20m, 1d
Default: NA
Description: Set this option as a Presto override when you need to run a query on hive.information_schema.columns. For example, hive.metastore-cache-ttl-bulk=24h. When enabled, this option caches table entries for the configured duration when the table information is fetched in bulk from the metastore. This makes fetching tables and columns through JDBC drivers faster.

hive.metastore-refresh-interval
Examples: 10m, 20m
Default: 100m
Description: The time after which a background refresh is triggered for an entry in the metastore cache. If a query still returns stale results, running it a second time returns fresh results.
If you do not set this parameter, or if its value is greater than hive.metastore-cache-ttl, a query that runs after an entry has been evicted from the metastore cache has to fetch the entry from the Hive metastore into the cache again and pays this warm-up time. Retrieving information from the metastore takes longer than reading it from the cache.
To avoid this, set this parameter. A query that runs after the refresh interval has expired then returns the cached entry quickly and starts a background cache refresh. It is therefore useful to set hive.metastore-cache-ttl higher than hive.metastore-refresh-interval, so that cached entries get a longer TTL and faster refreshes.

hive.security
Examples: allow-all, sql-standard
Default: allow-all
Description: The value sql-standard enables Hive authorization. See Understanding Qubole Hive Authorization for more information.

hive.skip-corrupt-records
Examples: true, false
Default: false
Description: Skips corrupt records in input formats other than orc, parquet, and rcfile. You can also set it as a session property, hive.skip_corrupt_records=true, when the active cluster does not have this configuration enabled globally. This configuration is supported only in Presto 0.180 and later versions.
Note: The behavior for a corrupted file is non-deterministic, that is, Presto might read part of the file before hitting corrupt data. In that case, the QDS record reader returns whatever it has read up to that point and skips the rest of the file.

hive.information-schema-presto-view-only
Examples: true, false
Default: true
Description: It is enabled by default, so the information schema includes only Presto views and not Hive views. When it is set to false, the information schema includes both Presto and Hive views.
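
The guidance on keeping hive.metastore-cache-ttl higher than hive.metastore-refresh-interval can be sketched as the override below. The values are taken from the example values listed for the two properties and are assumptions for illustration; suitable durations depend on how often the metastore changes.

```text
catalog/hive.properties:
hive.metastore-cache-ttl=20m
hive.metastore-refresh-interval=10m
```

With settings like these, an entry that is queried after 10 minutes but before 20 minutes would be served from the cache while a background refresh is started, rather than being fetched again from the Hive metastore after eviction.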

Hive Catalog Properties associated with AWS

These catalog properties are associated with AWS.

hive.s3.multipart.min-file-size
Examples: 18MB, 20MB
Default: 16MB
Description: Minimum file size for an S3 multipart upload.

hive.s3.multipart.min-part-size
Examples: 8MB, 9MB
Default: 6MB
Description: Minimum part size for an S3 multipart upload.

hive.s3.sse.enabled
Examples: true, false
Default: false
Description: Set it to true to configure server-side encryption for data at rest on S3. For more information, see Enabling SSE-KMS in the Presto Cluster.

hive.s3.sse.type
Examples: KMS, S3
Default: NA
Description: Specifies the type of server-side encryption when hive.s3.sse.enabled is set to true. This property is supported only on Presto versions 0.180 and 0.193.

hive.s3.ssl.enabled
Examples: true, false
Default: false
Description: Set it to true to secure the communication between Amazon S3 and the Presto cluster using SSL.

hive.bucket-owner-full-control
Examples: true, false
Default: false
Description: When it is enabled, the S3 bucket owner gets complete permissions over the files written into the bucket by other users.

hive.s3-secondary-role-arn
Examples: Secondary IAM Role’s ARN
Default: NA
Description: If Dual IAM Roles are configured on the Qubole account, use this property to add the ARN of the secondary IAM Role, which is inaccessible to Qubole, for accessing S3 buckets. For more information, see Creating Dual IAM Roles for your Account.

hive.s3-secondary-role-extid
Examples: Secondary IAM Role’s External ID
Default: NA
Description: If Dual IAM Roles are configured on the Qubole account, use this property to add the external ID of the secondary IAM Role, which is inaccessible to Qubole, for accessing S3 buckets. For more information, see Creating Dual IAM Roles for your Account.

hive.stale-listing-max-retry-time
Examples: 10s, 1m, 10m
Default: 1m
Description: The duration for which file readers keep retrying the file open operation when it fails with a FileNotFound exception. This safeguards against the S3 eventual consistency issue, in which the master node sees a file and creates a split for it, but the worker node fails to read the file from S3 with a FileNotFound exception. The session-level property for setting the maximum duration is hive.stale_listing_max_retry_time. It is supported in Presto 0.193 and later versions.
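
As a sketch of how the S3 encryption properties above fit together, the override below enables server-side encryption with the KMS type and turns on SSL between Amazon S3 and the cluster. It assumes a Hive catalog named hive and a Presto version on which hive.s3.sse.type is supported; the values come from the example values listed for each property.

```text
catalog/hive.properties:
hive.s3.sse.enabled=true
hive.s3.sse.type=KMS
hive.s3.ssl.enabled=true
```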