Configuring a Presto Cluster
A single Qubole account can run multiple clusters. By default, Qubole provides a Presto cluster, along with Hadoop and Spark clusters, for each account.
Presto is not currently supported on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.
The following topics explain Presto custom configuration and the Presto catalog properties:
- Understanding the Presto Engine Configuration
- Using the Catalog Configuration
QDS provides the Presto Ruby client for better overall performance, processing DDL queries much faster and quickly reporting errors that a Presto cluster generates. For more information, see this blog.
To view or edit a Presto cluster’s configuration, navigate to the Clusters page and select the cluster with the label presto.
Click the edit icon in the Action column for a Presto cluster to edit its configuration.
Presto queries are memory-intensive. Choose instance types with ample memory for both the master and worker nodes.
You can select the Presto Version on the cluster configuration page. These are the supported versions:
- 0.193 is the deprecated version. Although there are no restrictions on using or creating Presto-0.193 clusters, Qubole strongly recommends upgrading to 0.208 or later, because many new features are not ported back to deprecated versions.
- 0.208 is the default and stable version.
- 317 (beta) is the latest beta version.
See QDS Components: Supported Versions and Cloud Platforms for the latest version information.
Qubole can automatically terminate a Presto cluster with an invalid configuration. This capability is available for Beta access; create a ticket with Qubole Support to enable it for your account.
Check the logs in /usr/lib/presto/logs/server.log if there is a cluster failure or configuration error. See Presto FAQs for more information about Presto logs.
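As a starting point for the log check described above, here is a minimal Python sketch that pulls the most recent ERROR entries out of server.log. It assumes each log line begins with a timestamp followed by the log level as the second whitespace-separated field; that layout is an assumption, so adjust the field index if your log lines differ. The function name filter_error_lines is hypothetical and not part of QDS or Presto.

```python
from collections import deque
from pathlib import Path

# Path taken from the text above; adjust it if your cluster logs elsewhere.
SERVER_LOG = Path("/usr/lib/presto/logs/server.log")


def filter_error_lines(lines, levels=("ERROR",), limit=50):
    """Return up to `limit` of the most recent lines whose level is in `levels`.

    Assumed line layout (not confirmed by the docs): timestamp, level,
    thread, logger, message, separated by whitespace.
    """
    recent = deque(maxlen=limit)  # keeps only the last `limit` matches
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[1] in levels:
            recent.append(line.rstrip("\n"))
    return list(recent)


if __name__ == "__main__":
    if SERVER_LOG.exists():
        with open(SERVER_LOG, encoding="utf-8", errors="replace") as handle:
            for entry in filter_error_lines(handle):
                print(entry)
    else:
        print(f"{SERVER_LOG} not found; run this on a node of the Presto cluster.")
```

Passing levels=("ERROR", "WARN") widens the filter if warnings are also of interest.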
The following figure shows Hadoop and Presto configuration override for a Presto cluster.
On AWS or Azure, select Enable Rubix to enable RubiX. See Configuring RubiX in Presto and Spark Clusters for more information.
See Managing Clusters for more information on cluster configuration options that are common to all cluster types.
About Presto System Monitoring
Understanding the Presto Metrics for Monitoring lists the metrics that are visible on the Datadog monitoring service. It also describes the abnormalities these metrics can reveal and the actions you can take to handle them.
Avoiding Stale Caches
Tune the cache parameters if you expect data to change rapidly.
For example, if a Hive table adds a new partition, it may take Presto 20 minutes to discover it. If you plan to change existing files in the Cloud, consider making fileinfo expiration more aggressive. If you expect new files to land in a partition rapidly, consider reducing or disabling the dirinfo cache.
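To illustrate why a cached view can lag behind the underlying data, the following Python sketch models a generic time-to-live (TTL) cache. It is a toy model only, not Presto's or Qubole's implementation: the TtlCache class, the partition names, and the explicit now argument are hypothetical, and the 20-minute TTL simply mirrors the delay mentioned in the example above.

```python
class TtlCache:
    """Toy time-to-live cache: the value is reloaded only after the TTL elapses."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = None

    def get(self, now):
        """Return the cached value, reloading it if it is older than the TTL."""
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Hypothetical partition list standing in for a Hive table's metadata.
partitions = ["dt=2019-01-01"]
cache = TtlCache(loader=lambda: list(partitions), ttl_seconds=20 * 60)

first = cache.get(now=0)            # initial load at t = 0
partitions.append("dt=2019-01-02")  # a new partition lands right after the load
stale = cache.get(now=19 * 60)      # within the TTL: the new partition is not visible
fresh = cache.get(now=20 * 60)      # TTL elapsed: the cache reloads and sees it

print(stale)  # ['dt=2019-01-01']
print(fresh)  # ['dt=2019-01-01', 'dt=2019-01-02']
```

In this model the worst-case staleness equals the TTL, which is why a shorter expiration (or a disabled cache) trades extra metadata lookups for fresher results.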