Configuring a Presto Cluster
A single Qubole account can run multiple clusters. By default, Qubole provides a Presto cluster, along with Hadoop and Spark clusters, for each account.
Presto is not currently supported on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.
The following topics explain Presto custom configuration and the Presto catalog properties:
- Understanding the Presto Engine Configuration
- Using the Catalog Configuration
QDS provides the Presto Ruby client to improve overall performance: it processes DDL queries much faster and quickly reports errors that a Presto cluster generates. This client is available for Beta access; create a ticket with Qubole Support to enable it for your QDS account. For more information, see this blog.
To view or edit a Presto cluster’s configuration, navigate to the Clusters page and select the cluster with the label presto.
Click the edit icon in the Action column for a Presto cluster to edit its configuration.
Presto queries are memory-intensive. Choose instance types with ample memory for both the master and worker nodes.
You can select the Presto Version on the cluster configuration page. These are the supported versions:
- 0.142 is deprecated. It is unavailable for creating new clusters or for version changes, although existing clusters continue to work until their configuration is changed.
- 0.157 is deprecated. Using it is not recommended because Qubole plans to stop supporting it shortly. Existing clusters continue to work with this version.
- 0.180 is a stable version.
- 0.193 is the default and stable version.
- 0.208 is the latest stable version.
See QDS Components: Supported Versions and Cloud Platforms for the latest version information for your platform.
Qubole can automatically terminate a Presto cluster with an invalid configuration. This capability is available for Beta access; create a ticket with Qubole Support to enable it for your account.
Check the logs in /usr/lib/presto/logs/server.log if there is a cluster failure or configuration error. See Presto FAQs for more information about Presto logs.
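As a minimal sketch of that log check, the following Python script lists the lines in server.log that carry an ERROR or WARN level. It assumes that each log entry includes its level as a whitespace-separated token; the helper names find_problem_lines and scan_server_log are hypothetical and are not part of QDS or Presto.

```python
from pathlib import Path

# Path taken from the text above; adjust it if your installation differs.
SERVER_LOG = Path("/usr/lib/presto/logs/server.log")


def find_problem_lines(lines, levels=("ERROR", "WARN")):
    """Return (line_number, text) pairs for lines containing one of the levels.

    Assumption: the level appears as its own whitespace-separated token.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if any(level in tokens for level in levels):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_server_log(path=SERVER_LOG, levels=("ERROR", "WARN")):
    """Read the log file at path and return its problem lines."""
    with open(path, errors="replace") as handle:
        return find_problem_lines(handle, levels)


if __name__ == "__main__":
    if SERVER_LOG.exists():
        for number, text in scan_server_log():
            print(f"{number}: {text}")
    else:
        print(f"{SERVER_LOG} not found")
```

Run the script on a node where the log file exists. Stack traces that follow an error entry are not matched by this filter, so open the log at the reported line numbers to read the full context.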
The following figure shows the Hadoop and Presto configuration overrides for a Presto cluster.
On AWS or Azure, select Enable Rubix to enable RubiX. See Configuring RubiX in Presto and Spark Clusters for more information.
See Managing Clusters for more information on cluster configuration options that are common to all cluster types.
About Presto System Monitoring
Understanding the Presto Metrics for Monitoring lists the metrics that are visible in the Datadog monitoring service. It also describes abnormal readings and the actions you can take to handle them.
Avoiding Stale Caches
Tuning the cache parameters is useful if you expect data to change rapidly.
For example, if a Hive table adds a new partition, it may take Presto 20 minutes to discover it. If you plan to change existing files in the Cloud, you may want to make fileinfo expiration more aggressive. If you expect new files to land in a partition rapidly, you may want to reduce or disable the dirinfo cache.
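The delay described above is typical of any cache with a fixed expiration time. The following Python sketch is a toy model of that behavior, not Presto's implementation: the TtlCache class, the partitions dictionary, and the partition names are all hypothetical, and the 20-minute expiration is taken from the example in the text.

```python
class TtlCache:
    """A toy time-to-live cache: entries are served as-is until they expire."""

    def __init__(self, ttl_seconds, loader):
        self.ttl_seconds = ttl_seconds
        self.loader = loader      # function that fetches the current value
        self._entries = {}        # key -> (value, time it was loaded)

    def get(self, key, now):
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]      # still within the TTL, possibly stale
        value = self.loader(key)  # expired or missing: reload
        self._entries[key] = (value, now)
        return value


# Hypothetical stand-in for table metadata: table name -> partitions.
partitions = {"events": ["dt=2019-01-01"]}
cache = TtlCache(ttl_seconds=20 * 60,
                 loader=lambda table: list(partitions[table]))

first = cache.get("events", now=0)             # loads the current list
partitions["events"].append("dt=2019-01-02")   # a new partition lands
stale = cache.get("events", now=10 * 60)       # within the TTL: old list
fresh = cache.get("events", now=20 * 60)       # TTL elapsed: reloaded list
```

In this model, a shorter ttl_seconds narrows the window in which stale results are returned, and a value of 0 reloads on every call, which corresponds to disabling the cache. The cost is more frequent metadata lookups.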