The new features and enhancements are:
- Presto Clusters Support Spot Block Nodes Rotation
- Support for Requester Pay Buckets in S3
- Presto Version Changes
- Cost-based Optimization is default in JOIN Reordering and Redistribution
- Session Properties to Set Required Number of Worker Nodes
- Presto Notebook Enhancements
- Disable Reserved Pool
- Disable SSH Access to Presto Clusters
- Clearing the Hive Metastore Cache
- Presto Cluster Accessing S3 Buckets with Different Configuration
- Query Retry Mechanism Support in IOW, CTAS, and SELECT Queries
- Qubole Supports Optimized Local Scheduling
- Limiting the Total Bytes Scanned in a Running Query
- Performance Improvement in Queries Involving IN and NOT IN over a Subquery
Other enhancements and bug fixes are listed in:
Presto Clusters Support Spot Block Nodes Rotation¶
PRES-2041: Previously Qubole supported Spot Block nodes only for fixed duration clusters. Now, AWS Spot Block nodes can be configured as auto-scaling nodes for long running clusters as well. Beta, Via Support
Spot Block nodes are 30% to 50% cheaper than On-Demand nodes and more reliable than Spot nodes because they are acquired for a predefined duration (1 to 6 hours). Qubole minimizes query failures by replacing Spot Block nodes with new nodes before they expire. Replacement of Spot Block nodes can be configured based on the expected runtime of the queries that run on a cluster. Cluster Restart Required
These are the Presto configuration properties that you can override on a Presto cluster:
For more information, see the documentation.
Support for Requester Pay Buckets in S3¶
PRES-2771: Qubole has added support for Requester Pay Buckets in S3 for Presto. For details, see Requester Pay Buckets.
Set hive.s3.requester-pays.enabled=true in the Hive catalog properties to enable Requester Pay Buckets on a Presto cluster. Cluster Restart Required
Presto Version Changes¶
PRES-1527: The following are changes associated with Presto versions: Cluster Restart Required
- Presto 0.142 has been removed from the cluster AMI. Any existing Presto 0.142 cluster must be upgraded to Presto 0.180 or a later version.
- Presto 0.157 has been hidden from the cluster UI but it continues to be present in the cluster AMI. An existing Presto-0.157 cluster continues to work but you cannot create new Presto-0.157 clusters (you must use a newer Presto version).
- Presto 0.174 has been removed from the cluster AMI and you must upgrade any existing cluster with Presto 0.174 version to Presto 0.180 or later versions.
- Presto 0.180 has been labeled as Deprecated in the cluster UI. There are no restrictions on using or creating Presto-0.180 clusters, but Qubole strongly recommends that users upgrade to 0.193 or a later version because many new features are available only on the recent versions.
Cost-based Optimization is default in JOIN Reordering and Redistribution¶
PRES-2372: Cost-based optimization (CBO) for JOIN reordering and JOIN distribution type selection using statistics present in the Hive metastore is enabled by default for Presto version 0.208.
The following values are added to the default cluster configuration for Qubole Presto version 0.208:
optimizer.join-reordering-strategy=AUTOMATIC
join-distribution-type=AUTOMATIC
join-max-broadcast-table-size=100MB
Session Properties to Set Required Number of Worker Nodes¶
PRES-2695: Qubole allows overriding the cluster-level property of the required number of workers feature,
query-manager.required-workers, at the query level
through the corresponding session properties.
Presto Notebook Enhancements¶
PRES-2600: These are the new enhancements in the Presto notebooks:
- You can now set session properties in Presto notebooks in a paragraph and run it. When set, these session properties are applicable to paragraphs in the notebook’s current session.
- To improve the debugging experience, the source field in Presto notebooks is set as
notebook_<notebook-name>_<notebook-id> and in the dashboards, the source field is set as
dashboard_<dashboard-name>_<dashboard-id>_<source-note-id>. The source field is directly searchable in the Presto UI. For example, in the Presto UI, you can search for a notebook by its name or ID to quickly filter the queries that were run from that specific notebook while debugging an error.
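The source-field naming described above can be illustrated with a minimal Python sketch. The helper names and the shape of the query records (dictionaries with a `source` key) are assumptions made for this example only; they are not part of Qubole or the Presto UI, which does this filtering for you through its search box.

```python
from typing import Dict, Iterable, List


def notebook_source(notebook_name: str, notebook_id: int) -> str:
    """Build a source value in the notebook_<notebook-name>_<notebook-id> form."""
    return f"notebook_{notebook_name}_{notebook_id}"


def dashboard_source(dashboard_name: str, dashboard_id: int, source_note_id: int) -> str:
    """Build a source value in the dashboard_<name>_<id>_<source-note-id> form."""
    return f"dashboard_{dashboard_name}_{dashboard_id}_{source_note_id}"


def queries_from_notebook(queries: Iterable[Dict[str, str]], notebook_id: int) -> List[Dict[str, str]]:
    """Keep only the (hypothetical) query records that were run from one notebook."""
    suffix = f"_{notebook_id}"
    return [
        q for q in queries
        if q["source"].startswith("notebook_") and q["source"].endswith(suffix)
    ]
```

Matching on both the `notebook_` prefix and the ID suffix keeps dashboard queries that reference the same note ID out of the result.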
PRES-2254: In a Presto notebook, you can now set
zeppelin.presto.stacktrace as an interpreter property for displaying
stacktrace for certain errors.
Disable Reserved Pool¶
PRES-2918: A new experimental configuration property called
experimental.reserved-pool-enabled is added to Presto version
0.208 to allow disabling Reserved Pool. Reserved Pool is used to prevent deadlocks when memory is exhausted in the General Pool by
promoting the biggest query to Reserved Pool. However, only one query gets promoted to Reserved Pool, and queries in the General
Pool get into the blocked state whenever it becomes full. To avoid this scenario, you can set
experimental.reserved-pool-enabled to false to disable Reserved Pool. For more information,
see Disabling Reserved Pool.
PRES-2657: The path for the spill-to-disk functionality
has been configured by default on Qubole Presto 0.208. This allows users to easily use spill to disk by either running
set session spill_enabled=true for individual queries or adding
experimental.spill-enabled=true to the Presto cluster
configuration override to enable spill to disk for all queries.
Disable SSH Access to Presto Clusters¶
ACM-5449: Presto clusters can now work without connecting through SSH into the cluster. Via Support
The following conditions must be true to support this configuration: Cluster Restart Required
- The cluster must be in a private subnet.
- The metastore must be directly accessible from the cluster. This configuration does not work with Qubole-managed Hive metastore.
Contact Qubole Support to disable SSH access to the cluster. If SSH access is disabled, you must open SSH port 22 when you seek support so that Qubole Support can debug issues. (PRES-2742)
Clearing the Hive Metastore Cache¶
PRES-111: Qubole has added a command,
catalog.schema.clear_cache, to clear the Hive metastore cache on the
coordinator node for a given Hive catalog. The command is supported only on Presto version 0.208.
Presto Cluster Accessing S3 Buckets with Different Configuration¶
PRES-406: Qubole supports per-bucket configuration, which lets Presto pick the right configuration for the S3 bucket that a query is accessing. This is useful when the same cluster needs to access S3 buckets with different configurations.
Query Retry Mechanism Support in IOW, CTAS, and SELECT Queries¶
PRES-2584: Presto smart query retry is supported for Insert Overwrite Table (IOW) and Create Table As Select (CTAS) queries now.
PRES-2585: Presto smart query retry is supported in SELECT queries which did not return any data before failing.
Qubole Supports Optimized Local Scheduling¶
Limiting the Total Bytes Scanned in a Running Query¶
PRES-2744: Qubole has added a session property,
qubole_max_raw_input_datasize, to limit the total bytes
scanned by a query (for example, qubole_max_raw_input_datasize=1TB). Queries that exceed this limit fail with the
RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED exception. This ensures that
rogue queries do not run for a very long time.
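The effect of such a limit can be illustrated with a minimal Python sketch. It is not Qubole's implementation: the function names, the exception class, and the assumption that size units are binary multiples (so that 1TB means 2^40 bytes) are all choices made for this example.

```python
import re

# Assumption for this sketch: units are binary multiples (1TB = 2**40 bytes).
_UNIT_BYTES = {"B": 1, "kB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}


class RawInputLimitExceeded(Exception):
    """Hypothetical stand-in for the RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED failure."""


def parse_data_size(text: str) -> int:
    """Convert a size string such as '1TB' or '512 MB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2) not in _UNIT_BYTES:
        raise ValueError(f"unrecognized data size: {text!r}")
    return int(float(match.group(1)) * _UNIT_BYTES[match.group(2)])


def check_scan_limit(bytes_scanned: int, limit: str) -> None:
    """Raise once a running query has scanned more raw input than the limit allows."""
    limit_bytes = parse_data_size(limit)
    if bytes_scanned > limit_bytes:
        raise RawInputLimitExceeded(
            f"RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED: scanned {bytes_scanned} bytes, "
            f"limit is {limit_bytes} bytes"
        )
```

A check of this kind runs while the query is in progress, so a query is stopped as soon as it crosses the limit rather than after it finishes.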
Performance Improvement in Queries Involving IN and NOT IN over a Subquery¶
PRES-2790: Qubole has improved the performance of queries that involve IN and NOT IN over a subquery. Read more in this blog post.
- JDBC-124: Qubole now supports concurrency of multiple statements in Presto FastPath.
- PRES-2182: Qubole has added a health check for Presto clusters. Qubole automatically terminates unhealthy clusters (that is, clusters in which the Presto server is not reachable on the coordinator and worker nodes).
- PRES-2256: Qubole supports Presto decimal coercion on Presto version 0.208.
- PRES-2438: When a query fails due to a Spot loss, the error is now displayed in the exception stack trace. Query failures due to Spot node interruptions are also detected faster, using information about the Spot interruption time from the cloud provider.
- PRES-2447: Qubole has upgraded the Datadog agent version from v5 to v6.
- PRES-2510: Clicking the Presto UI from the Qubole Control Plane in the homepage redirects to <base-url>/presto-ui-<cluster-id>/ui/. It also redirects <master>:dns:8081 to a static resource <base-url>/ui/index.html. It is useful when Presto is used bypassing the QDS Control Plane.
- PRES-2667: Whenever a node encounters Spot loss, the query info of all queries running on that node is updated
with information about the spot node interruption. The query may or may not fail due to the spot interruption. The
affected query’s query info displays the warning:
Query may fail due to interruption of spot instance: <Instance ID> with private IP: <IP> by cloud provider at: <Spot Interruption Time> GMT.
- ExitOnOutOfMemoryError JVM configuration properties are pulled from the open source into the default JVM configuration of Qubole Presto 0.180 and later versions to improve stability. For more information, see jvm.config.
- PRES-2791: Qubole has ported open-source changes that are related to improvements in S3 reads to Qubole Presto 0.208 version. For more information, see Faster S3 reads.
- PRES-2861: QDS skips requesting instances of the families for which spot losses were seen at the cluster level within a specified time window (default duration is last 15 minutes). If spot losses were seen for all configured instance families, QDS tries to provision instances synchronously, and finally falls back to On-Demand if configured in case of unavailability of spot nodes. Qubole recommends configuring instances of multiple families. Via Support
- PRES-2892: If a query violates the Presto strict mode conditions and Presto strict mode is not enabled, Qubole displays warnings in that query’s query info.
- PRES-2992: Qubole has added
presto-thrift connectors to Presto 0.193 and 0.208 versions.
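The instance-family selection described in PRES-2861 above can be sketched as follows. This is a simplified illustration, not QDS code: the function name, the dictionary of last-seen spot losses, and the treatment of a loss exactly at the window boundary are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def eligible_families(
    configured: List[str],
    last_spot_loss: Dict[str, datetime],
    now: datetime,
    window: timedelta = timedelta(minutes=15),
) -> List[str]:
    """Return the configured instance families with no spot loss inside the window.

    An empty result corresponds to the case in which spot losses were seen for
    all configured families, where the text above describes the fall-back path.
    """
    eligible = []
    for family in configured:
        lost_at = last_spot_loss.get(family)
        # Assumption: a loss that is exactly `window` old no longer counts as recent.
        if lost_at is None or now - lost_at >= window:
            eligible.append(family)
    return eligible
```

The sketch also shows why configuring several families helps: the more families there are, the less likely it is that all of them were hit by spot losses in the same window.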
PRES-1480: The issue where
\t (tab) characters in the result data were parsed incorrectly has been resolved. Qubole now supports
\t (tab) characters in the UI through enhanced parsing. Via Support
PRES-2391: The issue where dynamic filters were missed when the associated JOIN process finished quickly has been resolved.
PRES-2568: The issue in which a carriage return
\r was incorrectly added wherever there was a semicolon in a query has been resolved.
PRES-2645: The issue where a file open operation failed with a FileNotFound exception has been resolved. File readers now retry the file open operation if it fails with a FileNotFound exception. This provides a safeguard against the S3 Eventual Consistency issue, where the master node could see the file and created a split for it, but the worker node could not read the file from S3 and failed with a FileNotFound exception.
The cluster-level timeout configuration is
hive.stale-listing-max-retry-time with a default of 1 minute, and the session-level configuration is
hive.stale_listing_max_retry_time. Cluster Restart Required for the cluster-level configuration.
Qubole supports this timeout configuration in Presto 0.193 and later versions.
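The retry behavior described for PRES-2645 can be illustrated with a minimal Python sketch of a time-bounded retry loop. It is not the Presto file reader: the function name, the fixed retry interval, and the injectable clock and sleep functions are assumptions made for this example, with the 60-second default mirroring the 1-minute default mentioned above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_retry(
    open_file: Callable[[], T],
    max_retry_time_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call open_file, retrying on FileNotFoundError until the time budget is spent."""
    deadline = clock() + max_retry_time_s
    while True:
        try:
            return open_file()
        except FileNotFoundError:
            # Give up once the retry budget is exhausted and surface the error.
            if clock() >= deadline:
                raise
            sleep(interval_s)
```

Bounding the retries by elapsed time rather than by attempt count matches a timeout-style setting: a file that is merely not yet visible gets a chance to appear, while a file that is truly missing still fails after the configured time.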
PRES-2663: The S3 location where Presto server logs are uploaded has been changed now (for Presto master). The changes in Presto server log locations are:
PRES-2810: The failures in query planning with dynamic filtering enabled are resolved.
ZEP-3808: The issue where canceling notebook paragraphs hid them in dashboards has been resolved. On canceling the paragraph, the notebook displays an appropriate error message. As error messages are not empty, paragraphs are visible in the dashboards’ UI.
For a list of bug fixes between versions R56 and R57, see Changelog for api.qubole.com.