Presto

Presto Clusters Support Spot Block Nodes Rotation

PRES-2041: Previously, Qubole supported Spot Block nodes only for fixed-duration clusters. Now, AWS Spot Block nodes can also be configured as auto-scaling nodes for long-running clusters. Beta, Via Support

Spot Block nodes are 30% to 50% cheaper than On-Demand nodes and are more reliable than Spot nodes because they are acquired for a predefined duration (1 to 6 hours). Qubole minimizes query failures by replacing Spot Block nodes with new nodes before they expire. Replacement of Spot Block nodes can be configured based on the expected runtime of the queries that run on a cluster. Cluster Restart Required

These are the Presto configuration properties that you can override on a Presto cluster:

  • ascm.node-expiry-period (default=15m)
  • ascm.node-recycle-period (default=15m)

For more information, see the documentation.

Support for Requester Pay Buckets in S3

PRES-2771: Qubole has added support for Requester Pay Buckets in S3 for Presto. For details, see Requester Pay Buckets.

Set hive.s3.requester-pays.enabled=true in Hive Catalog properties to enable Requester Pay Buckets on a Presto cluster. Cluster Restart Required

Presto Version Changes

PRES-1527: The following are changes associated with Presto versions: Cluster Restart Required

  • Presto 0.142 has been removed from the cluster AMI. Any existing cluster on version 0.142 must be upgraded to Presto 0.180 or a later version.
  • Presto 0.157 has been hidden from the cluster UI but it continues to be present in the cluster AMI. An existing Presto-0.157 cluster continues to work but you cannot create new Presto-0.157 clusters (you must use a newer Presto version).
  • Presto 0.174 has been removed from the cluster AMI. Any existing cluster on version 0.174 must be upgraded to Presto 0.180 or a later version.
  • Presto 0.180 has been labeled as Deprecated in the cluster UI. There are no restrictions on using or creating Presto-0.180 clusters, but Qubole strongly recommends upgrading to 0.193 or a later version because many new features are available only on the recent versions.

Cost-based Optimization is default in JOIN Reordering and Redistribution

PRES-2372: Cost-based optimization (CBO) for JOIN reordering and JOIN distribution type selection using statistics present in the Hive metastore is enabled by default for Presto version 0.208.

The following values are added to the default cluster configuration for Qubole Presto version 0.208:

optimizer.join-reordering-strategy=AUTOMATIC
join-distribution-type=AUTOMATIC
join-max-broadcast-table-size=100MB

Session Properties to Set Required Number of Worker Nodes

PRES-2695: The required number of workers feature is controlled by two cluster-level properties, query-manager.required-workers-max-wait and query-manager.required-workers. Qubole now allows overriding them at the query level through the corresponding session properties, required_workers_max_wait and required_workers.
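
As an illustration, a query-level override could be issued ahead of a query roughly as sketched below. Only the two session property names come from the note above. The helper name, the example values, and the quoted duration format are assumptions made for this sketch, and the statements rely on standard Presto SET SESSION syntax.

```python
from typing import List


def required_workers_statements(required_workers: int, max_wait: str) -> List[str]:
    """Build SET SESSION statements for the two session properties.

    Hypothetical helper for illustration; it is not part of any Qubole API.
    """
    if required_workers < 0:
        raise ValueError("required_workers must be non-negative")
    return [
        f"SET SESSION required_workers = {required_workers}",
        # Assumption: the wait is given as a quoted duration string such as '5m'.
        f"SET SESSION required_workers_max_wait = '{max_wait}'",
    ]


if __name__ == "__main__":
    for statement in required_workers_statements(4, "5m"):
        print(statement + ";")
```

Running the generated statements in the same session before the query would apply the override only to that session, leaving the cluster-level defaults unchanged.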

Presto Notebook Enhancements

PRES-2600: These are the new enhancements in the Presto notebooks:

  • You can now set session properties in Presto notebooks in a paragraph and run it. When set, these session properties are applicable to paragraphs in the notebook’s current session.
  • To improve the debugging experience, the source field of queries run from Presto notebooks is set to notebook_<notebook-name>_<notebook-id>, and the source field of queries run from dashboards is set to dashboard_<dashboard-name>_<dashboard-id>_<source-note-id>. The source field is directly searchable in the Presto UI. For example, while debugging an error, you can search for a notebook by its name or ID in the Presto UI to quickly filter the queries that were run from that specific notebook.
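
The source field formats above can be sketched as follows. The two format strings follow the patterns stated in the note. The function names and the dictionary-based query records are hypothetical and only stand in for whatever the Presto UI search does internally.

```python
from typing import Dict, Iterable, List


def notebook_source(notebook_name: str, notebook_id: int) -> str:
    """Source field for a query run from a notebook."""
    return f"notebook_{notebook_name}_{notebook_id}"


def dashboard_source(dashboard_name: str, dashboard_id: int, source_note_id: int) -> str:
    """Source field for a query run from a dashboard."""
    return f"dashboard_{dashboard_name}_{dashboard_id}_{source_note_id}"


def queries_from_notebook(queries: Iterable[Dict[str, str]], notebook_id: int) -> List[Dict[str, str]]:
    """Keep only the query records whose source names the given notebook ID."""
    suffix = f"_{notebook_id}"
    return [
        q for q in queries
        if q["source"].startswith("notebook_") and q["source"].endswith(suffix)
    ]


if __name__ == "__main__":
    records = [
        {"id": "q1", "source": notebook_source("sales", 42)},
        {"id": "q2", "source": dashboard_source("kpis", 7, 42)},
    ]
    print(queries_from_notebook(records, 42))
```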

PRES-2254: In a Presto notebook, you can now set zeppelin.presto.stacktrace as an interpreter property for displaying stacktrace for certain errors.

Disable Reserved Pool

PRES-2918: A new experimental configuration property called experimental.reserved-pool-enabled is added to Presto version 0.208 to allow disabling Reserved Pool, which is used to prevent deadlocks when memory is exhausted in the General Pool by promoting the biggest query to Reserved Pool. However, only one query gets promoted to Reserved Pool and queries in General Pool get into the blocked state whenever it becomes full. To avoid this scenario, you can set experimental.reserved-pool-enabled to false for disabling Reserved Pool. For more information, see Disabling Reserved Pool.

PRES-2657: The path for the spill to disk functionality, experimental.spiller-spill-path=/media/ephemeral0/presto/spill_dir, is now configured by default on Qubole Presto 0.208. This lets users enable spill to disk either for individual queries, by running set session spill_enabled=true, or for all queries, by adding experimental.spill-enabled=true to the Presto cluster configuration override.

Disable SSH Access to Presto Clusters

ACM-5449: Presto clusters can now work without SSH connections into the cluster. Via Support

The following conditions must be true to support this configuration: Cluster Restart Required

  • The cluster must be in a private subnet.
  • The metastore must be directly accessible from the cluster. This configuration does not work with Qubole-managed Hive metastore.

Contact Qubole Support to disable the SSH access to the cluster. If SSH is disabled, the SSH port 22 must be opened by the user to allow Qubole Support to debug issues when you seek support. (PRES-2742)

Clearing the Hive Metastore Cache

PRES-111: Qubole has added a command, catalog.schema.clear_cache, to clear the Hive metastore cache on the coordinator node for a given Hive catalog. The command is supported only on Presto version 0.208.

Presto Cluster Accessing S3 Buckets with Different Configuration

PRES-406: Qubole supports per-bucket configuration, which lets Presto pick the right configuration for the S3 bucket that a query is accessing. This is useful when the same cluster needs to access S3 buckets with different configurations.

Query Retry Mechanism Support in IOW, CTAS, and SELECT Queries

PRES-2584: Presto smart query retry is supported for Insert Overwrite Table (IOW) and Create Table As Select (CTAS) queries now.

PRES-2585: Presto smart query retry is supported in SELECT queries which did not return any data before failing.

Qubole Supports Optimized Local Scheduling

PRES-2605: Qubole developed a new scheduler to optimally schedule tasks based on locality of data cached with RubiX. Read more in this blog post. Gradual Rollout

Limiting the Total Bytes Scanned in a Running Query

PRES-2744: Qubole has added a session property, qubole_max_raw_input_datasize=1TB, to limit the total bytes scanned. Queries that exceed this limit fail with the RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED exception. This ensures that rogue queries do not run for a very long time.
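
The effect of the limit can be pictured with the minimal sketch below. It is not Qubole's implementation. The exception class is a hypothetical stand-in for RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED, the function names are invented for illustration, and the binary unit multipliers (1kB = 1024 bytes) are an assumption about how the size string is interpreted.

```python
import re

# Assumption: binary multipliers, so that 1TB means 1024**4 bytes.
_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}


class RawInputLimitExceeded(Exception):
    """Hypothetical stand-in for RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED."""


def parse_data_size(text: str) -> int:
    """Convert a size string such as '1TB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized data size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def check_raw_input(bytes_scanned: int, limit: str) -> None:
    """Raise once the bytes scanned so far exceed the configured limit."""
    if bytes_scanned > parse_data_size(limit):
        raise RawInputLimitExceeded(
            f"scanned {bytes_scanned} bytes, limit is {limit}"
        )


if __name__ == "__main__":
    check_raw_input(500 * 1024**3, "1TB")  # under the limit, no exception
    try:
        check_raw_input(2 * 1024**4, "1TB")
    except RawInputLimitExceeded as exc:
        print("query would fail:", exc)
```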

Performance Improvement in Queries Involving IN and NOT IN over a Subquery

PRES-2790: Qubole has improved the performance of queries that involve IN and NOT IN over a subquery. Read more in this blog post.

Enhancements

  • JDBC-124: Qubole now supports concurrency of multiple statements in Presto FastPath.
  • PRES-2182: Qubole has added a health check for Presto clusters. Qubole automatically terminates unhealthy clusters (that is, clusters where the Presto server is not reachable on the coordinator and worker nodes).
  • PRES-2256: Qubole supports Presto decimal coercion on Presto version 0.208.
  • PRES-2438: When a query fails due to a Spot loss, the corresponding error is now displayed in the exception stack trace. Query failures due to Spot node interruptions are also detected faster, using information about the Spot interruption time from the cloud provider.
  • PRES-2447: Qubole has upgraded the Datadog agent version from v5 to v6.
  • PRES-2510: Clicking the Presto UI from the Qubole Control Plane in the homepage redirects to <base-url>/presto-ui-<cluster-id>/ui/. It also redirects <master>:dns:8081 to a static resource <base-url>/ui/index.html. It is useful when Presto is used bypassing the QDS Control Plane.
  • PRES-2667: Whenever a node encounters Spot loss, the query info of all queries running on that node is updated with information about the spot node interruption. The query may or may not fail due to the spot interruption. The affected query’s query info displays the warning: Query may fail due to interruption of spot instance: <Instance ID> with private IP: <IP> by cloud provider at: <Spot Interruption Time> GMT.
  • PRES-2769: The jdk.nio.maxCachedBufferSize and ExitOnOutOfMemoryError JVM configuration properties are pulled from the open source into the default JVM configuration of Qubole Presto 0.180 and later versions to improve stability. For more information, see jvm.config.
  • PRES-2791: Qubole has ported open-source changes that are related to improvements in S3 reads to Qubole Presto 0.208 version. For more information, see Faster S3 reads.
  • PRES-2861: QDS skips requesting instances of the families for which Spot losses were seen at the cluster level within a specified time window (by default, the last 15 minutes). If Spot losses were seen for all configured instance families, QDS tries to provision instances synchronously and, if Spot nodes are unavailable, finally falls back to On-Demand nodes if that fallback is configured. Qubole recommends configuring instances of multiple families. Via Support
  • PRES-2892: If a query violates the Presto strict mode conditions while strict mode is not enabled, Qubole displays warnings in that query’s query info.
  • PRES-2992: Qubole has added presto-tpcds, presto-localfile, and presto-thrift connectors to Presto 0.193 and 0.208 versions.
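
The instance family selection described in PRES-2861 above can be sketched as follows. This is a simplified model under stated assumptions, not QDS code: the function names, the family labels, and the attempt labels ("spot", "spot-synchronous", "on-demand") are all hypothetical, and only the 15-minute default window and the order of the fallbacks come from the note.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

DEFAULT_WINDOW = timedelta(minutes=15)


def eligible_families(
    configured: List[str],
    last_spot_loss: Dict[str, datetime],
    now: datetime,
    window: timedelta = DEFAULT_WINDOW,
) -> List[str]:
    """Return the configured families with no Spot loss inside the window."""
    return [
        family for family in configured
        if family not in last_spot_loss or now - last_spot_loss[family] >= window
    ]


def provisioning_attempts(
    configured: List[str],
    last_spot_loss: Dict[str, datetime],
    now: datetime,
    on_demand_fallback: bool,
    window: timedelta = DEFAULT_WINDOW,
) -> List[Tuple[str, List[str]]]:
    """Return the ordered provisioning attempts as (mode, families) pairs."""
    healthy = eligible_families(configured, last_spot_loss, now, window)
    if healthy:
        return [("spot", healthy)]
    # Every configured family saw a recent loss: try synchronously first,
    # then fall back to On-Demand only when that fallback is configured.
    attempts = [("spot-synchronous", list(configured))]
    if on_demand_fallback:
        attempts.append(("on-demand", list(configured)))
    return attempts


if __name__ == "__main__":
    now = datetime(2019, 1, 1, 12, 0)
    losses = {"m4": now - timedelta(minutes=5)}
    print(provisioning_attempts(["m4", "r4"], losses, now, on_demand_fallback=True))
```

The sketch also shows why configuring multiple families helps: with a single family, one recent Spot loss leaves no healthy family to request.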

Bug Fixes

  • PRES-1480: The issue where there was incorrect parsing due to \t (tab) characters in the result data has been resolved now. Qubole supports \t (tab) characters in the UI by enhancing parsing. Via Support

  • PRES-2391: The issue where dynamic filters were missed when their associated JOIN process finished quickly has been resolved.

  • PRES-2568: The issue in which carriage return \r was incorrectly added wherever there was a semicolon in a query has been resolved.

  • PRES-2645: The issue where the file open operation returned a FileNotFound exception has been resolved. File readers now retry the file open operation if it fails with a FileNotFound exception. This provides a safeguard against the S3 eventual consistency issue, where the master node could see the file and create a split for it, but the worker node failed to read the file from S3 with a FileNotFound exception.

    The cluster-level timeout configuration is hive.stale-listing-max-retry-time with a default of 1 minute and the session-level configuration is hive.stale_listing_max_retry_time. Cluster Restart Required for the cluster-level configuration.

    Qubole supports this timeout configuration in Presto 0.193 and later versions.

  • PRES-2663: The S3 location where Presto server logs are uploaded has been changed now (for Presto master). The changes in Presto server log locations are:

    • defloc/logs/presto/cluster_inst_id/master/ changed to defloc/logs/presto/cluster_id/cluster_start_time/master/
    • defloc/logs/presto/cluster_inst_id/nodeIP/ changed to defloc/logs/presto/cluster_id/cluster_start_time/nodeIP/node_start_time/
  • PRES-2810: The failures in query planning with dynamic filtering enabled are resolved.

  • ZEP-3808: The issue where canceling notebook paragraphs hid them in dashboards has been resolved. On canceling the paragraph, the notebook displays an appropriate error message. As error messages are not empty, paragraphs are visible in the dashboards’ UI.
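
The retry behavior described for PRES-2645 above can be pictured with the minimal sketch below. It is not the Presto file reader. The function name, the fixed delay between attempts, and the injectable clock and sleep parameters are assumptions made for illustration, and the 60-second default only mirrors the 1-minute default of hive.stale-listing-max-retry-time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_retry(
    open_file: Callable[[], T],
    max_retry_time: float = 60.0,
    delay: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry open_file on FileNotFoundError until max_retry_time elapses."""
    deadline = clock() + max_retry_time
    while True:
        try:
            return open_file()
        except FileNotFoundError:
            # Give up once the retry budget is spent and surface the error.
            if clock() >= deadline:
                raise
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_open() -> str:
        # Simulates a file that becomes visible on the third attempt.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise FileNotFoundError("listing is stale")
        return "file-handle"

    print(open_with_retry(flaky_open, delay=0.01), attempts["count"])
```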

For a list of bug fixes between versions R56 and R57, see Changelog for api.qubole.com.