Presto

The new features and key enhancements are:

Presto 317 (beta) with New Features

PRES-3070: Presto 317 (Beta) is the latest version that Qubole Presto supports. This version incorporates open-source changes, supports reading Hive ACID tables, and includes the following changes: Beta | Cluster Restart Required

  • PRES-3139: Presto 317 now supports required-workers.

  • PRES-3066, PRES-3106: Presto 317 now supports Dynamic Filtering.

  • PRES-2967: Presto 317 now supports Qubole’s workload aware autoscaling.

  • PRES-2966: Presto 317 now supports strict mode.

  • PRES-2965: Presto 317 now supports integration with Apache Ranger.

  • PRES-2969: Presto 317 now supports smart query retries.

  • PRES-3141: Presto 317 now supports Kinesis connector.

  • PRES-2963: Presto 317 now supports Rubix cache.

Presto Supports Hive ACID Tables

PRES-2839: Qubole Presto 317 (Beta) supports reading Hive ACID tables. It now has read support for:

  • Insert-only ACID table

  • Full ACID table

  • Non-ACID table converted to ACID table

JOIN Reordering and JOIN Type Determination Based on Table Size

PRES-2971: Table-size-based stats for determining JOIN distribution type and JOIN reordering now also work with predicates on partitioned tables. The size is calculated only for partitions that are being queried.
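As a rough illustration of the idea above, the following Python sketch sums the sizes of only the queried partitions and uses that total to pick a JOIN distribution type. The function names, the partition-size map, and the 100 MB broadcast threshold are hypothetical and are not taken from Qubole Presto; the actual planner logic may differ.

```python
from typing import Callable, Dict

MB = 1024 * 1024

def queried_size_bytes(partition_sizes: Dict[str, int],
                       predicate: Callable[[str], bool]) -> int:
    """Sum the sizes of only those partitions that satisfy the query predicate."""
    return sum(size for name, size in partition_sizes.items() if predicate(name))

def choose_join_distribution(build_side_bytes: int,
                             broadcast_threshold_bytes: int = 100 * MB) -> str:
    """Pick BROADCAST for a small build side, PARTITIONED otherwise (assumed rule)."""
    if build_side_bytes <= broadcast_threshold_bytes:
        return "BROADCAST"
    return "PARTITIONED"

# Hypothetical table partitioned by date; sizes are in bytes.
orders = {
    "dt=2019-09-01": 40 * MB,
    "dt=2019-09-02": 50 * MB,
    "dt=2019-09-03": 900 * MB,
}

# Without a predicate, the whole table counts toward the estimate.
whole_table = queried_size_bytes(orders, lambda p: True)
# With a predicate on the partition column, only matching partitions count.
filtered = queried_size_bytes(orders, lambda p: p <= "dt=2019-09-02")

whole_table_choice = choose_join_distribution(whole_table)
filtered_choice = choose_join_distribution(filtered)
```

In this sketch the predicate shrinks the estimate enough to change the chosen distribution type, which is the effect the enhancement is meant to enable.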

The distribution type of JOINs in a query is also visible in the Presto query info under the joinDistributionStats key name.
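The release note names the joinDistributionStats key but does not say where it sits in the query info or what its value looks like. Under that uncertainty, a minimal Python sketch can search a parsed query-info document for the key at any depth. The sample JSON below is made up for illustration; only the key name comes from the note.

```python
import json
from typing import Any, Iterator

def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, since the key's depth is not known in advance.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

# Made-up query info; the nesting and the value shape are assumptions.
sample = json.loads(
    '{"queryId": "example", "queryStats": {"joinDistributionStats": {"placeholder": 1}}}'
)
found = list(find_key(sample, "joinDistributionStats"))
```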

Presto Version 0.193 Deprecated

Presto 0.193 is deprecated and is labelled as such on the Clusters page of the QDS UI. You can still create and use Presto 0.193 clusters, but Qubole strongly recommends that you upgrade to 0.208 or a later version to take advantage of the many new features.

Presto 0.208 is the new default version.

Proactive Removal of Unhealthy Cluster Nodes

QDS has implemented the following changes: Cluster Restart Required

  • PRES-2093: Set ascm.bad-node-removal to enable or disable a service that periodically finds and removes unhealthy worker nodes. Set ascm.bad-node-removal.interval to configure how often the service runs. Disabled | Cluster Restart Required

  • PRES-3044: The coordinator node periodically fetches open file descriptor counts from the worker nodes and gracefully shuts down nodes whose open file descriptor count exceeds a threshold.
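The file descriptor check in PRES-3044 can be pictured with a small Python sketch: given the counts collected in one polling round, select the workers that exceed a threshold. The function name, the worker names, and the threshold value are hypothetical; the note does not state the actual threshold or how the counts are collected.

```python
from typing import Dict, List

def nodes_to_shut_down(open_fd_counts: Dict[str, int], threshold: int) -> List[str]:
    """Return the workers whose open file descriptor count exceeds the threshold."""
    return sorted(node for node, count in open_fd_counts.items() if count > threshold)

# Hypothetical counts as a coordinator might see them in one polling round.
counts = {"worker-1": 1200, "worker-2": 64000, "worker-3": 800}
unhealthy = nodes_to_shut_down(counts, threshold=50000)
```

Only the selection step is shown here; gracefully shutting down the selected nodes is a separate step.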

For more information, see the documentation.

Buffer Capacity in Presto Clusters

PRES-2682: Presto clusters now support configuring buffer capacity. Set ascm.cluster-start-buffer-workers to configure the buffer capacity. Disabled | Cluster Restart Required

This configured buffer capacity will remain free unless the cluster reaches or exceeds its configured maximum size. Note that when this feature is enabled, the cluster uses buffer capacity as the trigger to upscale, as opposed to the triggers described in workload-aware Presto autoscaling.
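One way to read the behavior described above is that the cluster aims for "busy workers plus buffer", capped at the maximum size. The Python sketch below encodes that reading. The function and parameter names are hypothetical, and how QDS actually counts busy workers is not described in this note.

```python
def target_cluster_size(busy_workers: int, buffer_workers: int,
                        min_size: int, max_size: int) -> int:
    """Size that keeps `buffer_workers` free, clamped to [min_size, max_size]."""
    desired = busy_workers + buffer_workers
    return max(min_size, min(desired, max_size))

# With 4 busy workers and a buffer of 2, the cluster grows to keep 2 workers free.
growing = target_cluster_size(busy_workers=4, buffer_workers=2, min_size=2, max_size=10)
# Near the maximum size the buffer can no longer be kept fully free.
capped = target_cluster_size(busy_workers=9, buffer_workers=2, min_size=2, max_size=10)
# An idle cluster stays at its minimum size.
idle = target_cluster_size(busy_workers=0, buffer_workers=2, min_size=2, max_size=10)
```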

For more information, see the documentation.

Dynamic Filtering Improvements

PRES-3152 introduces these improvements:

  • Improves the efficiency of dynamic partition pruning by no longer listing partitions, or creating Hive splits from partitions, that are pruned at runtime. (PRES-2990)

  • Enables dynamic partition pruning on Hive tables at the account level. (PRES-3112) Gradual Rollout | Cluster Restart Required

  • Resolves the invalid partition value exception and the intermittent ArrayIndexOutOfBoundsException errors in queries with dynamic filtering enabled. (PRES-3051)

  • Fixes the UnsupportedOperationException that occurred with some complex outer join queries when dynamic filtering was enabled. (PRES-3249)
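The efficiency gain from PRES-2990 can be illustrated with a simplified Python sketch: partitions are pruned using join keys collected at runtime, and files are listed only for the partitions that remain. All names, the partition layout, and the file paths are hypothetical; this models the idea and is not the Presto implementation.

```python
from typing import Dict, List, Set

def prune_partitions(partition_files: Dict[str, List[str]],
                     build_side_keys: Set[str]) -> List[str]:
    """Keep only partition values that also appear among the build-side join keys."""
    return [value for value in partition_files if value in build_side_keys]

def list_splits(partition_files: Dict[str, List[str]], kept: List[str]) -> List[str]:
    """List files (stand-ins for Hive splits) only for partitions that survived pruning."""
    return [path for value in kept for path in partition_files[value]]

# Hypothetical fact table partitioned by region; file names are made up.
sales_partitions = {
    "region=us": ["us/part-0", "us/part-1"],
    "region=eu": ["eu/part-0"],
    "region=apac": ["apac/part-0"],
}
# Join keys collected at runtime from the filtered build side of the join.
runtime_keys = {"region=us"}

kept = prune_partitions(sales_partitions, runtime_keys)
splits = list_splits(sales_partitions, kept)
```

Because pruning happens before listing, the sketch never touches the files of the eu and apac partitions.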

Other Enhancements

  • PRES-2740: The Presto Server now runs as a Presto user rather than the root user.

  • PRES-3174: Provides an account-level setting to enable or disable per-user-based filesystem object caching. Gradual Rollout | Cluster Restart Required

  • PRES-3202: The reserved memory pool is disabled by default in Presto version 317. For Presto 0.208, the reserved memory pool is being disabled as part of a gradual rollout. Gradual Rollout | Cluster Restart Required

  • PRES-3076: Presto now supports Prometheus monitoring. The default Presto dashboard captures various JMX metrics, which you can view through Grafana. The link is in the Resources drop-down list on the Clusters page of the QDS UI. Gradual Rollout | Cluster Restart Required

For more information, see the documentation.

Bug Fixes

  • PRES-3020: Qubole has back-ported an open-source fix to Presto 0.208 so that Presto maps struct types in the ORC file format by ordinal position instead of by field name.

  • PRES-3177: To prevent the Presto Server from starting without applying bootstrap changes, the server will now not start if its bootstrap file fails to download.

  • PRES-3282: Adds support for lambda expressions in ExpressionEquivalence.
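To show why the ordinal mapping in PRES-3020 matters, the following Python sketch contrasts name-based and position-based mapping of struct fields when a field name in the table schema differs from the one stored in the file. The field names, values, and helper functions are hypothetical and only model the idea; they are not the Presto ORC reader.

```python
from typing import Any, Dict, List, Optional

def map_by_name(table_fields: List[str], file_fields: List[str],
                file_values: List[Any]) -> Dict[str, Optional[Any]]:
    """Match struct fields by name; a table field missing from the file reads as None."""
    by_name = dict(zip(file_fields, file_values))
    return {name: by_name.get(name) for name in table_fields}

def map_by_ordinal(table_fields: List[str],
                   file_values: List[Any]) -> Dict[str, Optional[Any]]:
    """Match struct fields by position; field names stored in the file are ignored."""
    return {name: (file_values[i] if i < len(file_values) else None)
            for i, name in enumerate(table_fields)}

# Hypothetical struct written with an older field name ("fname").
file_fields = ["fname", "age"]
file_values = ["Ada", 36]
# The table schema now calls the first field "first_name".
table_fields = ["first_name", "age"]

name_based = map_by_name(table_fields, file_fields, file_values)
ordinal_based = map_by_ordinal(table_fields, file_values)
```

In this example the name-based mapping loses the renamed field, while the ordinal mapping still reads it.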