Presto

Presto 317 (beta) with New Features

PRES-3070: Presto 317 (beta) is the latest version that Qubole Presto supports. It includes open-source changes, Qubole Presto features, and support for reading Hive ACID tables. Beta | Cluster Restart Required

Other significant changes include:

  • PRES-3140: Presto 317 now supports file-based authentication.
  • PRES-2950: Presto 317 now runs on Java version 11.
  • PRES-3139: Presto 317 now supports required-workers.
  • PRES-3134: Presto 317 now supports Hive Views.
  • PRES-3066, PRES-3106: Presto 317 now supports Dynamic Filtering.
  • PRES-2967: Presto 317 now supports Qubole’s workload-aware autoscaling.
  • PRES-2966: Presto 317 now supports strict mode.
  • PRES-2965: Presto 317 now supports integration with Apache Ranger.
  • PRES-3329: Presto 317 now supports smart query retries.
  • PRES-3141: Presto 317 now supports the Kinesis connector.
  • PRES-2963: Presto 317 now supports Rubix cache.
  • PRES-3234: Presto 317 now supports the AWS Glue metastore.

Presto Supports Hive ACID Tables

PRES-2839: Qubole Presto 317 (beta) supports reading Hive ACID tables. It now has read support for:

  • Insert-only ACID table
  • Full ACID table
  • Non-ACID table converted to ACID table
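As a rough illustration of how these cases differ, the sketch below classifies a table from its Hive table properties. It assumes the standard Hive properties transactional and transactional_properties; the function name classify_hive_table and the returned labels are hypothetical, and this is not a description of how Qubole Presto detects table types internally.

```python
def classify_hive_table(tbl_properties):
    """Classify a Hive table as 'full-acid', 'insert-only', or 'non-acid'.

    Hypothetical helper: it inspects only the table properties.
    """
    # Hive stores property values as strings; normalize before comparing.
    props = {k.lower(): str(v).lower() for k, v in tbl_properties.items()}
    if props.get("transactional") != "true":
        return "non-acid"
    if props.get("transactional_properties") == "insert_only":
        return "insert-only"
    # A non-ACID table converted to ACID carries the same properties as a
    # table created as ACID, so properties alone cannot tell them apart.
    return "full-acid"


print(classify_hive_table({"transactional": "true"}))
print(classify_hive_table({"transactional": "true",
                           "transactional_properties": "insert_only"}))
print(classify_hive_table({}))
```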

For more information, see Using ACID Tables in Presto.

JOIN Reordering and JOIN Type Determination Based on Table Size

PRES-2971: Table size-based statistics for determining the JOIN distribution type and JOIN reordering now also work with predicates on partitioned tables. The size is calculated only for the partitions that are being queried.

Further, the distribution type of each JOIN in a query is visible in the Presto query info under the joinDistributionStats key.
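The effect of counting only the queried partitions can be sketched as follows. This is a simplified illustration, not Qubole's implementation: the function names, the partition sizes, and the broadcast limit are all hypothetical, and the BROADCAST and PARTITIONED labels simply mirror Presto's join distribution types.

```python
def queried_size_bytes(partition_sizes, predicate):
    """Sum the sizes of only those partitions that satisfy the predicate."""
    return sum(size for key, size in partition_sizes.items() if predicate(key))


def choose_join_distribution(build_side_bytes, broadcast_limit_bytes):
    """Pick BROADCAST for a small build side, PARTITIONED otherwise."""
    if build_side_bytes <= broadcast_limit_bytes:
        return "BROADCAST"
    return "PARTITIONED"


# Hypothetical table partitioned by date, with per-partition sizes in bytes.
sizes = {
    "2019-09-30": 900_000_000,
    "2019-10-01": 40_000_000,
    "2019-10-02": 35_000_000,
}
limit = 100_000_000  # assumed broadcast limit, for illustration only

whole_table = queried_size_bytes(sizes, lambda day: True)
october_only = queried_size_bytes(sizes, lambda day: day.startswith("2019-10"))

print(choose_join_distribution(whole_table, limit))
print(choose_join_distribution(october_only, limit))
```

With the predicate applied, the estimate covers only the two October partitions, so the same table can qualify for a broadcast JOIN that its full size would rule out.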

Monitoring Presto using Prometheus

PRES-3285: Presto now supports the Prometheus monitoring tool. The default Presto dashboard captures various JMX metrics, which you can view through Grafana. The link is accessible under the Resources drop-down list on the Clusters UI page. Gradual Rollout | Cluster Restart Required

For more information, see the documentation.

Deprecating Presto Version 0.193

Presto 0.193 is deprecated and is labelled as such on the Clusters UI. Although there are no restrictions on using or creating Presto 0.193 clusters, Qubole strongly recommends that users upgrade to version 0.208 or later because many new features are available only in the recent versions. Presto 0.208 is now the default version.

Proactive Removal of Unhealthy Cluster Nodes

To maintain cluster health, Qubole has added these changes to proactively remove unhealthy cluster nodes: Cluster Restart Required

  • PRES-2093: Use ascm.bad-node-removal to enable or disable this service. When enabled, it periodically finds and removes unhealthy worker nodes. Configure the interval using ascm.bad-node-removal.interval. Disabled | Cluster Restart Required
  • PRES-3044: The master node periodically fetches open file descriptor counts from the worker nodes and forcefully quiesces nodes whose open file descriptor count exceeds a threshold.
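The file descriptor check in PRES-3044 amounts to a threshold filter over per-worker counts. The sketch below shows only that selection step; the function name, the worker names, and the threshold value are hypothetical, and how the master actually collects the counts and quiesces nodes is not covered here.

```python
def nodes_to_quiesce(open_fd_counts, threshold):
    """Return the workers whose open file descriptor count exceeds threshold."""
    return sorted(
        node for node, count in open_fd_counts.items() if count > threshold
    )


# Hypothetical counts, as the master might see them after one polling round.
counts = {"worker-1": 1200, "worker-2": 64000, "worker-3": 800}
print(nodes_to_quiesce(counts, threshold=50000))
```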

For more information, see the documentation.

Buffer Capacity in Presto Clusters

PRES-2682: Presto clusters now support additional configuration for maintaining buffer capacity. Set ascm.cluster-start-buffer-workers to a required value (count) to configure the buffer capacity. Disabled | Cluster Restart Required

The cluster keeps this configured buffer capacity free throughout its lifetime, except when the cluster size reaches or exceeds its configured maximum. Note that with this feature enabled, the cluster upscales using buffer capacity as the trigger rather than the triggers described in workload-aware Presto autoscaling.
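The buffer rule can be read as simple arithmetic: upscale whenever free capacity drops below the buffer, but never past the maximum cluster size. The sketch below assumes capacity is counted in whole workers and that a worker is either busy or free; both are simplifications, and the function and parameter names are hypothetical.

```python
def workers_to_add(current_workers, busy_workers, buffer_workers, max_workers):
    """Return how many workers to add to restore the free buffer."""
    free = current_workers - busy_workers
    # How far the free capacity falls short of the configured buffer.
    shortfall = max(0, buffer_workers - free)
    # Upscaling stops at the configured maximum cluster size.
    headroom = max(0, max_workers - current_workers)
    return min(shortfall, headroom)


print(workers_to_add(10, 8, 3, 20))  # free capacity is one short of the buffer
print(workers_to_add(10, 8, 3, 10))  # already at the maximum size
```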

For more information, see the documentation.

Dynamic Filtering Improvements

PRES-3152: Improvements in Dynamic Filtering are:

  • Improved the efficiency of dynamic partition pruning by preventing the listing and creation of Hive splits from partitions that are pruned at runtime. (PRES-2990)
  • Qubole has introduced a feature to enable dynamic partition pruning on Hive tables at account level. (PRES-3112) Gradual Rollout | Cluster Restart Required
  • Resolved the invalid partition value exception and the intermittent ArrayIndexOutOfBoundsException raised by queries with Dynamic Filtering enabled. (PRES-3051)
  • Fixed UnsupportedOperationException encountered with some complex outer join queries when dynamic filtering is enabled. (PRES-3249)
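The idea behind the PRES-2990 improvement is to prune partitions before any files are listed, so pruned partitions cost nothing. The sketch below is a conceptual illustration only: the function names, the partition layout, and the file paths are hypothetical and do not reflect Presto's internal classes.

```python
def prune_partitions(partitions, column, allowed_values):
    """Keep only partitions whose value for column passes the dynamic filter."""
    allowed = set(allowed_values)
    return [p for p in partitions if p[column] in allowed]


def list_splits(partitions, list_files):
    """List files (split sources) only for partitions that survived pruning."""
    return [path for p in partitions for path in list_files(p)]


# Hypothetical partitions of a fact table, keyed by the partition column "ds".
partitions = [{"ds": "2019-10-01"}, {"ds": "2019-10-02"}, {"ds": "2019-10-03"}]
listed = []  # records which partitions were actually listed


def fake_list_files(partition):
    listed.append(partition["ds"])
    return ["/warehouse/fact/ds=" + partition["ds"] + "/part-0"]


# Values collected at runtime from the build side of the JOIN (assumed).
kept = prune_partitions(partitions, "ds", ["2019-10-02"])
splits = list_splits(kept, fake_list_files)
print(listed, splits)
```

Because pruning runs first, fake_list_files is called for one partition rather than three.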

Presto Query Details on Workbench Status Pane

PRES-2528: The status pane of Workbench UI now shows spot loss, warnings, and retry information for a running Presto query.

Enhancements

  • PRES-2740: Presto Server now runs as a Presto user rather than as the root user.
  • PRES-2944: Qubole has upgraded the AWS SDK that Presto 0.208 uses to 1.11.602, which has several performance improvements as described in this open-source PR. Presto 317 (beta) has the upgraded AWS SDK version by default.
  • PRES-3174: Use the account feature to enable or disable per-user File System object caching. Gradual Rollout | Cluster Restart Required
  • PRES-3127: Qubole has improved Spot Rebalancer to handle cluster compositions in which all minimum nodes are spot nodes or all nodes (including the master) are spot nodes (that is, a Spot-only cluster).
  • PRES-3202: The reserved memory pool is now disabled by default in Presto 317. In Presto 0.208, disabling the reserved memory pool is part of Gradual Rollout | Cluster Restart Required.

Bug Fixes

  • PRES-2985: Fixed the error message shown when two CREATE TABLE AS SELECT queries create the same table in parallel and one of them fails because it tries to create the table after the other query has already created it. The error is now clear and includes information about the existing table and files that caused the query failure.
  • PRES-3020: Qubole has backported an open-source fix to Presto 0.208 so that Presto uses ordinal positions instead of field names to map struct types in the ORC file format.
  • PRES-3109: Fixed the logic to escape \r and \n characters so that the original encoding scheme of data is not altered and the output data gets written in the same encoding scheme.
  • PRES-3177: Presto Server start now fails if the Presto Server Bootstrap file fails to download. This avoids situations wherein the Presto Server starts without applying bootstrap changes.
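To illustrate the difference that PRES-3020 makes, the sketch below contrasts name-based and ordinal mapping of struct fields. It is a simplified model, not the ORC reader itself: the function names and sample field names are hypothetical, and filling missing fields with None is an assumption made for the example.

```python
def read_struct_by_name(table_fields, file_fields, file_values):
    """Map values to table fields by matching field names."""
    by_name = dict(zip(file_fields, file_values))
    return {name: by_name.get(name) for name in table_fields}


def read_struct_by_ordinal(table_fields, file_values):
    """Map values to table fields by position, ignoring file field names."""
    row = dict(zip(table_fields, file_values))
    # Table fields beyond those present in the file get None (assumption).
    for name in table_fields[len(file_values):]:
        row[name] = None
    return row


# Hypothetical case: the file's struct field names differ from the table's.
table_fields = ["id", "name"]
file_fields = ["_col0", "_col1"]
file_values = [1, "a"]

print(read_struct_by_name(table_fields, file_fields, file_values))
print(read_struct_by_ordinal(table_fields, file_values))
```

When the names differ, name-based mapping finds no matches, while ordinal mapping still recovers the values.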

For a list of bug fixes between versions R57 and R58, see Changelog for api.qubole.com.