Presto

Presto 317 is Generally Available

PRES-3429: Presto version 317 is generally available now. Cluster Restart Required

BigQuery Connector for Presto

PRES-3153: The BigQuery connector is now available on Presto version 317.

Dynamic Concurrency and Hybrid Autoscaling

PRES-3373: This version marks the first release of Presto’s changes to workload-aware autoscaling. Specifically, these changes target using dynamic concurrency and queue-aware autoscaling in conjunction with CPU-based autoscaling. Multiple automated workload management and operational governance capabilities have been introduced to evolve the Presto service on Qubole for improved performance, reliability, and TCO reduction. Gradual Rollout | Beta | Cluster Restart Required

Enhancements in Presto for JDBC and ODBC Drivers

Qubole has made enhancements in Presto for the next-generation (v3) JDBC and ODBC drivers. With these enhancements, the drivers:

  • Are as fast as open-source drivers
  • Support cluster lifecycle management (automatically starting a cluster when a query is submitted and automatically terminating idle clusters)
  • Make query history available in the Analyze and Workbench UI
  • Provide enhanced security (HTTPS) and user authentication (through an API token)

Improvements in Dynamic Filtering

PRES-3288: These are the dynamic filtering (DF) improvements:

  • PRES-3002: Qubole has added a configuration property, hive.max-execution-partitions-per-scan, to limit the maximum number of partitions that a table scan is allowed to read during query execution. Disabled | Cluster Restart Required
  • PRES-3148: Qubole has extended the DF optimization to semi joins to take advantage of a selective build side in queries with the IN clause.
  • PRES-3149: Dynamic filters are now pushed down to ORC and Parquet readers to reduce data scanned on the probe side for partitioned as well as non-partitioned tables. Cluster Restart Required
  • PRES-3404: Qubole has improved the utilization of dynamic filters on worker nodes and reduced load on coordinator when dynamic filtering is enabled.
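
The partition limit from PRES-3002 can be pictured with a minimal Python sketch. It is not Presto's implementation: the function name, the exception class, and the limit value used in the example are hypothetical and only illustrate the idea of failing a scan that would read more partitions than hive.max-execution-partitions-per-scan allows.

```python
class PartitionLimitExceeded(Exception):
    """Raised when a scan would read more partitions than allowed (hypothetical name)."""


def check_partition_limit(partitions, max_partitions_per_scan):
    """Return the partition count, or raise if it exceeds the configured limit.

    `max_partitions_per_scan` stands in for the value of
    hive.max-execution-partitions-per-scan; the check itself is an assumption
    about how such a limit behaves, not Presto source code.
    """
    count = len(partitions)
    if count > max_partitions_per_scan:
        raise PartitionLimitExceeded(
            f"scan would read {count} partitions; limit is {max_partitions_per_scan}"
        )
    return count


if __name__ == "__main__":
    # Two partitions survive filtering, so a limit of 5 is not exceeded.
    print(check_partition_limit(["dt=2020-01-01", "dt=2020-01-02"], 5))
```

Because dynamic filters prune partitions at execution time, a limit of this kind is applied to the partitions that a scan actually reads rather than to all partitions of the table.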

Improvements in Reading Hive ACID Tables

PRES-3147: Qubole has made these improvements to support reading Hive ACID tables in Presto:

  • Qubole has improved performance when reading a transactional table’s original files.
  • Qubole supports ACID tables written through UNION ALL queries in Hive.
  • Qubole supports full ACID bucketed Hive tables.
  • Qubole only supports using the Hive Metastore Service 3.0 or later versions for reading Hive transactional tables in Presto.
  • PRES-2840: Hive 2.0-versioned ACID transactional tables are not supported in Presto version 317. Qubole has added checks to fail queries using such tables.
  • PRES-3320: Qubole has added checks to fail Presto queries on Hive ACID tables when the Hive metastore server’s version is older than 3.0.
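
The metastore version check described in PRES-3320 can be sketched as follows. This is an illustrative Python example under stated assumptions: the function names and the error type are hypothetical, and the real check lives inside Presto rather than in user code.

```python
def parse_major_minor(version):
    """Parse a version string such as '3.1.2' into a (major, minor) tuple."""
    parts = version.split(".")
    major = int(parts[0])
    # Treat a missing minor component as 0, so "3" compares equal to "3.0".
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def check_metastore_version(version, minimum="3.0"):
    """Raise if the Hive metastore version is older than the minimum.

    The 3.0 minimum comes from the release note; the function and the
    exception type are assumptions made for illustration.
    """
    if parse_major_minor(version) < parse_major_minor(minimum):
        raise RuntimeError(
            f"Hive metastore {version} is older than {minimum}; "
            "queries on Hive ACID tables are rejected"
        )
    return True


if __name__ == "__main__":
    print(check_metastore_version("3.1.2"))
```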

Changes in Datadog Alerts

Qubole has added these Datadog alerts:

  • PRES-3360: Qubole has added a Datadog alert to detect runaway splits occupying execution slots for more than 10 minutes.

    Qubole has removed the presto.jmx.qubole.request_failures metric from the default Datadog dashboard metrics. Qubole has also removed the Datadog alert on CPU utilization over 80%.

  • PRES-3468: Qubole has added a Datadog alert to detect if the Coordinator Average Heap Memory Usage is more than 90%.

  • PRES-3503: Qubole has added a Datadog alert to detect if the ratio of the current spot node percentage to the desired spot percentage is lower than 80% on average for 4 hours.

  • PRES-3508: Qubole has added a Datadog alert to detect if the number of open file descriptors of the coordinator’s Presto server has exceeded its limit.
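
The spot-percentage condition in PRES-3503 is a ratio averaged over a time window. The Python sketch below shows that arithmetic only; the function names are hypothetical, and the assumption that the alert averages per-sample ratios over the 4-hour window is an interpretation of the release note, not the actual Datadog monitor definition.

```python
def spot_ratio(current_spot_pct, desired_spot_pct):
    """Ratio of the current spot node percentage to the desired percentage."""
    if desired_spot_pct <= 0:
        raise ValueError("desired spot percentage must be positive")
    return current_spot_pct / desired_spot_pct


def should_alert(current_spot_pct_samples, desired_spot_pct, threshold=0.8):
    """Return True if the average ratio over the window is below the threshold.

    `current_spot_pct_samples` is assumed to hold the samples collected over
    the 4-hour window; 0.8 corresponds to the 80% mentioned in the note.
    """
    if not current_spot_pct_samples:
        raise ValueError("at least one sample is required")
    ratios = [spot_ratio(s, desired_spot_pct) for s in current_spot_pct_samples]
    return sum(ratios) / len(ratios) < threshold


if __name__ == "__main__":
    # Desired spot percentage is 50; the cluster averaged 35% spot nodes.
    print(should_alert([30, 35, 40], 50))
```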

Weighted Distribution of CPU in Worker Nodes Based on Resource Groups

PRES-3194: Presto on Qubole now supports a weighted distribution of CPU on worker nodes among the active resource groups. CPU is distributed in the same proportion as the scheduling weight of active resource groups. Beta, Disabled | Cluster Restart Required
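
As a rough illustration of the proportional split, the Python sketch below computes each active resource group's share of CPU from its scheduling weight. The group names and weights are made up, and the function is a hypothetical helper that shows only the arithmetic, not how Presto schedules work on worker nodes.

```python
def cpu_shares(scheduling_weights):
    """Map each active resource group to its fraction of worker CPU.

    `scheduling_weights` maps a resource group name to its scheduling weight;
    only active groups should be passed in.
    """
    total = sum(scheduling_weights.values())
    if total <= 0:
        raise ValueError("total scheduling weight must be positive")
    return {group: weight / total for group, weight in scheduling_weights.items()}


if __name__ == "__main__":
    # Hypothetical groups: "etl" has three times the weight of "adhoc".
    print(cpu_shares({"etl": 3, "adhoc": 1}))
```

With weights 3 and 1, the two groups receive three quarters and one quarter of the CPU respectively. If only one group is active, it receives the full share regardless of its weight.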

Enhancements

  • PRES-2958: Qubole has added a procedure for the Hive connector to clear table cache. Users can call the following procedure to clear cache corresponding to that table:

    • Presto version 0.208: catalogName.default.clear_table_cache('schema_name','table_name')
    • Presto version 317: catalogName.system.clear_table_cache('schema_name','table_name')
  • PRES-3031: Qubole supports pushing the spot percentage configuration on a running Presto cluster by default. The node rebalancer automatically rebalances nodes depending on the refreshed spot percentage.

  • PRES-3108: Qubole has added impersonation support for calls to the Hive metastore. You can enable it using the hive.metastore.thrift.impersonation.enabled configuration property. Disabled

  • PRES-3257: Presto now also supports removing unhealthy nodes based on disk usage. The coordinator node periodically fetches the disk usage from worker nodes and gracefully shuts down worker nodes whose disk usage exceeds a threshold value. The threshold defaults to 0.9. You can set it using the ascm.bad-node-removal.disk-usage-max-threshold parameter, which accepts values in the range 0.0 - 1.0. Beta | Cluster Restart Required

  • PRES-3273: Qubole has made these enhancements in the Presto Ranger integration:

    • Qubole has added support for column masking HASH for Ranger.
    • You can enable the Solr audit store in the ranger.<catalog>.audit-config-xml to use the auditing feature as described in Ranger Plugin Configuration Files. Disabled
  • PRES-3307: Presto on Qubole authenticates Presto REST API endpoints when SSL is enabled. In Presto version 0.208, the inter-node communication between the coordinator and worker nodes is authenticated only when SSL is enabled. In Presto version 317, this communication is authenticated even when SSL is disabled. These changes are backported into Presto versions 0.208 and 317 from the latest open-source Presto version.

    For more information, see the documentation.

  • PRES-3353: QueryHistID is returned as part of the error message for queries executed through cloud-agnostic drivers if show_on_ui is set to true on the cloud-agnostic drivers. QueryHistID is useful in debugging. Qubole plans to launch cloud-agnostic drivers shortly.

  • PRES-3426: The issue where jobs got scheduled on the coordinator node instead of getting scheduled on worker nodes when RubiX is enabled is fixed.

  • PRES-3469: Qubole has backported open-source fixes to improve the performance of inequality JOINs that involve BETWEEN and GROUP BY queries.

  • PRES-3542: Qubole has removed Presto 0.180 from the cluster AMI. Any existing 0.180 version cluster must be upgraded to 0.193 or later versions. Cluster Restart Required
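
The disk-usage check described under PRES-3257 above can be sketched in Python as follows. The 0.9 default and the 0.0 - 1.0 range come from the release note; the function name, the input shape, and the node names are hypothetical, and the sketch only selects nodes rather than performing a graceful shutdown.

```python
DEFAULT_DISK_USAGE_MAX_THRESHOLD = 0.9  # default stated in the release note


def nodes_to_remove(disk_usage_by_node, threshold=DEFAULT_DISK_USAGE_MAX_THRESHOLD):
    """Return worker nodes whose disk usage fraction exceeds the threshold.

    `disk_usage_by_node` maps a node name to its used-disk fraction (0.0 - 1.0).
    `threshold` stands in for ascm.bad-node-removal.disk-usage-max-threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0 - 1.0")
    return sorted(
        node for node, usage in disk_usage_by_node.items() if usage > threshold
    )


if __name__ == "__main__":
    # Hypothetical usage report fetched by the coordinator.
    print(nodes_to_remove({"worker-1": 0.95, "worker-2": 0.50, "worker-3": 0.90}))
```

In this sketch a node exactly at the threshold is kept, because the note describes removal only for usage exceeding the threshold.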

Bug Fixes

  • PRES-1799: Presto now returns the number of files written during an INSERT OVERWRITE DIRECTORY (IOD) query execution in QueryInfo. The Presto client in the Qubole Control Plane later uses this information to wait until the returned number of files is visible at the IOD location. This fixes the eventual consistency issues while reading query results through the QDS UI.
  • PRES-2555: The issue where a Presto query UI showed a very large cumulative memory for JMX queries is resolved. The Presto Query UI now shows the correct value for cumulative user memory.
  • PRES-3411: Qubole has fixed UnsupportedOperationException encountered with certain multi-join queries with dynamic filtering enabled.
  • PRES-3426: The issue where the work required for a Presto query got scheduled on the coordinator node along with worker nodes is fixed.
  • PRES-3480: Qubole now supports SSL in Presto notebooks. You can now attach Presto notebooks to an SSL-enabled Presto cluster. Earlier, Presto notebooks attached to an SSL-enabled cluster failed.
  • PRES-3544: The issue where dynamic filtering did not work on SSL-enabled clusters is resolved.

For a list of bug fixes between versions R58 and R59, see Changelog for api.qubole.com.