Presto Best Practices

This section describes some best practices for Presto queries and it covers:

ORC Format

Qubole recommends that you use ORC file format; ORC outperforms text format considerably. For example, suppose you have you have a table nation in delimited form partitioned on column p. You can create the ORC version using this DDL as a Hive query.

DROP table if exists nation_orc;
CREATE table nation_orc like nation;
ALTER table nation_orc set fileformat orc;
ALTER table nation_orc set tblproperties ("orc.compress"="SNAPPY");
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO table nation_orc partition(p) SELECT * FROM nation;

If the nation table is not partitioned, replace the last 3 lines with the following:

INSERT INTO table nation_orc SELECT * FROM nation;

You can run queries against the newly generated table in Presto, and you should see a big difference in performance.


ORC format supports skipping reading portions of files if the data is sorted (or mostly sorted) on the filtering columns. For example, if you expect that queries are often going to filter data on column n_name, you can include a SORT BY clause when using Hive to insert data into the ORC table; for example:

INSERT INTO table nation_orc partition(p) SELECT * FROM nation SORT BY n_name;

This helps with queries such as the following:

SELECT count(*) FROM nation_orc WHERE n_name=’AUSTRALIA’;

Specify JOIN Ordering

Presto does automatic JOIN re-ordering only when the feature is enabled. For more information, see Specifying JOIN Reordering. Otherwise, you need to make sure that smaller tables appear on the right side of the JOIN keyword. For example, if table A is larger than table B, write a JOIN query as follows:

SELECT count(*) FROM A JOIN B on (A.a=B.b)

A bad JOIN command can slow down a query as the hash table is created on the bigger table, and if that table does not fit into memory, it can cause out-of-memory (OOM) exceptions.

Specifying JOIN Reordering

Presto supports JOIN Reordering based on table statistics. It enables ability to pick optimal order for joining tables and it only works with INNER JOINS. This configuration is supported only in Presto 0.180 and later versions. Presto 0.208 has the open-source version of JOIN Reordering.


As a prerequisite before using JOIN Reordering, ensure that the table statistics must be collected for all tables that are in the query.

Enable the JOIN Reordering feature in Presto 0.180 and 0.193 versions (these properties do not hold good to Presto 0.208):

  • As a session-level property by setting qubole_reorder_joins = true.
  • As a Presto override in the Presto cluster by setting qubole-reorder-joins=true. It requires a cluster restart for the configuration to be effective.

Enable the JOIN Reordering feature in Presto 0.208 version by setting the reordering strategy and the number of reordered joins, which are described here:

  • optimizer.join-reordering-strategy: It accepts a string value and the accepted values are:

    • AUTOMATIC: It enumerates possible orders and uses statistics-based cost estimation for finding out the cost order which is the lesser compared to others. If the statistics are unavailable or if the computing cost fails, then ELIMINATE_CROSS_JOINS is used.
    • ELIMINATE_CROSS_JOINS: It is the default strategy and it reorders JOINs to remove cross JOINS where otherwise this strategy maintains the original query order. It also tries maintaining the original table order.
    • NONE: It indicates that the order the tables listed in the query is maintained.

    The equivalent session property is join_reordering_strategy.

  • optimizer.max-reordered-joins: It is the maximum number of joins that can be reordered at a time when optimizer.join-reordering-strategy is set to a cost-based value. Its default value is 9.


    You should be cautious while increasing this property’s value as it can result in performance issues. The number of possible JOIN orders increases with the number of relations.

Cost-based optimization (CBO) for JOIN reordering and JOIN distribution type selection using statistics present in the Hive metastore is enabled by default for Presto version 0.208.

The following values are added to default cluster configuration for Presto version 0.208.


Enabling Dynamic Filter

Qubole supports the Dynamic Filter feature. It is a join optimization to improve performance of JOIN queries. It has been introduced to optimize Hash JOINs in Presto which can lead to significant speedup in relevant cases. It is not enabled by default. This configuration is supported only in Presto 0.180 and later versions.

Enable the Dynamic Filter feature as a session-level property using one of these commands based on the Presto version:

  • Set session dynamic_filtering = true in Presto 0.208 and earlier versions (earliest supported version is 0.180).
  • Set session enable_dynamic_filtering = true in Presto 317 (beta).

Enable the Dynamic Filter feature as a Presto override in the Presto cluster using one of these commands based on the Presto version:

  • Set experimental.dynamic-filtering-enabled=true in Presto 0.208 and earlier versions (earliest supported version is 0.180). It requires a cluster restart for the configuration to be effective.
  • Set experimental.dynamic-filtering-enabled`=true in Presto 317 (beta). It requires a cluster restart for the configuration to be effective.


Qubole has introduced a feature to enable dynamic partition pruning for join queries on partitioned columns in Hive tables at account level. It is part of Gradual Rollout.

Avoiding Stale Caches

It’s useful to tweak the cache parameters if you expect data to change rapidly. See catalog/ for more information.

For example, if a Hive table adds a new partition, it takes Presto 20 minutes to discover it. If you plan on changing existing files in the Cloud, you may want to make fileinfo expiration more aggressive. If you expect new files to land in a partition rapidly, you may want to reduce or disable the dirinfo cache.

Compressing Data Writes Through CTAS and INSERT Queries

Data writes can be compressed only when the target format is HiveIgnoreKeyTextOutputFormat. As INSERT OVERWRITE/INTO DIRECTORY uses HiveIgnoreKeyTextOutputFormat, the data written through it can also be compressed by setting the session-level property and codec. All SELECT queries with LIMIT > 1000 are converted into INSERT OVERWRITE/INTO DIRECTORY.

The INSERT OVERWRITE DIRECTORY command accepts a custom delimiter, which must be an ASCII value. You can specify the ASCII values using double quotes, for example, "," or as a binary literal such as X'AA'.

Here is the syntax to specify a custom delimiter.

insert overwrite directory 's3://sample/defloc/presto_query_result/1/' DELIMITED BY <Custom delimiter> SELECT * FROM default.sample_table;

The <Custom delimiter> parameter does not accept multiple characters and non-ASCII characters as the parameter value.

Configuring Data Writes Compression in Presto

To compress data written from CTAS and INSERT queries to cloud directories, set hive.compression-codec in the Override Presto Configuration field under the Clusters > Advanced Configuration UI page. Set the compression codec under catalog/ as illustrated below.


QDS supports the following compression codecs:

  • GZIP (default codec)
  • NONE (used when no compression is required)

See Understanding the Presto Engine Configuration for more information on how to override the Presto configuration. For more information on the Hive connector, see Hive Connector.

When the codec is set, data writes from a successful execution of a CTAS/INSERT Presto query are compressed as per the compression-codec set and stored in the cloud.

To see the file content, navigate to Explore in the QDS UI and select the file under the My Amazon S3 or My Blob tab. QDS translates the file into a readable format. See Exploring Data in the Cloud for more information.

Ignoring Corrupt Records in a Presto Query

Presto has added a new Hive connector configuration, hive.skip-corrupt-records to skip corrupt records in input formats other than orc, parquet and rcfile. It is set to false by default on a Presto cluster. Set hive.skip-corrupt-records=true for all queries on a Presto cluster to ignore corrupt records. This configuration is supported only in Presto 0.180 and later versions.

You can also set it as a session property as hive.skip_corrupt_records=true in a session when the active cluster does not have this configuration globally enabled.


The behavior for the corrupted file is non-deterministic, that is Presto might read some part of the file before hitting corrupt data and in such a case, the QDS record-reader returns whatever it read until this point and skips the rest of the file.

Using the Custom Event Listener

Event listeners are invoked on each query creation, completion, and split completion. An event listener enables the development of the query performance and analysis plugins. Qubole Presto uses its own event listener but it supports integration of an external Presto Event Listener. This integration is not supported by default. Create a ticket with Qubole Support to enable this feature of integrating a custom event listener.

At a given point of time, only a single event listener can be active in a Presto cluster.

After the feature is enabled, perform these steps:

  1. Modify a Custom Event Listener to add logs in the JSON format to the Cloud Object Storage location.
  2. Build a JAR file. An example is available in QueryEventListener Jar.
  3. Place the JAR file in the Cloud Object Storage location.
  4. Add this code that is available in the bootstrap template in the Presto cluster’s node bootstrap.
  5. Restart the cluster to make the custom event listener active.

Proactively Removing Unhealthy Cluster Nodes

To maintain better cluster health, you can enable a service to proactively remove unhealthy nodes from a Presto cluster. Set ascm.bad-node-removal to true to enable the service by passing it as a cluster override on a Presto cluster.

By default, this service runs periodically every minute. You can set a different time by changing its value using the ascm.bad-node-removal.interval configuration property.

The master node periodically fetches open file descriptor counts from worker nodes and gracefully shuts down nodes whose open file descriptor count exceeds a threshold value that defaults to 0.9. You can set the threshold using the ascm.bad-node-removal.file-descriptor-max-threshold parameter and its supported value range is 0.0 - 1.0.