Presto Best Practices¶
This section describes some best practices for Presto queries and it covers:
- ORC Format
- Specify JOIN Ordering
- Specifying JOIN Reordering
- Enabling Dynamic Filter
- Avoiding Stale Caches
- Compressing Data Writes Through CTAS and INSERT Queries
- Configuring Data Writes Compression in Presto
- Ignoring Corrupt Records in a Presto Query
- Using the Custom Event Listener
- Proactively Removing Unhealthy Cluster Nodes
Qubole recommends that you use ORC file format; ORC
outperforms text format considerably. For example, suppose you have you have a table
nation in delimited form
partitioned on column
p. You can create the ORC version using this DDL as a Hive query.
DROP table if exists nation_orc; CREATE table nation_orc like nation; ALTER table nation_orc set fileformat orc; ALTER table nation_orc set tblproperties ("orc.compress"="SNAPPY"); SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; INSERT INTO table nation_orc partition(p) SELECT * FROM nation;
nation table is not partitioned, replace the last 3 lines with the following:
INSERT INTO table nation_orc SELECT * FROM nation;
You can run queries against the newly generated table in Presto, and you should see a big difference in performance.
ORC format supports skipping reading portions of files if the data is sorted (or mostly sorted) on the filtering columns.
For example, if you expect that queries are often going to filter data on column
n_name, you can include a
SORT BY clause when using Hive to insert data into the ORC table; for example:
INSERT INTO table nation_orc partition(p) SELECT * FROM nation SORT BY n_name;
This helps with queries such as the following:
SELECT count(*) FROM nation_orc WHERE n_name=’AUSTRALIA’;
Specify JOIN Ordering¶
Presto does automatic JOIN re-ordering only when the feature is enabled. For more information, see Specifying JOIN Reordering. Otherwise, you need to make sure that smaller tables appear on the right side of the JOIN keyword. For example, if table A is larger than table B, write a JOIN query as follows:
SELECT count(*) FROM A JOIN B on (A.a=B.b)
A bad JOIN command can slow down a query as the hash table is created on the bigger table, and if that table does not fit
into memory, it can cause out-of-memory (
Specifying JOIN Reordering¶
Presto supports JOIN Reordering based on table statistics. It enables ability to pick optimal order for joining tables and it only works with INNER JOINS. This configuration is supported only in Presto 0.180 and later versions. Presto 0.208 has the open-source version of JOIN Reordering.
As a prerequisite before using JOIN Reordering, ensure that the table statistics must be collected for all tables that are in the query.
Enable the JOIN Reordering feature in Presto 0.180 and 0.193 versions (these properties do not hold good to Presto 0.208):
- As a session-level property by setting
qubole_reorder_joins = true.
- As a Presto override in the Presto cluster by setting
qubole-reorder-joins=true. It requires a cluster restart for the configuration to be effective.
Enable the JOIN Reordering feature in Presto 0.208 version by setting the reordering strategy and the number of reordered joins, which are described here:
optimizer.join-reordering-strategy: It accepts a string value and the accepted values are:
AUTOMATIC: It enumerates possible orders and uses statistics-based cost estimation for finding out the cost order which is the lesser compared to others. If the statistics are unavailable or if the computing cost fails, then
ELIMINATE_CROSS_JOINS: It is the default strategy and it reorders JOINs to remove cross JOINS where otherwise this strategy maintains the original query order. It also tries maintaining the original table order.
NONE: It indicates that the order the tables listed in the query is maintained.
The equivalent session property is
optimizer.max-reordered-joins: It is the maximum number of joins that can be reordered at a time when
optimizer.join-reordering-strategyis set to a cost-based value. Its default value is 9.
You should be cautious while increasing this property’s value as it can result in performance issues. The number of possible JOIN orders increases with the number of relations.
Cost-based optimization (CBO) for JOIN reordering and JOIN distribution type selection using statistics present in the Hive metastore is enabled by default for Presto version 0.208.
The following values are added to default cluster configuration for Presto version 0.208.
optimizer.join-reordering-strategy=AUTOMATIC join-distribution-type=AUTOMATIC join-max-broadcast-table-size=100MB
Enabling Dynamic Filter¶
Qubole supports the Dynamic Filter feature. It is a join optimization to improve performance of JOIN queries. It has been introduced to optimize Hash JOINs in Presto which can lead to significant speedup in relevant cases.
It is not enabled by default. Enable the Dynamic Filter feature as a session-level property using one of these commands based on the Presto version:
session dynamic_filtering = truein Presto 0.208 and earlier versions (earliest supported version is 0.180).
session enable_dynamic_filtering = truein Presto 317.
Enable the Dynamic Filter feature as a Presto override in the Presto cluster using one of these commands based on the Presto version:
experimental.dynamic-filtering-enabled=truein Presto 0.208 and earlier versions (earliest supported version is 0.180). It requires a cluster restart for the configuration to be effective.
experimental.enable-dynamic-filtering=truein Presto 317. It requires a cluster restart for the configuration to be effective.
Qubole has introduced a feature to enable dynamic partition pruning for join queries on partitioned columns in Hive tables at account level. It is part of Gradual Rollout.
Qubole has added a configuration property,
hive.max-execution-partitions-per-scan to limit the maximum number of partitions
that a table scan is allowed to read during a query execution.
hive.max-partitions-per-scan limits the the number of
partitions per table scan during the planning stage before a query execution begins.
Qubole has extended the dynamic filter optimization to semi-join to take advantage of a selective build side in queries with the
SELECT COUNT(*) from store_sales where ss_sold_date_sk IN (SELECT s_closed_date_sk from store);
Reducing Data Scanned on the Probe Side¶
Dynamic filters are pushed down to ORC and Parquet readers to reduce data scanned on the probe side for partitioned
as well as non-partitioned tables. Dynamic filters pushed down to ORC and Parquet readers are more effective in filtering
data when it is ordered by
Example: In the following query, ordering
ss_sold_date_sk during the ingestion immensely
improves the effectiveness of dynamic filtering.
SELECT COUNT(*) from store_sales_sorted ss, store s where ss.ss_sold_date_sk = s.s_closed_date_sk;
Avoiding Stale Caches¶
It’s useful to tweak the cache parameters if you expect data to change rapidly. See catalog/hive.properties for more information.
For example, if a Hive table adds a new partition, it takes Presto 20 minutes to discover it. If you plan on changing existing files in the Cloud, you may want to make fileinfo expiration more aggressive. If you expect new files to land in a partition rapidly, you may want to reduce or disable the dirinfo cache.
Compressing Data Writes Through CTAS and INSERT Queries¶
Data writes can be compressed only when the target format is HiveIgnoreKeyTextOutputFormat. As
INSERT OVERWRITE/INTO DIRECTORY
uses HiveIgnoreKeyTextOutputFormat, the data written through it can also be compressed by setting the session-level
property and codec. All SELECT queries with LIMIT > 1000 are converted into
INSERT OVERWRITE/INTO DIRECTORY.
Presto returns the number of files written during a
INSERT OVERWRITE DIRECTORY (IOD) query execution in
QueryInfo. The Presto client in Qubole Control Plane later uses this information to wait for the returned number of
files at the IOD location to be displayed. It fixes the eventual consistency issues while reading query results through the QDS UI.
INSERT OVERWRITE DIRECTORY command accepts a custom delimiter, which must be an ASCII value. You can specify the
ASCII values using double quotes, for example,
"," or as a binary literal such as
Here is the syntax to specify a custom delimiter.
insert overwrite directory 's3://sample/defloc/presto_query_result/1/' DELIMITED BY <Custom delimiter> SELECT * FROM default.sample_table;
The <Custom delimiter> parameter does not accept multiple characters and non-ASCII characters as the parameter value.
Configuring Data Writes Compression in Presto¶
To compress data written from CTAS and INSERT queries to cloud directories, set
hive.compression-codec in the
Override Presto Configuration field under the Clusters > Advanced Configuration UI page. Set the compression codec
catalog/hive.properties as illustrated below.
QDS supports the following compression codecs:
- GZIP (default codec)
- NONE (used when no compression is required)
When the codec is set, data writes from a successful execution of a CTAS/INSERT Presto query are compressed as per the compression-codec set and stored in the cloud.
To see the file content, navigate to Explore in the QDS UI and select the file under the My Amazon S3 or My Blob tab. QDS translates the file into a readable format. See Exploring Data in the Cloud for more information.
Ignoring Corrupt Records in a Presto Query¶
Presto has added a new Hive connector configuration,
hive.skip-corrupt-records to skip corrupt records in input formats other than
rcfile. It is set to
false by default on a Presto cluster. Set
for all queries on a Presto cluster to ignore corrupt records. This configuration is supported only in Presto 0.180 and
You can also set it as a session property as
hive.skip_corrupt_records=true in a session when the active
cluster does not have this configuration globally enabled.
The behavior for the corrupted file is non-deterministic, that is Presto might read some part of the file before hitting corrupt data and in such a case, the QDS record-reader returns whatever it read until this point and skips the rest of the file.
Using the Custom Event Listener¶
Event listeners are invoked on each query creation, completion, and split completion. An event listener enables the development of the query performance and analysis plugins.
At a given point of time, only a single event listener can be active in a Presto cluster.
Perform these steps to install an event listener in the Presto cluster:
Create an event listener. You can use this Presto event listener as a template.
Build a JAR file and upload it to the cloud object store. For example, let us use
s3://presto/plugins/event-listener.jaras the cloud object storage location.
event-listener.jaron the Presto cluster using the Presto Server Bootstrap. You can add the Presto bootstrap properties as Presto overrides in the Presto cluster to download the JAR file. For downloading
event-listener.jar, pass the following bootstrap properties as Presto overrides through the Override Presto Configuration UI option in the cluster’s Advanced Configuration tab.
bootstrap.properties: mkdir /usr/lib/presto/plugin/event-listener cd /usr/lib/presto/plugin/event-listener hadoop fs -get s3://presto/plugins/event-listener.jar
Configure Presto to use the event-listener through the Override Presto Configuration UI option in the cluster’s Advanced Configuration tab as shown below.
Restart the Presto cluster.
Proactively Removing Unhealthy Cluster Nodes¶
The configuration property for removing unhealthy cluster nodes,
ascm.bad-node-removal, is set to
true by default
to maintain better cluster health. You can disable it by passing
ascm.bad-node-removal=false as a Presto cluster override.
By default, this service runs periodically every minute. You can set a different time by changing its value using
ascm.bad-node-removal.interval configuration property.
This feature identifies unhealthy worker nodes based on different triggers and gracefully shuts down such unhealthy nodes. The various triggers are:
- Worker nodes with a ratio of the open file descriptor count to the maximum file descriptor count higher than the
threshold value defaulting to 0.9. You can set the threshold value using the
ascm.bad-node-removal.file-descriptor-max-thresholdparameter, and its supported value range is
0.0 - 1.0.
- Worker nodes whose disk space usage ratio is greater than a threshold value defaulting to 0.95. You can set the
threshold value using the
ascm.bad-node-removal.disk-usage-max-thresholdparameter, and its supported value range is
0.0 - 1.0. Pass
ascm.bad-node-removal.disk-space-usage-policy=falseas a Presto cluster override if you want to disable the disk space usage trigger.