Presto Best Practices¶
This section describes some best practices for Presto queries and it covers:
- ORC Format
- Specify JOIN Ordering
- Specifying JOIN Reordering
- Enabling Dynamic Filter
- Avoiding Stale Caches
- Compressing Data Writes Through CTAS and INSERT Queries
- Configuring Data Writes Compression in Presto
- Ignoring Corrupt Records in a Presto Query
- Using the Custom Event Listener
- Proactively Removing Unhealthy Cluster Nodes
Qubole recommends that you use ORC file format; ORC
outperforms text format considerably. For example, suppose you have you have a table
nation in delimited form
partitioned on column
p. You can create the ORC version using this DDL as a Hive query.
DROP table if exists nation_orc; CREATE table nation_orc like nation; ALTER table nation_orc set fileformat orc; ALTER table nation_orc set tblproperties ("orc.compress"="SNAPPY"); SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; INSERT INTO table nation_orc partition(p) SELECT * FROM nation;
nation table is not partitioned, replace the last 3 lines with the following:
INSERT INTO table nation_orc SELECT * FROM nation;
You can run queries against the newly generated table in Presto, and you should see a big difference in performance.
ORC format supports skipping reading portions of files if the data is sorted (or mostly sorted) on the filtering columns.
For example, if you expect that queries are often going to filter data on column
n_name, you can include a
SORT BY clause when using Hive to insert data into the ORC table; for example:
INSERT INTO table nation_orc partition(p) SELECT * FROM nation SORT BY n_name;
This helps with queries such as the following:
SELECT count(*) FROM nation_orc WHERE n_name=’AUSTRALIA’;
Specify JOIN Ordering¶
Presto does automatic JOIN re-ordering only when the feature is enabled. For more information, see Specifying JOIN Reordering. Otherwise, you need to make sure that smaller tables appear on the right side of the JOIN keyword. For example, if table A is larger than table B, write a JOIN query as follows:
SELECT count(*) FROM A JOIN B on (A.a=B.b)
A bad JOIN command can slow down a query as the hash table is created on the bigger table, and if that table does not fit
into memory, it can cause out-of-memory (
Specifying JOIN Reordering¶
Presto supports JOIN Reordering based on table statistics. It enables ability to pick optimal order for joining tables and it only works with INNER JOINS. This configuration is supported only in Presto 0.180 and later versions. Presto 0.208 has the open-source version of JOIN Reordering.
As a prerequisite before using JOIN Reordering, ensure that the table statistics must be collected for all tables that are in the query.
Enable the JOIN Reordering feature in Presto 0.180 and 0.193 versions (these properties do not hold good to Presto 0.208):
- As a session-level property by setting
qubole_reorder_joins = true.
- As a Presto override in the Presto cluster by setting
qubole-reorder-joins=true. It requires a cluster restart for the configuration to be effective.
Enable the JOIN Reordering feature in Presto 0.208 version by setting the reordering strategy and the number of reordered joins, which are described here:
optimizer.join-reordering-strategy: It accepts a string value and the accepted values are:
AUTOMATIC: It enumerates possible orders and uses statistics-based cost estimation for finding out the cost order which is the lesser compared to others. If the statistics are unavailable or if the computing cost fails, then
ELIMINATE_CROSS_JOINS: It is the default strategy and it reorders JOINs to remove cross JOINS where otherwise this strategy maintains the original query order. It also tries maintaining the original table order.
NONE: It indicates that the order the tables listed in the query is maintained.
The equivalent session property is
optimizer.max-reordered-joins: It is the maximum number of joins that can be reordered at a time when
optimizer.join-reordering-strategyis set to a cost-based value. Its default value is 9.
You should be cautious while increasing this property’s value as it can result in performance issues. The number of possible JOIN orders increases with the number of relations.
Cost-based optimization (CBO) for JOIN reordering and JOIN distribution type selection using statistics present in the Hive metastore is enabled by default for Presto version 0.208.
The following values are added to default cluster configuration for Presto version 0.208.
optimizer.join-reordering-strategy=AUTOMATIC join-distribution-type=AUTOMATIC join-max-broadcast-table-size=100MB
Enabling Dynamic Filter¶
Qubole supports the Dynamic Filter feature. It is a join optimization to improve performance of JOIN queries. It has been introduced to optimize Hash JOINs in Presto which can lead to significant speedup in relevant cases. It is not enabled by default. This configuration is supported only in Presto 0.180 and later versions.
Enable the Dynamic Filter feature as a session-level property using one of these commands based on the Presto version:
session dynamic_filtering = truein Presto 0.208 and earlier versions (earliest supported version is 0.180).
session enable_dynamic_filtering = truein Presto 317 (beta).
Enable the Dynamic Filter feature as a Presto override in the Presto cluster using one of these commands based on the Presto version:
experimental.dynamic-filtering-enabled=truein Presto 0.208 and earlier versions (earliest supported version is 0.180). It requires a cluster restart for the configuration to be effective.
experimental.dynamic-filtering-enabled`=truein Presto 317 (beta). It requires a cluster restart for the configuration to be effective.
Qubole has introduced a feature to enable dynamic partition pruning for join queries on partitioned columns in Hive tables at account level. It is part of Gradual Rollout.
Avoiding Stale Caches¶
It’s useful to tweak the cache parameters if you expect data to change rapidly. See catalog/hive.properties for more information.
For example, if a Hive table adds a new partition, it takes Presto 20 minutes to discover it. If you plan on changing existing files in the Cloud, you may want to make fileinfo expiration more aggressive. If you expect new files to land in a partition rapidly, you may want to reduce or disable the dirinfo cache.
Compressing Data Writes Through CTAS and INSERT Queries¶
Data writes can be compressed only when the target format is HiveIgnoreKeyTextOutputFormat. As
INSERT OVERWRITE/INTO DIRECTORY
uses HiveIgnoreKeyTextOutputFormat, the data written through it can also be compressed by setting the session-level
property and codec. All SELECT queries with LIMIT > 1000 are converted into
INSERT OVERWRITE/INTO DIRECTORY.
The INSERT OVERWRITE DIRECTORY command accepts a custom delimiter, which must be an ASCII value. You can specify the
ASCII values using double quotes, for example,
"," or as a binary literal such as
Here is the syntax to specify a custom delimiter.
insert overwrite directory 's3://sample/defloc/presto_query_result/1/' DELIMITED BY <Custom delimiter> SELECT * FROM default.sample_table;
The <Custom delimiter> parameter does not accept multiple characters and non-ASCII characters as the parameter value.
Configuring Data Writes Compression in Presto¶
To compress data written from CTAS and INSERT queries to cloud directories, set
hive.compression-codec in the
Override Presto Configuration field under the Clusters > Advanced Configuration UI page. Set the compression codec
catalog/hive.properties as illustrated below.
QDS supports the following compression codecs:
- GZIP (default codec)
- NONE (used when no compression is required)
When the codec is set, data writes from a successful execution of a CTAS/INSERT Presto query are compressed as per the compression-codec set and stored in the cloud.
To see the file content, navigate to Explore in the QDS UI and select the file under the My Amazon S3 or My Blob tab. QDS translates the file into a readable format. See Exploring Data in the Cloud for more information.
Ignoring Corrupt Records in a Presto Query¶
Presto has added a new Hive connector configuration,
hive.skip-corrupt-records to skip corrupt records in input formats other than
rcfile. It is set to
false by default on a Presto cluster. Set
for all queries on a Presto cluster to ignore corrupt records. This configuration is supported only in Presto 0.180 and
You can also set it as a session property as
hive.skip_corrupt_records=true in a session when the active
cluster does not have this configuration globally enabled.
The behavior for the corrupted file is non-deterministic, that is Presto might read some part of the file before hitting corrupt data and in such a case, the QDS record-reader returns whatever it read until this point and skips the rest of the file.
Using the Custom Event Listener¶
Event listeners are invoked on each query creation, completion, and split completion. An event listener enables the development of the query performance and analysis plugins. Qubole Presto uses its own event listener but it supports integration of an external Presto Event Listener. This integration is not supported by default. Create a ticket with Qubole Support to enable this feature of integrating a custom event listener.
At a given point of time, only a single event listener can be active in a Presto cluster.
After the feature is enabled, perform these steps:
- Modify a Custom Event Listener to add logs in the JSON format to the Cloud Object Storage location.
- Build a JAR file. An example is available in QueryEventListener Jar.
- Place the JAR file in the Cloud Object Storage location.
- Add this code that is available in the bootstrap template in the Presto cluster’s node bootstrap.
- Restart the cluster to make the custom event listener active.
Proactively Removing Unhealthy Cluster Nodes¶
To maintain better cluster health, you can enable a service to proactively remove unhealthy nodes from a Presto cluster.
true to enable the service by passing it as a cluster override on a Presto cluster.
By default, this service runs periodically every minute. You can set a different time by changing its value using
ascm.bad-node-removal.interval configuration property.
The master node periodically fetches open file descriptor counts from worker nodes and gracefully shuts down nodes
whose open file descriptor count exceeds a threshold value that defaults to
0.9. You can set the threshold using the
ascm.bad-node-removal.file-descriptor-max-threshold parameter and its supported value range is
0.0 - 1.0.