Understanding the Presto Engine Configuration

The cluster settings page has a text box labelled Override Presto Configuration, which you can use to customize a Presto cluster. An entry in this box can have multiple sections; each section should have a section title, which serves as the relative pathname of the configuration file in Presto's etc directory, followed by the configuration itself. You can configure JVM settings, common Presto settings, and connector settings here. You can learn more about these sections in Presto's official documentation. Here is an example custom configuration:

jvm.config:
-Xmx10g

config.properties:
ascm.enabled=false

catalog/hive.properties:
hadoop.cache.data.enabled=false

Some important parameters for each configuration are covered in the following sections.

jvm.config

These settings are used when launching the Presto server JVM. They are populated automatically and generally do not require custom values.

| Parameter | Example | Default | Description |
|---|---|---|---|
| -Xmx | -Xmx10g | 70% of Instance Memory | Maximum JVM heap size; the example sets a 10 GB heap. |
| -XX:+ExitOnOutOfMemoryError | false | true | Because an OutOfMemoryError typically leaves the JVM in an inconsistent state, Qubole sets this JVM property so that the process is forcibly terminated when the error occurs. |
| -Djdk.nio.maxCachedBufferSize | 2097900 | 2097152 | Limits the amount of native memory used for NIO buffers, which prevents an increase in the non-heap memory usage of the JVM process. Its value is set in bytes. |
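For example, to raise the heap size and cap the NIO buffer cache, you could add an override such as the following. This is only a sketch; the 16g heap value is illustrative and should be sized to your instance type.

jvm.config:
-Xmx16g
-Djdk.nio.maxCachedBufferSize=2097152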

Presto Configuration Properties

The config.properties settings are described in the following sections.

Understanding the Autoscaling Properties

Note

In the case of a Presto cluster, the P icon is shown for Presto overrides, but the push operation applies only to the few autoscaling properties listed in the following table; it is not applicable to any other properties. If you push configuration properties that you had removed, their values do not get refreshed in the running cluster; the cluster continues to use the values that were in effect before.

| Parameter | Examples | Default | Pushable into a running cluster | Description |
|---|---|---|---|---|
| ascm.enabled | true, false | true | No | Use this parameter to enable autoscaling. |
| ascm.upscaling.enabled | true, false | true | Yes | Use this parameter to enable upscaling. |
| ascm.downscaling.enabled | true, false | true | Yes | Use this parameter to enable downscaling. |
| ascm.bds.target-latency | 1m, 50s | 1m | Yes | The target latency for jobs, expressed as a time interval. Increasing it makes autoscaling less aggressive. |
| ascm.bds.interval | 10s, 1m | 10s | No | The periodic interval after which reports are gathered and processed to determine the cluster's optimal size. |
| ascm.completition.base.percentage | 1, 3 | 2 | Yes | The percentage of query execution, at the start and at the end of a query, during which Qubole does not consider query metrics in autoscaling decisions. With the default value of 2, Qubole ignores metrics from before 2% and after 98% of query completion. |
| ascm.downscaling.trigger.under-utilization-interval | 5m, 45s | 5m | No | The time interval during which every cycle of report processing must suggest scaling down before the cluster is actually scaled down. For example, when this interval is set to 5m, downscaling is initiated only if all reports over a 5-minute window indicate that the cluster is underutilized. This safeguards against temporary blips that would otherwise cause downscaling. |
| ascm.downscaling.group-size | 5, 8 | 5 | Yes | Downscaling happens in steps; the value indicates the number of nodes removed per downscaling cycle. |
| ascm.sizer.min-cluster-size | 2, 3 | 1 | Yes | The minimum cluster size, that is, the minimum number of cluster nodes. It is also available as an option on the Presto cluster UI. |
| ascm.sizer.max-cluster-size | 3, 6 | 2 | Yes | The maximum cluster size, that is, the maximum number of cluster nodes. It is also available as an option on the Presto cluster UI. |
| ascm.upscaling.trigger.over-utilization-interval | 4m, 50s | value of ascm.bds.interval | No | The time interval during which every cycle of report processing must suggest scaling up before the cluster is actually scaled up. |
| ascm.upscaling.group-size | 9, 10 | Infinite | Yes | Upscaling happens in steps; the value indicates the number of nodes added per upscaling cycle (capped by the maximum size set for the cluster). |
| query-manager.required-workers | 4, 6 | NA | No | The number of worker nodes that must be present in the cluster before a query is scheduled to run on it. If that count is not reached, a query is scheduled only after the configured query-manager.required-workers-max-wait timeout. This is only supported with Presto 0.193 and later versions. For more information, see Configuring the Required Number of Worker Nodes. |
| query-manager.required-workers-max-wait | 7m, 9m | 5m | No | The maximum time a query can wait before getting scheduled on the cluster if the number of worker nodes set through query-manager.required-workers could not be provisioned. For more information, see Configuring the Required Number of Worker Nodes. |
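As an illustration, the following override enables autoscaling between 2 and 10 nodes and makes downscaling somewhat more conservative. The values are chosen purely as an example and should be tuned to your workload.

config.properties:
ascm.enabled=true
ascm.sizer.min-cluster-size=2
ascm.sizer.max-cluster-size=10
ascm.downscaling.group-size=3
ascm.bds.target-latency=2m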

Understanding the Autoscaling Properties associated with Spot Blocks

| Parameter | Examples | Default | Description |
|---|---|---|---|
| ascm.node-expiry-period | 30m | 15m | Related to Spot block node rotation. For more information, see Configuring Spot Block Nodes as Autoscaling Nodes. This defines how long before the end of a node's Spot block duration Qubole starts the graceful shutdown of that node. You must configure this value based on the maximum query execution time that queries must be allowed to run without encountering a node loss. The default value is 15 minutes. |
| ascm.node-recycle-period | 25m | 15m | Related to Spot block node rotation. For more information, see Configuring Spot Block Nodes as Autoscaling Nodes. This defines how long before ascm.node-expiry-period Qubole starts proactively adding Spot block nodes to replace the ones that are about to expire. Proactive rotation is required to maintain the cluster at its optimal size without being affected by the expiry of Spot block nodes. Qubole spreads the replacement of nodes over the ascm.node-recycle-period to avoid unnecessarily upscaling the cluster by a large number of nodes. The default value is 15 minutes. |
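For example, if your longest queries run for roughly 30 minutes, you might widen both rotation windows with an override like the following (illustrative values):

config.properties:
ascm.node-expiry-period=30m
ascm.node-recycle-period=25m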

Understanding the Query Execution Properties

Note

For information on disabling the reserved pool, see Disabling Reserved Pool.

These query execution properties are applicable to Presto 0.193 and earlier versions.

| Parameter | Examples | Default | Description |
|---|---|---|---|
| query.max-concurrent-queries | 2000 | 1000 | The number of queries that can run in parallel. |
| query.max-execution-time | 20d, 45h | 100d | The time limit on query execution; only the time spent in the query execution phase is counted. The default value is 100 days. This parameter can be set in any of these time units: ns (nanoseconds), us (microseconds), ms (milliseconds), s (seconds), m (minutes), h (hours), or d (days). Its equivalent session property is query_max_execution_time, which can also be specified in any of these time units. |
| query.max-memory-per-node | 10GB, 20GB | 28% of Physical Memory | Maximum memory that a query can use on a node. If the value is set to more than 42% of physical memory, cluster failures occur: 40% of the heap is reserved for the system memory pool, so you can allocate at most the remaining 60% of the heap to this configuration, and because the heap is 70% of physical memory in Qubole, that corresponds to a maximum of 42% of physical memory. |
| query.max-memory | 80GB, 20TB | 100TB | Maximum memory that a query can use aggregated across all nodes. To decrease or modify the default value, add it as a Presto override or set the query_max_memory session property. |
| query.schedule-split-batch-size | 1000, 10000 | 1000 | The number of splits scheduled at once. |
| query.max-queued-queries | 6000 | 5000 | The number of queries that can be queued. See Queue Configuration for more information on advanced queuing configuration options. |
| optimizer.optimize-single-distinct | false | true | The single distinct optimization tries to replace multiple DISTINCT clauses with a single GROUP BY clause, which can substantially speed up query execution. It is only supported in Presto 0.193 and earlier versions. |
| resources.reserved-system-memory | 40/41% of Physical Memory | 40% of Physical Memory | The memory reserved for the system. If you set its value to more than 42% of physical memory, cluster failures occur. |
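For instance, on a Presto 0.193 cluster you could tighten per-query limits with an override along these lines. The values are only illustrative and must fit your node size.

config.properties:
query.max-execution-time=12h
query.max-memory-per-node=10GB
query.max-memory=80GB
query.max-queued-queries=6000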

These query execution properties are applicable to Presto 0.208 and later versions.

| Parameter | Examples | Default | Description |
|---|---|---|---|
| query.max-concurrent-queries | 2000 | 1000 | The number of queries that can run in parallel. |
| query.max-execution-time | 20d, 45h | 100d | The time limit on query execution; only the time spent in the query execution phase is counted. The default value is 100 days. This parameter can be set in any of these time units: ns (nanoseconds), us (microseconds), ms (milliseconds), s (seconds), m (minutes), h (hours), or d (days). Its equivalent session property is query_max_execution_time, which can also be specified in any of these time units. |
| query.max-memory-per-node | 10GB, 20GB | 30% of Heap memory | The maximum amount of user memory that a query may use on a machine. User memory is memory a user can control through the query; for example, memory for performing aggregations, JOINs, sorting, and so on is allocated from user memory because the amount required depends on the number of groups, JOIN keys, or values to be sorted. |
| query.max-memory | 80GB, 20TB | 100TB | Maximum memory that a query can use aggregated across all nodes. To decrease or modify the default value, add it as a Presto override or set the query_max_memory session property. |
| query.schedule-split-batch-size | 1000, 10000 | 1000 | The number of splits scheduled at once. |
| query.max-queued-queries | 6000 | 5000 | The number of queries that can be queued. See Queue Configuration for more information on advanced queuing configuration options. |
| optimizer.optimize-single-distinct | false | true | The single distinct optimization tries to replace multiple DISTINCT clauses with a single GROUP BY clause, which can substantially speed up query execution. It is only supported in Presto 0.193 and earlier versions. |
| qubole-max-raw-input-data-size | 1TB, 5GB | NA | Set this property to limit the total bytes scanned by queries executed on a given cluster. Queries that exceed this limit fail with the RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED exception, which ensures rogue queries do not run for a very long time. qubole_max_raw_input_datasize is the equivalent session property. |
| query.max-total-memory-per-node | 10GB, 21GB | 30% of Heap memory | The maximum amount of user and system memory that a query may use on a machine. A user cannot control the system memory; it is used during query execution by readers, writers, buffers for exchanging data between nodes, and so on. The value of query.max-total-memory-per-node must be greater than or equal to query.max-memory-per-node. |
| memory.heap-headroom-per-node | 10GB | 20% of Heap memory | The amount of memory on the JVM heap set aside as headroom/buffer for allocations that are not tracked by Presto in the user or system memory pools. With the default memory pool configuration for Presto 0.208, 30% of the heap goes to the reserved pool, 20% is heap headroom for untracked memory allocations, and the remaining 50% of the heap goes to the general pool. |
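On Presto 0.208 and later you might, for example, cap the data scanned per query and adjust the per-node memory limits with an override such as the following. The values are illustrative; keep query.max-total-memory-per-node greater than or equal to query.max-memory-per-node.

config.properties:
query.max-memory-per-node=10GB
query.max-total-memory-per-node=12GB
memory.heap-headroom-per-node=10GB
qubole-max-raw-input-data-size=1TB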

Understanding the Task Management Properties

| Parameter | Examples | Default | Description |
|---|---|---|---|
| task.max-worker-threads | 10, 20 | 4 * cores | Maximum worker threads per JVM. |
| task.writer-count | The value must be a power of 2. | 1 | The number of concurrent writer tasks per worker per query when inserting data through INSERT or CREATE TABLE AS SELECT operations. You can set this property to make data writes faster. The equivalent session property is task_writer_count, and its value must also be a power of 2. For more information, see Configuring the Concurrent Writer Tasks Per Query. |

Caution

Use this configuration judiciously to prevent overloading the cluster through excessive resource utilization. It is recommended to set a higher value through the session property only for queries that generate large outputs, for example, ETL jobs.
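For example, a cluster-wide override raising the writer count might look like the following. The value 4 is illustrative, and per-query tuning through the task_writer_count session property is usually preferable, as the caution above notes.

config.properties:
task.writer-count=4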

Understanding the Timestamp Conversion Properties

| Parameter | Examples | Default | Description |
|---|---|---|---|
| client-session-time-zone | Asia/Kolkata | NA | Timestamp fields in the output are automatically converted into the time zone specified by this property. This is helpful when you are in a different time zone from the Presto server; without this configuration, timestamp fields in the output are displayed in the server's time zone. |
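For example, to have query results display timestamps in Indian Standard Time, you could add:

config.properties:
client-session-time-zone=Asia/Kolkata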

Understanding the Query Retry Mechanism Properties

| Parameter | Examples | Default | Description |
|---|---|---|---|
| retry.autoRetry | true, false | true | Enables the Presto query retry mechanism feature at the cluster level. |
| retrier.max-wait-time-local-memory-exceeded | 2m, 2s | 5m | The maximum time to wait for new nodes to join the cluster before Presto gives up on retrying, if the query has failed with the LocalMemoryExceeded error. Its value is configured in seconds or minutes; for example, 2s or 2m. Its default value is 5m. If a new node does not join the cluster within this time period, Qubole returns the original query failure response. |
| retrier.max-wait-time-node-loss | 2m, 2s | 3m | The maximum time to wait for new nodes to join the cluster before Presto gives up on retrying, if the query has failed due to a Spot node loss. Its value is configured in seconds or minutes; for example, 2s or 2m. Its default value is 3m. If a new node does not join the cluster within this configured time period, the failed query is retried on the smaller-sized cluster. |
| retry.nodeLostErrors | (see Default) | "REMOTE_HOST_GONE","TOO_MANY_REQUESTS_FAILED","PAGE_TRANSPORT_TIMEOUT" | A comma-separated list of Presto errors (in string form) that signify node loss. |
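For example, you could keep automatic retries enabled but shorten both wait windows with an override such as the following (illustrative values):

config.properties:
retry.autoRetry=true
retrier.max-wait-time-local-memory-exceeded=2m
retrier.max-wait-time-node-loss=2m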