Using the Presto Query Retrying Mechanism

Qubole has added a query retry mechanism to handle query failures where possible. It is useful when Qubole adds nodes to the cluster during autoscaling or after a Spot node loss (that is, when the cluster composition contains Spot nodes). The query retry mechanism:

  • Retries a query that failed with the LocalMemoryExceeded error when new nodes have been added to the cluster or are in the process of being added.

  • Retries a query that failed with an error caused by the loss of a worker node.

In the above two scenarios, there is a waiting period to let new nodes join the cluster before Presto retries the failed query. To avoid waiting endlessly, Qubole has added appropriate timeouts. Qubole also ensures that any actions performed by the failed query’s partial execution are rolled back before the query is retried.

Uses of the Query Retry Mechanism

The query retry mechanism is useful in these two cases:

  • When a query triggers upscaling but fails with the LocalMemoryExceeded error because it ran on the smaller, pre-upscaled cluster. The retry mechanism ensures that the failed query is automatically retried on the upscaled cluster.

  • When a Spot node is lost during query execution. The retry mechanism ensures that the failed query is automatically retried when new nodes join the cluster. (When a Spot node loss occurs, Qubole automatically adds new nodes to stabilize the cluster after it receives the Spot termination notification, so a new node joins the cluster soon after the loss.)

Disabling the Query Retry Mechanism

You can disable this feature at the cluster and session levels by using the corresponding properties (an example follows the note below):

  • At the cluster level: Override retry.autoRetry=false in the Presto cluster overrides. On the Presto Cluster UI, you can override a cluster property under Advanced Configuration > PRESTO SETTINGS > Override Presto Configuration. This property is enabled by default.

  • At the session level: Set auto_retry=false in the specific query’s session. This property is enabled by default.

    Note

    The session property is most useful as a way to disable the retry feature at the query level when autoRetry is enabled at the cluster level.
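
For example, to turn the feature off at the cluster level, and alternatively for a single query, the settings would look like the following sketch. The placement of the cluster override under a config.properties section is an assumption based on how Presto cluster overrides are commonly written; the session property is set through Presto's standard SET SESSION statement.

    # Advanced Configuration > PRESTO SETTINGS > Override Presto Configuration
    config.properties:
    retry.autoRetry=false

    -- Per query, run in the same session before the query
    SET SESSION auto_retry=false;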

Configuring the Query Retry Mechanism

You can configure these parameters (a sample override follows the list):

  • retrier.max-wait-time-local-memory-exceeded: The maximum time Presto waits for new nodes to join the cluster before it gives up on retrying a query that failed with the LocalMemoryExceeded error. Its value is specified in seconds or minutes, for example, 2s or 2m. The default value is 5m. If a new node does not join the cluster within this period, Qubole returns the original query failure response.

  • retrier.max-wait-time-node-loss: The maximum time Presto waits for new nodes to join the cluster before it stops waiting when a query has failed due to a Spot node loss. Its value is specified in seconds or minutes, for example, 2s or 2m. The default value is 3m. If a new node does not join the cluster within this period, the failed query is retried on the smaller-sized cluster.

  • retry.nodeLostErrors: A comma-separated list of Presto errors (as strings) that signify node loss. The default value of this property is "REMOTE_HOST_GONE","TOO_MANY_REQUESTS_FAILED","PAGE_TRANSPORT_TIMEOUT".
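
As a reference, the sketch below shows the three retry parameters set to their default values in a cluster override. The config.properties section header is an assumption based on how other Presto cluster overrides are specified; only the property names and defaults are taken from the list above.

    config.properties:
    retrier.max-wait-time-local-memory-exceeded=5m
    retrier.max-wait-time-node-loss=3m
    retry.nodeLostErrors="REMOTE_HOST_GONE","TOO_MANY_REQUESTS_FAILED","PAGE_TRANSPORT_TIMEOUT"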

Graded Presto Query Retries

Presto query retries for LocalMemoryExceeded errors are triggered in a graded manner. Qubole retries the failed query in three steps, which occur as follows:

  1. The first retry step occurs at the minimum cluster size + 1/4 of the difference between the maximum and minimum cluster sizes.

  2. The second retry step occurs at the minimum cluster size + 1/2 of the difference between the maximum and minimum cluster sizes.

  3. The third retry step occurs at the maximum cluster size.

Note

This enhancement is part of Gradual Rollout.

For example, consider a Presto cluster with a minimum size of 2 and a maximum size of 50. The graded retries occur as described below:

  1. Step 1 occurs at 2 + (50-2)/4 = 14 nodes.

  2. Step 2 occurs at 2 + (50-2)/2 = 26 nodes.

  3. Step 3 occurs at the maximum size of 50 nodes.

When a query fails at a cluster size of n nodes, Qubole skips the steps below n, assuming they are already complete, and proceeds with the remaining retry steps up to the maximum cluster size. So, in the above example (a sketch of this arithmetic follows the list):

  • If the query fails before the cluster reaches 14 nodes, the retry goes through all three steps.

  • If the query fails between 14 and 26 nodes, the retry goes through only the last two steps, at 26 and 50 nodes.

  • If the query fails after the cluster reaches 26 nodes, the query needs only one retry, at 50 nodes.
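
The following Python sketch illustrates the arithmetic of the graded retries described above. The function names and structure are illustrative assumptions, not Qubole's implementation.

    # Illustrative sketch of the graded retry sizes; not Qubole's actual code.

    def graded_retry_sizes(min_size, max_size):
        """Cluster sizes at which a memory-failed query is retried."""
        span = max_size - min_size
        return [
            min_size + span // 4,  # step 1: min + 1/4 of (max - min)
            min_size + span // 2,  # step 2: min + 1/2 of (max - min)
            max_size,              # step 3: maximum cluster size
        ]

    def remaining_steps(failure_size, min_size, max_size):
        """Steps that still apply when a query fails at failure_size nodes."""
        return [s for s in graded_retry_sizes(min_size, max_size) if s > failure_size]

    print(graded_retry_sizes(2, 50))   # [14, 26, 50]
    print(remaining_steps(20, 2, 50))  # [26, 50] -- failure between 14 and 26 nodes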

Understanding the Query Retry Mechanism

Query retries can occur multiple times; by default, up to three retries occur if all conditions are met. The conditions under which retries happen are listed below (a sketch of the overall decision follows the list):

  • The error is retryable. Currently, LocalMemoryExceeded and the node loss errors REMOTE_HOST_GONE, TOO_MANY_REQUESTS_FAILED, and PAGE_TRANSPORT_TIMEOUT are considered retryable. The list of node loss errors is configurable through the retry.nodeLostErrors property.

  • INSERT OVERWRITE DIRECTORY, INSERT OVERWRITE TABLE, and CREATE TABLE AS SELECT (CTAS) queries are considered retryable. SELECT queries that do not return data before they fail are also retryable.

  • The actions of the failed query are rolled back successfully. If the rollback fails or if Qubole times out waiting for it, the query is not retried.

  • A failed query has a chance to succeed if retried:

    • For the LocalMemoryExceeded error: The query has a chance to succeed if the current number of workers is greater than the number of workers that handled the Aggregation stage. If Qubole times out waiting to reach this state, it does not retry the query.

    • For the node loss errors: The query has a chance to succeed if the current number of workers is greater than or equal to the number of workers that the query ran on earlier. If Qubole times out waiting to reach this state, it goes ahead with the retry because the query may still succeed on the smaller-sized cluster.
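
To summarize the conditions above, here is a minimal Python sketch of the retry decision. All names are hypothetical and simplified for illustration; they do not reflect Qubole's internal code.

    # Hypothetical sketch of the retry decision; simplified for illustration.

    NODE_LOSS_ERRORS = {"REMOTE_HOST_GONE", "TOO_MANY_REQUESTS_FAILED",
                        "PAGE_TRANSPORT_TIMEOUT"}
    RETRYABLE_QUERY_TYPES = {"INSERT OVERWRITE DIRECTORY",
                             "INSERT OVERWRITE TABLE", "CTAS"}

    def should_retry(error, query_type, returned_data, rollback_ok, timed_out,
                     workers_now, workers_before, aggregation_workers):
        # 1. The error must be retryable.
        if error != "LocalMemoryExceeded" and error not in NODE_LOSS_ERRORS:
            return False
        # 2. The query type must be retryable (or a SELECT that returned no data).
        if query_type not in RETRYABLE_QUERY_TYPES:
            if not (query_type == "SELECT" and not returned_data):
                return False
        # 3. The failed query's partial work must have been rolled back
        #    (a rollback timeout also counts as a failed rollback).
        if not rollback_ok:
            return False
        # 4. The retry must have a chance to succeed.
        if error == "LocalMemoryExceeded":
            # Needs more workers than handled the Aggregation stage;
            # a timeout while waiting means no retry.
            return not timed_out and workers_now > aggregation_workers
        # Node loss: retry when enough workers are back, or even on a
        # smaller cluster if the wait timed out.
        return timed_out or workers_now >= workers_before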