Spot Rebalancing in Presto

Note

<short-lived compute instances> are referred to as spot nodes in Qubole-on-AWS.

Presto supports Spot Rebalancing. It helps when the <short-lived compute instances> ratio of a running cluster falls short of the configured spot ratio because spot nodes are unavailable or are terminated frequently. The Spot rebalancer proactively recovers the cluster from this shortfall and brings its <short-lived compute instances> ratio as close as possible to the configured value.

By default, Qubole inspects the <short-lived compute instances> ratio of the cluster every 30 minutes and attempts a rebalance if that ratio falls short of the configured <short-lived compute instances> ratio. The inspection interval is configurable using the ascm.node-rebalancer-cooldown-period parameter.

For example, setting ascm.node-rebalancer-cooldown-period=1h in the Presto cluster overrides makes Qubole inspect for a skewed <short-lived compute instances> ratio every hour instead of every 30 minutes.

Note

Using very small values for ascm.node-rebalancer-cooldown-period can make the cluster’s state unstable. Spot Rebalancing works only with the aggressive downscaling feature, which must be enabled in a Qubole account.

For more information, see Understanding Aggressive Downscaling in Clusters (AWS).

Spot Rebalancing Advanced Configuration Properties

These are the two advanced configuration properties:

  • ascm.sizer.max-cluster-size-buffer-percentage: While rebalancing a running cluster, Qubole tries to gracefully replace the additional running On-Demand nodes. In that process, the cluster may have to add some nodes beyond its maximum size. This property sets, as a percentage of the cluster’s maximum size, how far the cluster may grow beyond that maximum while rebalancing. The default value is 10.

    For example, with ascm.sizer.max-cluster-size-buffer-percentage=20, the cluster can grow to at most 20% above its maximum size while rebalancing.

  • ascm.node-rebalancer-max-extra-stable-nodes.percentage: This property sets how much skew in the spot ratio of running nodes is allowed in the cluster. If the skew percentage exceeds this value, Qubole attempts to rebalance the cluster nodes to conform to the configured spot ratio. The default value is 10.

    For example, consider ascm.node-rebalancer-max-extra-stable-nodes.percentage=15, which means that the cluster nodes are rebalanced only if the skew in the spot ratio of running nodes exceeds 15%.
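
The arithmetic behind the two properties above can be sketched in Python. This is only an illustration, not Qubole's implementation: the function names are hypothetical, the buffer is assumed to round down to whole nodes, and the skew is assumed to be the shortfall, in percentage points, of the running spot ratio against the configured spot ratio. The source does not define either detail.

```python
def max_size_while_rebalancing(max_cluster_size: int,
                               buffer_percentage: int = 10) -> int:
    """Largest node count allowed during rebalancing.

    Assumption: the buffer is rounded down to a whole number of nodes.
    """
    return max_cluster_size + (max_cluster_size * buffer_percentage) // 100


def spot_skew_percentage(spot_nodes: int,
                         total_nodes: int,
                         configured_spot_percentage: float) -> float:
    """Shortfall of the running spot ratio, in percentage points.

    Assumption: skew is measured as configured minus actual spot percentage,
    and is never negative.
    """
    if total_nodes <= 0:
        return 0.0
    actual_spot_percentage = 100.0 * spot_nodes / total_nodes
    return max(0.0, configured_spot_percentage - actual_spot_percentage)


def should_rebalance(spot_nodes: int,
                     total_nodes: int,
                     configured_spot_percentage: float,
                     max_skew_percentage: float = 10.0) -> bool:
    """True when the skew exceeds the allowed threshold."""
    skew = spot_skew_percentage(spot_nodes, total_nodes,
                                configured_spot_percentage)
    return skew > max_skew_percentage
```

Under these assumptions, a cluster with a maximum size of 50 nodes and a buffer of 20 may reach 60 nodes while rebalancing. A 20-node cluster running 6 spot nodes against a configured spot ratio of 50% has a skew of 20 percentage points, which exceeds a threshold of 15 and would trigger a rebalance.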