Using AWS Spot Instances and Spot Blocks in Qubole Clusters
Amazon Web Services (AWS) offers three major types of instances (computing hosts) that are suitable for use as Qubole cluster nodes:
- On-Demand instances
- Spot instances
- Spot Blocks
On-Demand instances are offered at a fixed price, whereas the price of Spot instances varies with market demand and availability and may change over the lifetime of a Qubole cluster. Spot Blocks are Spot instances that run continuously for a finite duration (1 to 6 hours). QDS supports configuring Spot Blocks on the new cluster UI page and through cluster API calls.
On-Demand versus Spot Instances versus Spot Blocks: Advantages and Disadvantages
The advantages of On-Demand instances are stability and launch speed. An On-Demand instance comes up very quickly and is likely to remain available for the life of the cluster, helping to ensure that the cluster performs its work smoothly and reliably. The disadvantage is cost: a cluster composed entirely or mostly of On-Demand nodes may be many times more expensive than a similarly sized cluster composed of Spot nodes.
The obvious advantage of Spot instances is cost, and the potential disadvantage is instability. A Spot instance can cost only a small fraction of the price of an On-Demand instance, but it may take a long time to become available (or may never become available), and AWS may reclaim it at any time, either because the Spot price rises above your maximum bid or because the supply simply runs out. This in turn can put running jobs in jeopardy, though you can improve the stability of Spot instances by setting your Maximum Bid Price high. For a detailed discussion, see the Qubole blog post Riding the Spotted Elephant.
Spot Blocks are 30 to 45 percent cheaper than On-Demand instances, depending on the requested duration. They are more stable than Spot nodes because they cannot be taken away during the specified duration. However, these nodes are always terminated once the duration for which they were requested is over. For more details, see AWS spot blocks.
QDS ensures that Spot blocks are acquired at a price lower than On-Demand nodes. It also ensures that autoscaled nodes are acquired for the remaining duration of the cluster. For example, if the duration of a Spot block cluster is 5 hours and there is a need to autoscale at the 2nd hour, Spot blocks are acquired for 3 hours.
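The remaining-duration arithmetic in the example above can be sketched as follows. This is only an illustration, not QDS code: the function name `autoscale_block_hours` is hypothetical, and it assumes durations are tracked in whole hours, which the text does not specify.

```python
def autoscale_block_hours(cluster_block_hours: int, hours_elapsed: int) -> int:
    """Return the Spot block duration to request for a node added by autoscaling.

    Assumption: durations are whole hours; how partial hours are
    handled is not described in this document.
    """
    # Spot blocks run for a finite duration of 1 to 6 hours.
    if not 1 <= cluster_block_hours <= 6:
        raise ValueError("Spot block duration must be between 1 and 6 hours")
    if not 0 <= hours_elapsed < cluster_block_hours:
        raise ValueError("hours_elapsed must fall within the cluster's block duration")
    # New nodes are requested only for the time the cluster has left.
    return cluster_block_hours - hours_elapsed


# A 5-hour Spot block cluster that autoscales at the 2nd hour
# requests 3-hour Spot blocks for the new nodes.
print(autoscale_block_hours(5, 2))
```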
Cluster Composition Choices
You can choose to create a cluster in any of the following configurations:
- On-Demand nodes only
- Spot nodes only
- A mix of Spot and On-Demand nodes
- Spot blocks only (see Configuring Spot Blocks)
- Spot blocks for master and minimum number of nodes and Spot nodes for autoscaling
- Spot blocks for master and minimum number of nodes and On-Demand nodes for autoscaling
You can find out about configuring each type of cluster here.
For most purposes, the third option (a mix of Spot and On-Demand nodes) is the best, because it provides a good balance between cost and stability. The fourth option, Spot blocks only, is advantageous because it costs less than On-Demand and the nodes remain available for a specific, fixed duration.
The remainder of this section focuses on the settings and mechanisms Qubole provides to help safeguard the overall functioning of a cluster that includes Spot nodes.
How You Configure Spot Instances into a Qubole Cluster
The critical items when you configure Spot instances into a cluster are the Request Timeout, the Maximum Bid Price, the Qubole Placement Policy option, the Fallback to on demand option, and the Spot Instances Percentage.
- The Request Timeout specifies how many minutes Qubole should keep trying to obtain Spot instances when launching the cluster or adding nodes.
- The Maximum Bid Price is the maximum price you are willing to pay for the instances at launch time, or at any time after that as the Spot price fluctuates. The price is expressed as a percentage of the current On-Demand price. Bear in mind that the bid is the maximum you are offering to pay: your cluster will obtain any suitable instance at or below this price.
- The Qubole Placement Policy option, if selected, causes Qubole to make a best effort to store one replica of each HDFS block on a stable node (normally an On-Demand node, except in the case of a Spot-only cluster). Qubole recommends you select this option to prevent job failures that could occur if all replicas were lost as a result of AWS reclaiming many Spot instances at once.
- The Fallback to on demand option is discussed under Fallback to On-Demand, later in this section.
- The Spot Instances Percentage:
  - In a mixed cluster, this specifies the maximum percentage of auto-scaling nodes that can be Spot instances. Auto-scaling nodes are those that comprise the difference between the Minimum Slave Count and the Maximum Slave Count; Qubole adds and removes these nodes according to the cluster workload, as explained in detail here.
  - In a Spot-only cluster, this is always set to 100.
The AWS Availability Zone (AZ) is also important, but in general you should let Qubole select it rather than specifying it yourself.
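To keep the settings above straight, it can help to model them as one record with a few sanity checks. The sketch below is purely illustrative: the class and field names are hypothetical labels for the UI options described in this section, not the names used by the Qubole cluster API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpotSettings:
    """Hypothetical container for the Spot options described above."""
    request_timeout_minutes: int      # Request Timeout
    max_bid_price_percent: float      # Maximum Bid Price, as % of the On-Demand price
    spot_instances_percent: int       # Spot Instances Percentage
    use_qubole_placement_policy: bool = True
    fallback_to_on_demand: bool = True
    use_stable_spot_nodes: bool = False  # True means a Spot-only cluster

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings are consistent."""
        problems = []
        if self.request_timeout_minutes <= 0:
            problems.append("Request Timeout must be a positive number of minutes")
        if self.max_bid_price_percent <= 0:
            problems.append("Maximum Bid Price must be a positive percentage")
        if not 0 <= self.spot_instances_percent <= 100:
            problems.append("Spot Instances Percentage must be between 0 and 100")
        # In a Spot-only cluster the percentage is always 100.
        if self.use_stable_spot_nodes and self.spot_instances_percent != 100:
            problems.append("a Spot-only cluster requires a Spot Instances Percentage of 100")
        return problems


mixed = SpotSettings(request_timeout_minutes=10,
                     max_bid_price_percent=100,
                     spot_instances_percent=50)
print(mixed.validate())
```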
The following strategies are all viable; choose according to your priorities:
- Bid very low and set a large Request Timeout. This minimizes cost, ensuring that the cluster obtains instances only when the price is low.
- Bid at about 100% (or just above) and achieve good general cost reduction combined with reliability, using mainly Spot instances but occasionally falling back to more-expensive On-Demand instances.
- Bid very high (say 200%) and be almost sure of getting instances even when they are in short supply.
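Because the Maximum Bid Price is a percentage of the On-Demand price, the three strategies can be compared with simple arithmetic. The following sketch uses hypothetical prices and function names, and it considers price only; in practice Spot supply also determines whether a request is fulfilled.

```python
def max_bid_price(on_demand_price: float, bid_percent: float) -> float:
    """Convert a bid percentage into an absolute hourly price."""
    return on_demand_price * bid_percent / 100.0


def spot_request_fulfillable(spot_price: float,
                             on_demand_price: float,
                             bid_percent: float) -> bool:
    """True if the current Spot price is at or below the maximum bid.

    Simplification: ignores Spot capacity, which can also block a request.
    """
    return spot_price <= max_bid_price(on_demand_price, bid_percent)


# Hypothetical prices in dollars per hour, for illustration only.
ON_DEMAND = 0.40
SPOT_NOW = 0.12

for label, percent in [("very low", 25), ("about 100%", 100), ("very high", 200)]:
    ok = spot_request_fulfillable(SPOT_NOW, ON_DEMAND, percent)
    print(f"{label} bid ({percent}%): fulfillable at current Spot price = {ok}")
```

With these example numbers, the very low bid waits (up to the Request Timeout) for the Spot price to fall, while the other two bids are met immediately.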
Configuring a Spot-Only Cluster (Not Recommended)
You can configure a Spot-only cluster by choosing Use Stable Spot Nodes when you create the cluster. (You must set the Autoscaling Node Purchasing Option to Spot Instance to enable this option.) This forces the Spot Instances Percentage to 100, meaning all the cluster nodes will be Spot instances, including the core nodes (the Master Node and the nodes comprising the Minimum Slave Count). In this configuration, the core nodes are known as stable instances, because you must specify a high Maximum Bid Price for them – higher than for the auto-scaling nodes; a bid higher than 100% of the On-Demand price significantly improves the chances that the core nodes will not be lost.
Causes for Poor Performance in a Spot-only Cluster
A cluster configured with 100% Spot nodes can perform poorly for two main reasons:
- If Fallback to On-Demand is disabled, upscaling slows down greatly and can even stop completely, so the workload may run on a very small cluster for a long time.
- You can lose a majority of the processing nodes at any time, which may mean losing the work they have done so far and having to repeat it. (This problem is almost equally likely if the cluster contains 95% Spot nodes.)
To mitigate the poor performance that may occur with 100% Spot nodes, Qubole recommends that you:
- Use heterogeneous clusters, because it is less likely that all instance types will experience a Spot price surge at the same time.
- Use a larger minimum cluster size.
- Use a smaller Spot percentage.
- Keep Fallback to On-Demand Nodes enabled in the cluster settings.
Configuring a Mixed Cluster (On-Demand and Spot Nodes)
You configure a mixed cluster by doing all of the following:
- Setting the Autoscaling Node Purchasing Option to Spot Instance.
- Setting the Spot Instances Percentage to a number less than 100.
- Leaving Use Stable Spot Nodes unchecked.
This configures a cluster in which the core nodes (the Master Node and the nodes comprising the Minimum Slave Count) are On-Demand instances, and a percentage of the auto-scaling nodes are Spot instances as specified by the Spot Instances Percentage.
For example, if the Minimum Slave Count is 2 and the Maximum Slave Count is 10, and you set the Spot Instances Percentage to 50, the resulting cluster will have, at any given time:
- A minimum of 3 nodes: the Master Node plus the Minimum Slave Count, all of them On-Demand instances (the core nodes).
- (Usually) a maximum of 11 nodes, of which up to 4 (50% of the difference between 2 and 10) will be Spot instances, and the remainder On-Demand instances. (The cluster size can occasionally rise above the maximum for brief periods while the cluster is auto-scaling.)
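The numbers in this example follow directly from the three settings. The sketch below reproduces the calculation; the function name and dictionary keys are hypothetical, and rounding the Spot share down to a whole node is an assumption, since the text does not say how fractional results are handled.

```python
def mixed_cluster_limits(min_slaves: int, max_slaves: int, spot_percent: int) -> dict:
    """Compute the node limits of a mixed On-Demand/Spot cluster."""
    if not 0 <= spot_percent <= 100:
        raise ValueError("spot_percent must be between 0 and 100")
    if max_slaves < min_slaves:
        raise ValueError("max_slaves must be at least min_slaves")
    # Auto-scaling nodes are the difference between the two slave counts.
    autoscaling_nodes = max_slaves - min_slaves
    # Assumption: a fractional Spot share is rounded down.
    max_spot_nodes = autoscaling_nodes * spot_percent // 100
    return {
        "core_on_demand_nodes": 1 + min_slaves,  # Master Node + Minimum Slave Count
        "max_total_nodes": 1 + max_slaves,       # Master Node + Maximum Slave Count
        "max_spot_nodes": max_spot_nodes,
    }


# Minimum Slave Count 2, Maximum Slave Count 10, Spot Instances Percentage 50
print(mixed_cluster_limits(2, 10, 50))
```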
Fallback to On-Demand
In addition to the above settings, you should normally choose Fallback to on demand (check the box). This option, if selected, causes Qubole to launch On-Demand instances if it cannot obtain enough Spot/Spot block instances when adding nodes during autoscaling. This means that the cluster could possibly at times consist entirely of On-Demand nodes – if no Spot nodes are available for your maximum bid price or less. But unless cost is all-important, this is a sensible option to choose because it allows the cluster to do its work even if Spot nodes are not available.
Qubole also falls back to On-Demand nodes when the cluster composition uses Spot nodes for the master node and the minimum number of nodes.
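The effect of the fallback option on a single autoscaling step can be illustrated with a small model. This is a simplified sketch, not Qubole's actual logic; `plan_node_launch` and its parameters are hypothetical names.

```python
from typing import Tuple


def plan_node_launch(nodes_needed: int,
                     spot_nodes_obtained: int,
                     fallback_to_on_demand: bool) -> Tuple[int, int]:
    """Return (spot_nodes, on_demand_nodes) launched for one upscaling step."""
    # Never count more Spot nodes than were actually requested.
    spot = min(nodes_needed, spot_nodes_obtained)
    shortfall = nodes_needed - spot
    # With fallback enabled, On-Demand nodes cover the shortfall;
    # without it, the cluster simply stays smaller than requested.
    on_demand = shortfall if fallback_to_on_demand else 0
    return spot, on_demand


# The cluster needs 4 more nodes, but only 1 Spot node is available at the bid price.
print(plan_node_launch(4, 1, fallback_to_on_demand=True))
print(plan_node_launch(4, 1, fallback_to_on_demand=False))
```

With fallback enabled, the cluster reaches the size it needs at a higher cost; with it disabled, the workload continues on fewer nodes.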
How Qubole Manages the Spot Nodes While the Cluster is Running
Qubole’s primary goal in managing cluster resources is productivity: making sure that the work you need to do gets done as efficiently and reliably as possible, and at the lowest cost that is consistent with that goal.
Qubole uses the following mechanisms to help ensure maximum productivity in running clusters that deploy Spot instances:
- The Fallback to on demand option described above.
- Spot Rebalancing. This works in conjunction with Fallback to on demand, ensuring that the cluster conforms to your original specification as much of the time as possible, by swapping out On-Demand nodes in favor of Spot nodes as soon as possible after the Spot price returns to your bid price or below. For a more detailed discussion, see Spot Rebalancing.
- The Qubole Placement Policy described above.
- Intelligent Availability Zone (AZ) Selection. Unless you specify a particular AZ when you configure the cluster, Qubole can automatically select the AZ with the lowest Spot prices for the region and instance type you’ve specified. This capability is supported for non-VPC (Virtual Private Cloud) clusters only at present, and is not enabled by default; you can create a ticket with Qubole Support to enable it for your account.
- Auto-scaling. Auto-scaling ensures that the cluster remains at just the right size for maximum productivity and efficiency, as explained in detail here.
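As an illustration of the AZ selection idea in the list above, the sketch below picks the zone with the lowest Spot price unless a zone was specified explicitly. The function name, the zone names, and the prices are all hypothetical; the document does not describe how Qubole implements this selection.

```python
from typing import Dict, Optional


def pick_availability_zone(spot_prices: Dict[str, float],
                           requested_az: Optional[str] = None) -> str:
    """Choose an AZ: honor an explicit choice, otherwise take the cheapest zone."""
    if requested_az is not None:
        return requested_az
    if not spot_prices:
        raise ValueError("no Spot prices available to choose from")
    # Select the zone whose current Spot price is lowest.
    return min(spot_prices, key=spot_prices.get)


# Hypothetical hourly Spot prices for one instance type in one region.
prices = {"us-east-1a": 0.14, "us-east-1b": 0.11, "us-east-1c": 0.19}
print(pick_availability_zone(prices))
print(pick_availability_zone(prices, requested_az="us-east-1c"))
```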
Configuring Spot Blocks
You can configure AWS Spot blocks for the master node and the minimum number of nodes, and then choose one of the following for autoscaling:
- Spot blocks (available only when the master node and the minimum number of nodes are Spot blocks)
- Spot nodes
- On-Demand nodes