How Qubole Improves Computing Efficiency
Qubole provides advanced capabilities that automatically optimize computing efficiency and help you save money. These include automated cluster sizing and lifecycle management, which match computing resources to actual usage. Qubole also provides automatic, intelligent purchasing options across different tiers of computing resources.
Qubole does not currently support all of the cluster types discussed below on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.
Cluster Lifecycle Management
Qubole automatically shuts down a cluster when no active jobs are running on it. You can always shut down a cluster manually, but in practice Qubole has found that the vast majority of clusters are auto-terminated. Automated cluster lifecycle management protects you against accidentally leaving an idle cluster running, and so avoids unnecessary compute charges. This feature is available for Hadoop, Spark, and Presto clusters.
For more information about lifecycle management, see Cluster Termination under Introduction to Qubole Clusters.
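The auto-termination rule described above can be sketched as a simple check. This is an illustrative sketch only, not Qubole's implementation: the `ClusterState` type, the `should_terminate` function, and the idle grace period (`idle_timeout_seconds`) are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a cluster, used only for this example."""
    active_jobs: int      # jobs currently running on the cluster
    idle_seconds: float   # time elapsed since the last job finished


def should_terminate(state: ClusterState,
                     idle_timeout_seconds: float = 3600.0) -> bool:
    """Return True when the cluster has no active jobs and has been idle
    for at least the (assumed) grace period."""
    if state.active_jobs > 0:
        return False
    return state.idle_seconds >= idle_timeout_seconds


# A cluster that has been idle for over an hour qualifies for shutdown;
# a cluster that is still running jobs does not.
print(should_terminate(ClusterState(active_jobs=0, idle_seconds=4000.0)))  # True
print(should_terminate(ClusterState(active_jobs=2, idle_seconds=4000.0)))  # False
```

The grace period is there so that a cluster is not shut down in the short gap between two consecutive jobs; whether and how such a period is configured depends on the product and is not specified here.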
Auto-scaling
Qubole dynamically adds or removes cluster nodes to match the cluster workload; this is called auto-scaling. Auto-scaling is particularly effective for workloads that are less predictable and that involve many users running jobs on the cluster concurrently. For such workloads, Qubole has seen auto-scaling provide substantial savings compared to a static-sized cluster (see Case Study TubeMogul). Auto-scaling is available for Hadoop, Spark, Presto, and Tez clusters.
In addition to auto-scaling itself, Qubole services have built-in intelligence to make auto-scaling more effective, as described below. In some cases, these additional capabilities are available only for certain cluster types; see Auto-scaling in Qubole Clusters for details.
- Spot-based auto-scaling (AWS only). Qubole can use AWS Spot nodes as auto-scaling nodes, that is, nodes outside the cluster core (the core consists of the Master Node and the nodes that make up the Minimum Slave Count). Spot nodes are described in detail in the next section.
- Spot rebalancing (AWS only). When the price of Spot instances spikes sharply, Qubole can fall back to using On-Demand nodes. When that happens, Qubole re-evaluates the Spot market at the node’s billing boundary and swaps in Spot nodes once the price drops; this is called Spot rebalancing.
- HDFS-based auto-scaling (currently AWS only). By default, Qubole auto-scaling bases provisioning decisions on computing needs and resources. A newer capability, HDFS-based auto-scaling, allows clusters to scale on the basis of disk-space capacity as well.
For more information about auto-scaling, see: Auto-Scaling in Qubole Clusters.
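To illustrate how a scaling decision might combine compute demand with HDFS capacity, here is a minimal sketch. It is not Qubole's algorithm: the function name, every parameter, and the 80% target HDFS utilization are assumptions chosen for the example.

```python
import math


def target_node_count(pending_tasks: int,
                      tasks_per_node: int,
                      hdfs_used_gb: float,
                      hdfs_gb_per_node: float,
                      min_nodes: int,
                      max_nodes: int,
                      hdfs_target_utilization: float = 0.8) -> int:
    """Pick a cluster size that satisfies both compute and storage needs,
    clamped to the configured minimum and maximum (all names hypothetical)."""
    # Nodes needed to run the pending work.
    compute_nodes = math.ceil(pending_tasks / tasks_per_node)
    # Nodes needed to keep HDFS below the assumed target utilization.
    usable_gb_per_node = hdfs_gb_per_node * hdfs_target_utilization
    storage_nodes = math.ceil(hdfs_used_gb / usable_gb_per_node)
    wanted = max(compute_nodes, storage_nodes)
    return max(min_nodes, min(max_nodes, wanted))


# Compute-bound: 100 pending tasks at 10 tasks per node.
print(target_node_count(100, 10, 0, 1000, min_nodes=2, max_nodes=20))   # 10
# Storage-bound: little compute demand, but 8000 GB stored in HDFS.
print(target_node_count(10, 10, 8000, 1000, min_nodes=2, max_nodes=20))  # 10
# Idle: the cluster shrinks to its minimum size.
print(target_node_count(0, 10, 0, 1000, min_nodes=2, max_nodes=20))     # 2
```

Taking the larger of the two estimates means a cluster holding a lot of HDFS data is not scaled down just because compute demand has dropped.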
AWS Spot Nodes (AWS only)
Qubole can use AWS Spot nodes when dynamically adding cluster nodes (see above) or as part of the core minimum nodes for a cluster (not recommended, for stability reasons). Spot nodes represent excess AWS capacity and can be purchased at discounts of up to 90% from the On-Demand price. However, AWS can reclaim Spot nodes at any time, so jobs running on them can be lost.
On average, Spot nodes are priced at an 80% discount compared to On-Demand pricing.
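As a rough worked example of what the average discount means for a cluster bill, the sketch below compares an all-On-Demand cluster with one that keeps its core nodes On-Demand and runs its auto-scaling nodes on Spot. The node counts and the price of 1.00 per node-hour are hypothetical; only the 80% average discount comes from the text above.

```python
def hourly_cost(on_demand_nodes: int,
                spot_nodes: int,
                on_demand_price: float,
                spot_discount: float = 0.80) -> float:
    """Hourly cost of a cluster mixing On-Demand and Spot nodes.

    spot_discount is the fraction taken off the On-Demand price
    (0.80 is the average discount quoted above).
    """
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


ON_DEMAND_PRICE = 1.00  # hypothetical price per node-hour

all_on_demand = hourly_cost(20, 0, ON_DEMAND_PRICE)
mixed = hourly_cost(4, 16, ON_DEMAND_PRICE)  # 4 core nodes, 16 Spot nodes
savings = 1.0 - mixed / all_on_demand

print(f"all On-Demand: {all_on_demand:.2f}")  # all On-Demand: 20.00
print(f"mixed:         {mixed:.2f}")          # mixed:         7.20
print(f"savings:       {savings:.0%}")        # savings:       64%
```

Actual savings depend on current Spot prices and on how often Spot nodes are reclaimed and replaced with On-Demand nodes.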
As with auto-scaling, Qubole provides additional built-in intelligence in the use of Spot nodes, as described below. In some cases, these additional capabilities are available only for certain cluster types; see Auto-scaling in Qubole Clusters for details.
- Qubole Placement Policy. Qubole has multiple pricing options for stable Spot nodes (conservative pricing) and volatile Spot nodes (aggressive pricing). Under the placement policy, Qubole spreads out HDFS storage across stable and volatile nodes, thereby minimizing the risk of job failure because of the loss of a Spot instance.
- Fallback to On-Demand instances after a configurable timeout. There is no guarantee of getting Spot nodes. Qubole can automatically fall back to requesting On-Demand nodes if Spot nodes are not available within a configurable timeout period.
- Intelligent Availability Zone (AZ) Selection. Spot pricing can vary by AZ, sometimes by as much as 15-20%. Qubole can automatically select the AZ with the best Spot pricing for a cluster’s chosen instance type. This capability is not enabled by default; to enable it for your account, create a ticket with Qubole Support.
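Two of the behaviors listed above, choosing the cheapest AZ and falling back to On-Demand after a timeout, can be sketched as follows. This is an illustration under stated assumptions, not Qubole's implementation: `cheapest_az`, `acquire_node`, the callback parameters, and the sample prices are all hypothetical, and no real AWS API is called.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def cheapest_az(spot_prices: Dict[str, float]) -> str:
    """Return the AZ with the lowest Spot price for the chosen instance type.

    Raises ValueError if spot_prices is empty.
    """
    return min(spot_prices, key=spot_prices.get)


def acquire_node(request_spot: Callable[[], Optional[str]],
                 request_on_demand: Callable[[], str],
                 timeout_seconds: float,
                 poll_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> Tuple[str, str]:
    """Try to obtain a Spot node until the timeout expires, then fall back
    to an On-Demand node. Returns (purchase_type, node_id).

    request_spot returns a node id, or None if no Spot node is available yet.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        node_id = request_spot()
        if node_id is not None:
            return ("spot", node_id)
        sleep(poll_seconds)
    return ("on-demand", request_on_demand())


# Hypothetical Spot prices per AZ for one instance type.
prices = {"us-east-1a": 0.12, "us-east-1b": 0.10, "us-east-1c": 0.11}
print(cheapest_az(prices))  # us-east-1b

# With no Spot capacity and a zero timeout, the request falls back at once.
print(acquire_node(lambda: None, lambda: "od-1", timeout_seconds=0))
# ('on-demand', 'od-1')
```

The `clock` and `sleep` parameters exist only so the polling loop can be tested without real waiting; a production version would also need to handle request errors and cancellation.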
For more information about how Qubole uses AWS Spot nodes, see: