How QDS Improves Computing Efficiency

Qubole provides advanced capabilities that automatically optimize computing efficiency and help save you money. These include automating cluster sizing and lifecycle management so as to match computing resources with actual usage. Qubole also provides automatic and intelligent purchasing options across different tiers of computing resources.

Note

Qubole does not currently support all of the cluster types discussed below on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.

Cluster Lifecycle Management

QDS automatically shuts down a cluster when there are no more active jobs running on it. You can always shut down the cluster manually, but in practice Qubole has found that the vast majority of clusters are auto-terminated. Automated cluster lifecycle management protects you against accidentally leaving an idle cluster running, saving unnecessary compute charges.
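Below is a minimal Python sketch of the auto-termination rule described above. `ClusterState` and `auto_terminate` are hypothetical names for illustration, not part of any QDS API. Setting `running = False` stands in for whatever call actually stops the cluster. A real controller would presumably also wait through some idle period before shutting down, and this sketch leaves that out.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    # Hypothetical snapshot of a cluster; field names are illustrative only.
    name: str
    active_jobs: int
    running: bool = True


def auto_terminate(cluster: ClusterState) -> bool:
    """Shut the cluster down if it is running and has no active jobs.

    Returns True only if this call terminated the cluster.
    """
    if cluster.running and cluster.active_jobs == 0:
        cluster.running = False  # stands in for the real shutdown call
        return True
    return False
```

A cluster with running jobs is left alone. An idle cluster is terminated once, and later calls on it are no-ops.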

Automated cluster lifecycle management is available for Hadoop (Hive), Spark, and Presto clusters.

For more information about lifecycle management, see Cluster Termination under Introduction to Qubole Clusters.

Autoscaling

QDS dynamically adds or removes cluster nodes to match the cluster workload; this is called autoscaling. Autoscaling is particularly effective for less predictable workloads and for workloads in which many users run jobs on the cluster concurrently. For such workloads, Qubole has seen autoscaling provide substantial savings compared to a static-sized cluster (see Case Study TubeMogul).
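One way to sketch the core autoscaling decision is to size the cluster to its pending work and then clamp the result between a minimum and a maximum node count. This is an illustrative assumption, not Qubole's algorithm. The demand signal (`pending_tasks`, `tasks_per_node`), the `max_workers` cap, and the function names are all hypothetical, and the signals QDS actually uses differ by cluster type.

```python
import math


def target_worker_count(pending_tasks: int, tasks_per_node: int,
                        min_workers: int, max_workers: int) -> int:
    """Return how many worker nodes the cluster should have.

    Sizes the cluster to the outstanding work, but never below the
    configured minimum (the always-on core) or above the maximum.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current: int, target: int) -> str:
    """Describe the change needed to move from the current to the target size."""
    if target > current:
        return f"add {target - current}"
    if target < current:
        return f"remove {current - target}"
    return "no change"
```

Take 17 pending tasks, 4 tasks per node, and bounds of 2 and 10 workers. `target_worker_count` returns 5, so a cluster currently at 3 workers would add 2. With no pending work, the cluster shrinks back to its 2-node minimum instead of staying at a fixed size.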

Autoscaling is available for Hadoop (Hive), Presto, and Spark clusters.

In addition to autoscaling itself, Qubole services have built-in intelligence to make autoscaling more effective, as described below. In some cases, these additional capabilities are available only for certain cluster types; see Autoscaling in Qubole Clusters for details.

  • Spot-based autoscaling (AWS only). Qubole can use AWS Spot nodes as autoscaling nodes, that is, nodes outside the cluster core. The core consists of the Coordinator Node and the nodes that make up the Minimum Worker Nodes. Spot nodes are described in detail in the next section.

  • Spot rebalancing (AWS only). Sometimes there are large spikes in the price of Spot instances, and in these cases Qubole can fall back to using On-demand nodes. When that happens, Qubole will re-evaluate the Spot market at the node’s billing boundary and swap in Spot nodes once the price drops; this is called Spot Rebalancing.

  • HDFS-based autoscaling (currently AWS only). By default, Qubole autoscaling bases provisioning decisions on computing needs and resources. A newer capability, HDFS-based autoscaling, also lets clusters scale on the basis of disk-space capacity.
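The Spot rebalancing described in the list above amounts to a check made at each node's billing boundary. The sketch below assumes a simple rule that does not come from Qubole: an On-demand autoscaling node is swapped for Spot only when it has reached its billing boundary and the current Spot price is at or below a hypothetical `max_spot_price` threshold. Core nodes are skipped because Spot nodes are used for autoscaling nodes outside the core. `Node` and `rebalance` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    node_id: str
    market: str                # "spot" or "on-demand"
    at_billing_boundary: bool  # has the node reached its billing boundary?
    is_core: bool = False      # coordinator or minimum-worker node


def rebalance(nodes: List[Node], spot_price: float,
              max_spot_price: float) -> List[str]:
    """Return ids of On-demand autoscaling nodes that should be swapped for Spot."""
    if spot_price > max_spot_price:
        # Price spike still in effect: keep the On-demand fallback nodes.
        return []
    return [
        n.node_id
        for n in nodes
        if n.market == "on-demand" and not n.is_core and n.at_billing_boundary
    ]
```

Waiting for the billing boundary avoids giving up a node partway through a period that has already been paid for. That is the point of re-evaluating only at that moment.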

For more information about autoscaling, see Autoscaling in Qubole Clusters.

AWS Spot Nodes (AWS only)

Qubole can use AWS Spot nodes when dynamically adding cluster nodes (see above) or as part of the core minimum nodes for a cluster (not recommended, for stability reasons). Spot nodes represent excess AWS capacity and can be purchased at discounts of up to 90% from the On-demand price. However, AWS can reclaim Spot nodes at any time, so jobs running on them may be lost.

Spot nodes are priced on average at an 80% discount compared to On-demand pricing.
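A worked example makes the savings concrete. The sketch assumes a hypothetical On-demand price of $1.00 per node-hour and a 10-hour run. It keeps 2 core nodes On-demand and buys 8 autoscaling nodes as Spot at the average 80% discount cited above. Real prices vary by instance type, region, and time. The cost of re-running work lost to Spot reclamation is ignored.

```python
def cluster_cost(on_demand_hourly: float, hours: float, core_nodes: int,
                 spot_nodes: int, spot_discount: float) -> float:
    """Cost of a run where core nodes are On-demand and the rest are Spot."""
    core_cost = core_nodes * on_demand_hourly * hours
    spot_hourly = on_demand_hourly * (1.0 - spot_discount)
    spot_cost = spot_nodes * spot_hourly * hours
    return core_cost + spot_cost


# Hypothetical inputs; not real AWS prices.
mixed = cluster_cost(on_demand_hourly=1.00, hours=10, core_nodes=2,
                     spot_nodes=8, spot_discount=0.80)
all_on_demand = cluster_cost(on_demand_hourly=1.00, hours=10, core_nodes=10,
                             spot_nodes=0, spot_discount=0.80)
print(f"mixed: ${mixed:.2f}, all On-demand: ${all_on_demand:.2f}")
```

With these inputs, the mixed cluster costs about $36, compared with $100 for an all-On-demand cluster of the same size.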

As with Autoscaling, Qubole provides additional built-in intelligence in the use of Spot nodes, as described below. In some cases, these additional capabilities are available only for certain cluster types; see Autoscaling in Qubole Clusters for details.

  • Qubole Placement Policy. Qubole offers separate pricing options for stable Spot nodes (conservative pricing) and volatile Spot nodes (aggressive pricing). Under the placement policy, Qubole spreads HDFS storage across both stable and volatile nodes, which reduces the risk that losing a Spot instance causes a job to fail.

  • Fallback to On-demand instances after a configurable timeout. There is no guarantee of getting Spot nodes. Qubole can automatically fall back to requesting On-demand nodes if Spot nodes are not available within a configurable timeout period.

  • Intelligent Availability Zone (AZ) Selection. Spot pricing can vary by AZ, sometimes by as much as 15-20%. Qubole can automatically select the best AZ in terms of Spot pricing for a cluster's chosen instance type. This capability is not enabled by default; to enable it for your account, create a ticket with Qubole Support.
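The On-demand fallback and AZ selection bullets above can be combined into one acquisition loop. The following is an assumption, not Qubole's implementation. `request_spot` and `request_on_demand` are hypothetical callbacks standing in for real provisioning calls, and the caller supplies the per-AZ Spot prices. The fallback is also assumed to stay in the same AZ. The clock and sleep functions can be injected, so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def cheapest_az(spot_prices: Dict[str, float]) -> str:
    """Return the Availability Zone with the lowest Spot price."""
    return min(spot_prices, key=spot_prices.__getitem__)


def acquire_node(
    request_spot: Callable[[str], Optional[str]],
    request_on_demand: Callable[[str], str],
    spot_prices: Dict[str, float],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str, str]:
    """Try Spot in the cheapest AZ until timeout_s elapses, then fall back.

    Returns (market, az, node_id).
    """
    az = cheapest_az(spot_prices)
    deadline = clock() + timeout_s
    while clock() < deadline:
        node_id = request_spot(az)  # None means no Spot capacity right now
        if node_id is not None:
            return ("spot", az, node_id)
        sleep(poll_interval_s)
    # Timed out waiting for Spot capacity: request an On-demand node instead.
    return ("on-demand", az, request_on_demand(az))
```

With the default `time.monotonic` and `time.sleep`, the loop runs on wall-clock time. `timeout_s` plays the role of the configurable timeout mentioned above.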

For more information about how Qubole uses AWS Spot nodes, see: