SSD Caching

Note

SSD caching is currently available only on AWS.

Cloud storage has many benefits. It is great for storing large amount of data at low cost. However, Amazon S3’s bandwidth is not high enough to support interactive querying. The new generation of Amazon instance types come with SSD volumes. Some machine types also come with large amount of memory (r3 instance types) per node. Qubole has built a caching framework in Presto to take advantage of this memory hierarchy to provide interactive query performance over large amounts of data in Amazon S3.

Architecture

The following figure shows the architecture of the caching solution.

../../_images/caching-arch.png

As part of query execution, Presto, like Hadoop, performs split computation. Each worker node is assigned with one or more splits. Let us assume that one split is one file. Presto’s scheduler assigns splits to worker nodes randomly. Qubole modified the scheduling mechanism to assign split to a node based on a hash of the filename. This assures us that if the same file is to be read for another query that split is executed in the same node. This gives spatial locality. Qubole has modified the S3 filesystem code to cache files in local disks as part of a query execution. For example, in the above figure, if a query wants to read file 2, it is read by worker node 1 from local disk instead of S3 which is a lot faster. The cache contains logic for eviction and expiry. Some instance types contain multiple SSD volumes and Qubole stripes data across them.

Qubole uses consistent hashing to handle dynamic addition or removal of nodes due to auto-scaling. This ensures that Qubole maintains the advantage of SSD cache already built in old news as much as possible.

Experimental Results

To test this feature, Qubole has generated a TPC-DS scale 10000 data on a 20 c3.8xlarge node cluster. Qubole has used delimited/zlib and ORC/zlib formats. The ORC version was not sorted. Here are table statistics.

Table Rows Text Text, zlib ORC, zlib
store_sales 28 billion 3.6TB 1.4TB 1.1TB
customer 65 million 12 GB 3.1 GB 2.5 GB

Qubole has used the following queries to measure performance improvements. These queries are representatives of common query patterns from analysts.

ID Query Description
Q1 select * from store_sales where ss_customer_sk=1000; Selects ~400 rows
Q2 select ss_store_sk, sum(ss_quantity) as cnt from store_sales group by ss_store_sk order by cnt desc limit 5; Top 5 stores by sales
Q3 select sum(ss_quantity) as cnt from store_sales ss join customer c on (ss.ss_customer_sk = c.c_customer_sk) where c.c_birth_year < 1980; Quantity sold to customers born before 1980
../../_images/caching-results.png

Txt-NoCache means using Txt format with caching feature disabled. The Txt-NoCache case suffers from two problems, inefficient storage format and slow access. Switching to caching provides a good performance improvement. However, the biggest gains are realized when caching is used in conjunction with the ORC format. There is a 10-15x performance improvement by switching to ORC and using Qubole’s caching feature. Results show that queries that take many minutes now take a few seconds, thus, benefiting the analyst use-case.

Refer to Understanding the Presto Engine Configuration section for details on how to configure SSD caching in a cluster effectively.