Should I use Presto or Hive?
While Presto may be the better choice in most scenarios, Hive should not be discounted: some workloads are simply too demanding for Presto.
Presto caps the amount of memory a query can use, and it fails the query outright when that limit is exceeded. This error handling (or lack thereof) is acceptable for interactive queries, but it is not suitable for daily or weekly reports that must run reliably. Hive, which is built for batch processing, may be a better alternative for such tasks.
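The trade-off above can be sketched as a simple fallback policy for scheduled reports. This is a minimal Python sketch, not a real client API: `run_report`, `QueryMemoryLimitError`, and the `run_on_presto`/`run_on_hive` callables are hypothetical placeholders that you would wire to your own Presto and Hive clients.

```python
from typing import Callable, List, Tuple


class QueryMemoryLimitError(Exception):
    """Hypothetical error type: stands in for whatever your Presto client
    raises when a query exceeds its memory limit."""


def run_report(
    sql: str,
    run_on_presto: Callable[[str], List[tuple]],
    run_on_hive: Callable[[str], List[tuple]],
) -> Tuple[str, List[tuple]]:
    """Run a scheduled report on Presto for low latency; if Presto aborts the
    query because it ran out of memory, rerun the same SQL on Hive.

    Returns the name of the engine that produced the result and the rows.
    """
    try:
        return "presto", run_on_presto(sql)
    except QueryMemoryLimitError:
        # Presto fails the whole query; fall back to the batch engine,
        # which is slower but better suited to memory-heavy jobs.
        return "hive", run_on_hive(sql)


# Usage with stand-in engines (hypothetical, no real cluster involved).
def fake_presto(sql: str) -> List[tuple]:
    if "big_join" in sql:
        raise QueryMemoryLimitError("Query exceeded memory limit")
    return [("presto", 1)]


def fake_hive(sql: str) -> List[tuple]:
    return [("hive", 1)]


print(run_report("SELECT * FROM small_table", fake_presto, fake_hive))
# ('presto', [('presto', 1)])
print(run_report("SELECT * FROM big_join", fake_presto, fake_hive))
# ('hive', [('hive', 1)])
```

Only the memory-limit error triggers the fallback, so other failures (a syntax error, a missing table) still surface instead of being silently retried on Hive.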
| Hive | Presto |
|---|---|
| Optimized for batch processing of large ETL jobs and batch SQL queries on huge data sets. | Used for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. |
| Mature SQL (ANSI SQL). | Less mature SQL (still ANSI compliant). |
| Easily extensible. | Some extensibility, but limited compared to Hive. |
| Optimized for query throughput. | Optimized for latency. |
| Needs more resources per query. | Resource-efficient. |
| Suitable for large fact-to-fact joins. | Optimized for star schema joins (one large fact table and many smaller dimension tables). |
| Suitable for large data aggregations. | Interactive queries and quick data exploration. |
| Rich ecosystem (plenty of resources online). | Less rich ecosystem (but improving with big users such as Facebook and Netflix). |