Presto as a Service

Qubole provides Presto as a service for fast, inexpensive, and scalable data processing.

Note

For the latest information on QDS support for Presto, see QDS Components: Supported Versions and Cloud Platforms.

Supported Data Formats

Presto supports the following data sources and formats:

  • Hive tables in the Cloud and HDFS.

  • Delimited, CSV, RCFile, JSON, SequenceFile, ORC, Avro, and Parquet. You can add support for other file formats by adding the relevant JARs to Presto through the Presto Server Bootstrap.

  • Data compressed using GZIP.

  • Hive ACID tables are currently supported in Presto version 317 (beta). For more information, see Using ACID Tables in Presto.

Advantages of QDS Presto Clusters

  • You can optimize your clusters by choosing the instance type most suitable to your workload.

  • You can launch clusters in any region or location.

  • QDS provides Cloud-specific optimizations.

  • By default, QDS automatically terminates idle clusters to save cost.

  • QDS starts a cluster only when necessary, that is, when a query is run and no Presto cluster is running. Otherwise, QDS reuses a cluster that is already running.

  • Autoscaling continuously adjusts the cluster size to the Presto workload.

  • You can configure the amount of cluster memory allocated for Presto.

A Better User Experience

  • Multiple QDS users can submit queries to the same Presto cluster.

  • Query logs and results are always available from the History tab on the Analyze page of the QDS UI.

  • QDS provides detailed execution metrics for each Presto query.

  • Users can create workflows that combine Hadoop jobs, Hive queries, and Presto queries.

Security

QDS can provide table-level security for Hive tables accessed via Presto; to enable it, set hive.security to sql-standard in catalog/hive.properties. See Understanding Qubole Hive Authorization for more information.
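As a minimal sketch of the setting described above, the relevant entry in catalog/hive.properties would look like the following. Only the property name, its value, and the file name come from this page. How the file is edited on a given cluster (for example, through a cluster configuration override) is not covered here and should be checked against the cluster's Presto settings.

```properties
# Entry for catalog/hive.properties (file name as given in the text above).
# Enables SQL-standard authorization so that table-level security
# is applied to Hive tables accessed via Presto.
hive.security=sql-standard
```

A change to this property typically takes effect only after the Presto cluster is restarted, so plan the change for a time when no queries are running.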