Presto as a Service

Qubole provides Presto as a service for fast, inexpensive, and scalable data processing.

Note

For the latest information on QDS support for Presto, see QDS Components: Supported Versions and Cloud Platforms.

Supported Data Formats

Presto supports the following data formats:

  • Hive tables in the Cloud and HDFS.
  • Delimited, CSV, RCFile, JSON, SequenceFile, ORC, Avro, and Parquet. You can add support for other file formats by adding the relevant JARs to Presto through the Presto Server Bootstrap.
  • Data compressed using GZIP.
  • Hive ACID tables are currently supported in Presto version 317 (beta). For more information, see Using ACID Tables in Presto.

Advantages of QDS Presto Clusters

  • You can optimize your clusters by choosing the instance type most suitable to your workload.
  • You can launch clusters in any region or location.
  • QDS provides Cloud-specific optimizations.
  • By default, QDS automatically terminates idle clusters to save cost.
  • QDS starts clusters only when necessary, that is, when a query is run and no Presto cluster is running. Otherwise, QDS reuses a cluster that is already running.
  • Autoscaling continuously adjusts the cluster size to the Presto workload.
  • You can configure the amount of cluster memory allocated for Presto.

A Better User Experience

  • Multiple QDS users can submit queries to the same Presto cluster.
  • Query logs and results are always available from the History tab on the Analyze page of the QDS UI.
  • QDS provides detailed execution metrics for each Presto query.
  • Users can create workflows that combine Hadoop jobs, Hive queries, and Presto queries.

Security

QDS can provide table-level security for Hive tables accessed via Presto; to enable it, set hive.security to sql-standard in catalog/hive.properties. See Understanding Qubole Hive Authorization for more information.
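
As a minimal sketch, the setting named above would appear in catalog/hive.properties as a single key=value line. The property name and value come from the paragraph above. How the file is edited on a given cluster, and whether a cluster restart is needed for the change to take effect, are not covered here and should be checked against the QDS documentation.

```
# catalog/hive.properties
# Enables SQL-standard authorization for Hive tables accessed via Presto.
hive.security=sql-standard
```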