Presto as a Service
Qubole provides Presto as a service for fast, inexpensive, and scalable data processing.
Note
For the latest information on QDS support for Presto, see QDS Components: Supported Versions and Cloud Platforms.
Supported Data Formats
Presto supports the following data formats:
Hive tables in the Cloud and HDFS.
Delimited, CSV, RCFile, JSON, SequenceFile, ORC, Avro, and Parquet. Other file formats are also supported by adding relevant jars to Presto through the Presto Server Bootstrap.
Data compressed using GZIP.
Hive ACID tables are currently supported in Presto version 317 (beta). For more information, see Using ACID Tables in Presto.
Advantages of QDS Presto Clusters
You can optimize your clusters by choosing the instance type most suitable to your workload.
You can launch clusters in any region or location.
QDS provides Cloud-specific optimizations.
By default, QDS automatically terminates idle clusters to save cost.
QDS starts clusters only when necessary, that is, when a query is run and no Presto cluster is running; otherwise, QDS reuses a cluster that is already running.
Autoscaling continuously adjusts the cluster size to the Presto workload.
You can configure the amount of cluster memory allocated for Presto.
A Better User Experience
Multiple QDS users can submit queries to the same Presto cluster.
Query logs and results are always available (use the History tab on the Analyze page of the QDS UI).
QDS provides detailed execution metrics for each Presto query.
Users can create workflows that combine Hadoop jobs, Hive queries, and Presto queries.
Security
QDS can provide table-level security for Hive tables accessed via Presto; to enable it, set hive.security to sql-standard in catalog/hive.properties. See Understanding Qubole Hive Authorization for more information.
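As a minimal sketch of the setting described above, the entry in catalog/hive.properties would look like the following. The property name and value come from this page; how the file is edited on a given cluster (for example, through a cluster-level Presto configuration override) is an assumption and is not specified here.

```properties
# catalog/hive.properties
# Enable SQL-standard, table-level authorization for Hive tables accessed via Presto.
hive.security=sql-standard
```

A change to a catalog properties file typically takes effect only after the Presto cluster is restarted.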