Presto FAQs

  1. How is Presto different from Hive?
  2. How is Qubole’s Presto different from open-source Presto?
  3. Where do I find Presto logs?
  4. Why are new nodes not being used by my query during upscaling?
  5. Where can I find different Presto metrics for monitoring?
  6. Where can I find the Presto Server Bootstrap logs?

How is Presto different from Hive?

Although both Presto and Hive execute SQL-like queries, there are certain differences between them that you should be aware of as a user.

Presto:

  • Does not support user-defined functions (UDFs) the way Hive does. However, Presto has a large number of built-in functions. Qubole provides additional UDFs, but they can be added only before the cluster starts; adding UDFs at runtime, as Hive allows, is not supported.
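  • Does not support automatic JOIN reordering. Ensure that the smaller table is on the right side of the JOIN keyword, because Presto builds the in-memory hash table for the join from the right-side table.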
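If your queries are generated from code, the JOIN-ordering rule above can be applied mechanically. The following Python sketch is a hypothetical helper, not part of Presto or QDS: it assumes you already know approximate row counts for both tables (the `row_counts` mapping and the table names are illustrative) and simply places the smaller table on the right of the JOIN.

```python
from typing import Mapping


def build_join_query(
    table_a: str, table_b: str, row_counts: Mapping[str, int], join_key: str
) -> str:
    """Build an equi-join query that puts the smaller table on the right.

    row_counts is a hypothetical mapping of table name to approximate row
    count that you gather yourself; Presto does not supply it here.
    """
    # The right-side table is the build side of Presto's hash join, so it
    # should be the smaller one; the larger table goes on the left.
    if row_counts[table_a] >= row_counts[table_b]:
        left, right = table_a, table_b
    else:
        left, right = table_b, table_a
    return (
        f"SELECT * FROM {left} "
        f"JOIN {right} ON {left}.{join_key} = {right}.{join_key}"
    )


if __name__ == "__main__":
    # Hypothetical tables: a small dimension table and a large fact table.
    counts = {"dim_country": 250, "fact_orders": 10_000_000}
    print(build_join_query("dim_country", "fact_orders", counts, "country_id"))
```

Run as a script, it prints `SELECT * FROM fact_orders JOIN dim_country ON fact_orders.country_id = dim_country.country_id`: the argument order does not matter, because the helper always moves the smaller table to the right.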

How is Qubole’s Presto different from open-source Presto?

Note

Presto is not currently supported on all Cloud platforms; see QDS Components: Supported Versions and Cloud Platforms.

While Qubole’s Presto offering is heavily based on open-source Presto, there are a few differences. Qubole’s Presto:

  • Supports inserting data into S3 directories
  • Supports INSERT OVERWRITE
  • Supports auto-scaling clusters
  • Supports RubiX, which caches data from Cloud storage on the cluster's local storage, improving performance
  • Supports GZIP compression
  • Supports JDBC/ODBC through Qubole drivers
  • Supports Zeppelin: you can create Presto notebooks and run paragraphs in them, as described in Using Different Types of Notebook.
  • Supports encryption of data traffic between Presto cluster nodes
  • Supports additional connectors, such as Kinesis, and additional SerDes, such as Avro and OpenX JSON

Where do I find Presto logs?

  • The master cluster node’s logs are located at: DEFLOC/logs/presto/cluster_inst_id/master/
  • Each worker cluster node’s logs are located at: DEFLOC/logs/presto/cluster_inst_id/nodeIP/, where nodeIP is the IP address of that worker node.

Where:

  • DEFLOC refers to the default location of an account.

  • cluster_inst_id is the cluster instance ID. It is the most recent folder under DEFLOC/logs/presto. You can also find it by running a Presto command: the log location is reported on the Logs tab. For example, on AWS you see output similar to this:

    Log location: s3://mydata.com/trackdata/logs/logs/presto/95907
    Started Query: 20151110_092450_00096_bucas Query Tracker
    Query: 20151110_092450_00096_bucas Progress: 0%
    Query: 20151110_092450_00096_bucas Progress: 0%
    

    Here, 95907 is the cluster instance ID; under it are subdirectories for the master and worker nodes. In Azure Blob storage, the path is similar to wasb://mycontainer@myaccount.blob.core.windows.net/logs/presto/95907, and in Azure Data Lake Storage (ADLS), it is similar to adl://mydatalake.azuredatalakestore.net/logs/presto/95907.
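The log path rules above are simple enough to compute. The Python sketch below is illustrative only and is not a QDS tool: it extracts the cluster instance ID from a `Log location:` line like the sample output, then builds the master and worker log directory paths from DEFLOC. The DEFLOC value and the worker IP address in the example are assumptions; substitute your own.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches the trailing ".../logs/presto/<cluster_inst_id>" part of a log path.
_INSTANCE_ID = re.compile(r"/logs/presto/(\d+)/?$")


def cluster_instance_id(log_location: str) -> str:
    """Return the cluster instance ID from a 'Log location:' line or path."""
    match = _INSTANCE_ID.search(log_location.strip())
    if match is None:
        raise ValueError(f"no cluster instance ID found in {log_location!r}")
    return match.group(1)


def presto_log_dirs(
    defloc: str, cluster_inst_id: str, worker_ips: Iterable[str]
) -> Tuple[str, Dict[str, str]]:
    """Return the master log directory and a mapping of worker IP to log directory."""
    base = f"{defloc.rstrip('/')}/logs/presto/{cluster_inst_id}"
    master = f"{base}/master/"
    workers = {ip: f"{base}/{ip}/" for ip in worker_ips}
    return master, workers


if __name__ == "__main__":
    line = "Log location: s3://mydata.com/trackdata/logs/logs/presto/95907"
    inst_id = cluster_instance_id(line)
    # DEFLOC and the worker IP below are example values, not real ones.
    master, workers = presto_log_dirs(
        "s3://mydata.com/trackdata/logs", inst_id, ["10.0.0.12"]
    )
    print(master)
    print(workers["10.0.0.12"])
```

With these example values, the master directory is `s3://mydata.com/trackdata/logs/logs/presto/95907/master/`. The functions only build path strings; to list or download the log files, you still need your cloud storage client.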

Why are new nodes not being used by my query during upscaling?

For queries that are already running when new nodes are added, the new nodes are used only by certain operations, such as TableScans and Partial Aggregations. For more information, see this explanation of how auto-scaling works in a Presto cluster.

Where can I find different Presto metrics for monitoring?

Understanding the Presto Metrics for Monitoring lists the metrics that are available in the Datadog monitoring service. It also describes possible abnormalities and the actions you can take to handle them.

Where can I find the Presto Server Bootstrap logs?

The ec2-user can view the Presto Server Bootstrap logs at /media/ephemeral0/presto/var/log/bootstrap.log. A QDS account admin can view them by logging into the cluster, provided that the Customer Public SSH Key is configured in the cluster’s security settings. For more information, see Advanced configuration: Modifying Security Settings (AWS).

For information on how to log into the clusters, see: