Presto Metrics on the Default Datadog Dashboard

Qubole Presto supports Datadog monitoring, and its metrics can be viewed on Datadog dashboards.

Note

The feature to use the Datadog UI is not available by default. Create a ticket with Qubole Support to enable this feature on the QDS account.

When Datadog monitoring is configured on a Presto cluster, the metrics of an active cluster are displayed on a default Datadog dashboard. The default Datadog dashboard metrics are:

  • presto.MaxYoungGenGC-Time
  • presto.AveragePlanningTime
  • presto.Workers
  • presto.requestFailures
  • presto.RUNNING-Queries
  • presto.FINISHED-Queries
  • presto.FAILED-Queries
  • presto.bytesReadPerSecondPerQuery

Note

Understanding the Presto Metrics for Monitoring provides more details on the metrics and the actions that you can take to resolve the cause of errors.

As a prerequisite, you must enable Datadog monitoring on the Presto cluster.

Enabling Datadog

Advanced configuration: Modifying Cluster Monitoring Settings describes how to enable Datadog through the cluster UI: add the Datadog API and APP tokens in the Advanced Configuration of the Presto cluster. Create a New Cluster describes how to configure Datadog through an API call.

Here is an example that illustrates Datadog tokens on the cluster UI.

../../_images/DatadogTokens.png
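
Before pasting the tokens into the cluster settings, it can help to confirm that the API token is accepted by Datadog. The following is a minimal sketch, not part of QDS: it uses Datadog's public key validation endpoint and assumes the default US Datadog site. The function names and the placeholder key are hypothetical, and this endpoint checks only the API token, not the APP token.

```python
import json
import urllib.error
import urllib.request

# Assumption: the default US Datadog site; other Datadog sites use a different host.
DATADOG_SITE = "https://api.datadoghq.com"


def build_validate_request(api_key, site=DATADOG_SITE):
    """Build (but do not send) a request to Datadog's API key validation endpoint."""
    return urllib.request.Request(
        f"{site}/api/v1/validate",
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


def is_api_key_valid(api_key, site=DATADOG_SITE):
    """Send the request and report whether Datadog accepted the key."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key, site)) as resp:
            return bool(json.load(resp).get("valid"))
    except urllib.error.HTTPError:
        # Datadog rejects an unknown key with an HTTP error status.
        return False


# Building the request has no side effects; nothing is sent here.
request = build_validate_request("<your Datadog API token>")
```

Calling `is_api_key_valid("<your Datadog API token>")` performs the actual network check.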

You can also enable Datadog monitoring in Control Panel > Account Settings, which applies the settings to all clusters of that account. For information on enabling Datadog at the account level, see Configuring your Access Settings using IAM Keys or Managing Roles.

Viewing the Default Datadog Dashboard

After you enable Datadog on the QDS account or cluster, the Datadog metrics related to Presto are displayed in the Datadog UI. For example, run a Presto query from the QDS UI (or API).

Here is an example of a Presto query.

../../_images/DatadogPrestoQuery.png

Log in to Datadog and navigate to Dashboards. You can find the Presto dashboards in the list. Here is an illustration of the Datadog dashboards.

../../_images/PrestoDDdashboard.png

Click the default Datadog dashboard, which is named with this convention: Account <account owner> Cluster <label> (<cluster ID>). The dashboard shows the default Datadog metrics. Here is an example of the Presto metrics on the default Datadog dashboard.

../../_images/DatadogPrestoMetrics.png
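
If an account has many clusters, the naming convention above can be used to pick the right dashboard out of a long list. The following is a small sketch of that convention only; the function names and the sample owner, label, and cluster ID are hypothetical, and the exact form of the account owner value is an assumption.

```python
import re

# Pattern for the convention: Account <account owner> Cluster <label> (<cluster ID>)
TITLE_PATTERN = re.compile(
    r"^Account (?P<owner>.+) Cluster (?P<label>.+) \((?P<cluster_id>[^()]+)\)$"
)


def default_dashboard_title(account_owner, cluster_label, cluster_id):
    """Format a dashboard title following the documented convention."""
    return f"Account {account_owner} Cluster {cluster_label} ({cluster_id})"


def parse_dashboard_title(title):
    """Split a dashboard title into its parts, or return None if it does not match."""
    match = TITLE_PATTERN.match(title)
    return match.groupdict() if match else None


# Hypothetical values, for illustration only.
title = default_dashboard_title("jdoe@example.com", "presto-prod", 12345)
parts = parse_dashboard_title(title)
```

Titles that do not follow the convention, such as other Datadog dashboards in the same list, yield `None` and can be skipped.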

Default Alerts as Set by QDS

Qubole has set these alerts by default:

  • If the average value of bytesReadPerSecondPerQuery in the cluster is 0 in the last minute, then you receive an alert.
  • If there are more than 100 requestFailures on average in the last 1 minute, then you receive an alert.
  • If the master CPU utilization goes beyond 80%, then you receive an alert.
  • If presto.AveragePlanningTime is greater than 2 minutes, then you receive an alert.

You can customize the threshold values and set alerts on other metrics. For information on how to create alerts and configure email notifications, see the Datadog Alerts description.
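
As a starting point for custom alerts, the thresholds listed above can be written as Datadog metric monitor definitions. The following is a hedged sketch, not the definition that QDS itself creates: the helper names are hypothetical, the `{*}` scope (all sources) is an assumption, and the metric names are taken as they appear in the list at the top of this page. The master CPU and planning time alerts are left out because this page does not give the CPU metric name or the unit in which planning time is reported.

```python
import json


def monitor_query(metric, comparator, threshold, window="last_1m", scope="*"):
    """Build a Datadog metric monitor query that averages a metric over a time window."""
    return f"avg({window}):avg:{metric}{{{scope}}} {comparator} {threshold}"


def monitor_payload(name, query, message):
    """Assemble the JSON body of a Datadog metric monitor definition."""
    return {"name": name, "type": "metric alert", "query": query, "message": message}


# More than 100 requestFailures on average in the last minute.
request_failures = monitor_payload(
    name="Presto requestFailures above 100",
    query=monitor_query("presto.requestFailures", ">", 100),
    message="Presto request failures exceeded the threshold.",
)

# Average bytesReadPerSecondPerQuery is 0 in the last minute.
no_bytes_read = monitor_payload(
    name="Presto bytesReadPerSecondPerQuery is 0",
    query=monitor_query("presto.bytesReadPerSecondPerQuery", "==", 0),
    message="Presto queries are not reading any data.",
)

print(json.dumps(request_failures, indent=2))
```

To change a threshold, pass a different value to `monitor_query`; to limit a monitor to one cluster, replace the `*` scope with a tag that identifies that cluster in your Datadog account. The resulting payload can be submitted through the Datadog monitor API or used as a reference when creating the monitor in the Datadog UI.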

Here is an example of the requestFailures alert.

../../_images/ExampleDDAlert.png
