View Cluster Metrics

GET /api/v1.3/clusters/<Cluster ID> or <Cluster label>/metrics

Note

Metrics are available for clusters that run with Ganglia monitoring enabled. Qubole does not support retrieving multiple metrics in a single API call.

Required Role

The following users can make this API call:

  • Users who belong to the system-user or system-admin group.

  • Users who belong to a group associated with a role that allows viewing a cluster’s metrics. See Managing Groups and Managing Roles for more information.

Parameters

Parameter

Description

metric

The metric to monitor. Metric values can be retrieved for a particular node or aggregated across the cluster.

interval

The interval for which the metric values are required. Valid values are hour, 2hr, 4hr, day, week, month, and year. The default value is hour.

hostname

The hostname for which the metric values are required. The valid value is the private DNS name of the host. See Per-host Metrics below. If hostname is not specified, the API returns, for certain metrics, the metric value aggregated across the cluster. See Aggregate Cluster Metrics below.

Note

Parameters marked in bold are mandatory. Others are optional and have default values.
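As an illustration of how the parameters above combine, the following minimal Python sketch builds the query string for a metrics request. The helper name build_metrics_query is hypothetical and not part of any Qubole SDK, and the hostname in the usage lines is a made-up placeholder. The sketch assumes that metric is the mandatory parameter and applies the documented default of hour for interval.

```python
from urllib.parse import urlencode

# Interval values listed in the parameter descriptions above.
VALID_INTERVALS = ("hour", "2hr", "4hr", "day", "week", "month", "year")


def build_metrics_query(metric, interval="hour", hostname=None):
    """Return the URL-encoded query string for a metrics request.

    Hypothetical helper for illustration; assumes metric is mandatory.
    """
    if not metric:
        raise ValueError("metric is required")
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unsupported interval: {interval!r}")
    params = {"metric": metric, "interval": interval}
    # Leaving hostname out requests the cluster-wide aggregate,
    # which the API supports only for certain metrics.
    if hostname is not None:
        params["hostname"] = hostname
    return urlencode(params)


# Aggregate metric with the default interval.
print(build_metrics_query("cpu_report"))
# Per-host metric; the hostname below is a placeholder.
print(build_metrics_query("cpu_user", "day", "ip-10-0-0-1.ec2.internal"))
```

Validating interval on the client side is a design choice of this sketch; the API itself remains the authority on which values it accepts.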

Per-host Metrics

Metrics for an individual host can be collected by setting the hostname parameter to the internal DNS name of the instance (in the format ip-A-B-C-D.ec2.internal). Some useful metrics are:

System Metrics

  • cpu_user: Percentage of CPU utilization while executing at the user level

  • cpu_system: Percentage of CPU utilization while executing at the system level

  • cpu_idle: Percentage of time the CPUs were idle

  • disk_free: Total free disk space

  • mem_free: Amount of available memory

  • bytes_in: Number of bytes in per second

  • bytes_out: Number of bytes out per second

Examples

The following examples get cluster metrics for a Hadoop 2 cluster whose cluster ID is 21144:

curl -i -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-G \
-d metric=yarn.NodeManagerMetrics.ContainersRunning \
-d hostname=<hostname> \
-d interval=hour \
https://api.qubole.com/api/v1.3/clusters/21144/metrics

curl -i -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-G \
-d metric=yarn.NodeManagerMetrics.ContainersCompleted \
-d hostname=<hostname> \
-d interval=hour \
https://api.qubole.com/api/v1.3/clusters/21144/metrics

curl -i -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-G \
-d metric=yarn.NodeManagerMetrics.ContainersKilled \
-d hostname=<hostname> \
-d interval=hour \
https://api.qubole.com/api/v1.3/clusters/21144/metrics

In the above examples, replace <hostname> with the private DNS name of the host (in the format ip-A-B-C-D.ec2.internal).

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.
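Because a single call returns a single metric, collecting several metrics for one host means issuing one request per metric. The following Python sketch mirrors the curl examples above using only the standard library. The helper name metric_request, the HOSTNAME value, and the AUTH_TOKEN environment variable are assumptions made for illustration; BASE_URL is the endpoint from the examples and may differ for your account. The sketch builds the requests without sending them.

```python
import os
import urllib.request
from urllib.parse import quote, urlencode

# Endpoint used in the curl examples; other Qubole endpoints may apply.
BASE_URL = "https://api.qubole.com/api/v1.3"
# Placeholder private DNS name; replace with a real host in your cluster.
HOSTNAME = "ip-10-0-0-1.ec2.internal"


def metric_request(cluster, metric, token, hostname=None, interval="hour"):
    """Build (but do not send) a GET request for one metric.

    Hypothetical helper for illustration, not part of a Qubole SDK.
    """
    params = {"metric": metric, "interval": interval}
    if hostname is not None:
        params["hostname"] = hostname
    # The cluster may be an ID or a label, so it is percent-encoded.
    path = f"/clusters/{quote(str(cluster), safe='')}/metrics"
    url = f"{BASE_URL}{path}?{urlencode(params)}"
    headers = {"X-AUTH-TOKEN": token, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


# One request per metric, since each call returns one metric.
METRICS = [
    "yarn.NodeManagerMetrics.ContainersRunning",
    "yarn.NodeManagerMetrics.ContainersCompleted",
    "yarn.NodeManagerMetrics.ContainersKilled",
]
token = os.environ.get("AUTH_TOKEN", "")
requests_to_send = [
    metric_request(21144, m, token, hostname=HOSTNAME) for m in METRICS
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # To send a request: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to inspect the URLs before any call reaches the API.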

Aggregate Cluster Metrics

Some of the system metrics can be aggregated across the cluster to give a broader view of resource usage across all instances in the cluster. Do not specify the hostname parameter for aggregate cluster metrics.

Some of the useful aggregate cluster metrics are:

  • cpu_report: Aggregate report of CPU utilization percentage

  • mem_report: Aggregate report of memory usage in bytes

  • load_report: Aggregate report with current load, number of running processes, nodes, and CPU count

  • network_report: Aggregate report with network traffic in and out of the cluster nodes

Example

curl -i -H "X-AUTH-TOKEN: ${X_AUTH_TOKEN}" -H "Content-Type: application/json" -H "Accept: application/json" \
     -G \
     -d metric=cpu_report \
     -d interval=hour \
     https://api.qubole.com/api/v1.3/clusters/${CLUSTER_ID}/metrics

Response:

[
 {"metric":"User\\g","interval":"hour","datapoints":[[58.689508632,1427752170],[57.445152722,1427752185],[56.650996016,1427752200],[53.899468792,1427752215], ..., [43.448339973,1427755710],[44.044090305,1427755725],[42.478220452,1427755740],["NaN",1427755755]],"hostname":"null"},
 {"metric":"Nice\\g","interval":"hour","datapoints":[[0.010491367862,1427752170],[0.0088977423639,1427752185],[0.0024701195219,1427752200],[0.0030544488712,1427752215], ..., [0,1427755710],[0,1427755725],[0,1427755740],["NaN",1427755755]],"hostname":"null"},
 {"metric":"System\\g","interval":"hour","datapoints":[[6.4996015936,1427752170],[6.3784860558,1427752185],[6.2476494024,1427752200],[5.985126162,1427752215], ..., [5.5504648074,1427755710],[5.5448871182,1427755725],[5.3686586985,1427755740],["NaN",1427755755]],"hostname":"null"},
 {"metric":"Wait\\g","interval":"hour","datapoints":[[0.44156706507,1427752170],[0.45962815405,1427752185],[0.41856573705,1427752200],[0.40849933599,1427752215], ..., [0.88273572377,1427755710],[0.78273572377,1427755725],[0.66613545817,1427755740],["NaN",1427755755]],"hostname":"null"},
 {"metric":"Steal\\g","interval":"hour","datapoints":[[0.096812749004,1427752170],[0.096679946879,1427752185],[0.096414342629,1427752200],[0.096812749004,1427752215], ..., [0.099601593625,1427755710],[0.09973439575,1427755725],[0.1,1427755740],["NaN",1427755755]],"hostname":"null"},
 {"metric":"Idle\\g","interval":"hour","datapoints":[[34.283532537,1427752170],[35.633333333,1427752185],[36.605179283,1427752200],[39.631341301,1427752215], ..., [50.014741036,1427755710],[49.515139442,1427755725],[51.369189907,1427755740],["NaN",1427755755]],"hostname":"null"}
]

Response of Metrics API

The response is a list of dictionaries, one per sub-metric, which together form the CPU report. Each sub-metric has an interval and a list of datapoints. Each datapoint is a two-element array: the first element is the value of the sub-metric and the second is the UNIX time, that is, the number of seconds since January 1, 1970 (UTC). For the hour interval, each datapoint is aggregated over a 15-second slot.
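The response structure described above can be parsed as in the following Python sketch. The SAMPLE text is a shortened copy of the first sub-metric from the response shown earlier, and the helper name parse_datapoints is hypothetical. Skipping entries whose value is the string "NaN" is a choice made for this sketch, not documented API behavior.

```python
import json
import math
from datetime import datetime, timezone

# Shortened sample in the shape of the response shown earlier.
# A raw string keeps the JSON escape sequence \\g intact.
SAMPLE = r'''[
 {"metric":"User\\g","interval":"hour","datapoints":[[58.689508632,1427752170],[57.445152722,1427752185],["NaN",1427755755]],"hostname":"null"}
]'''


def parse_datapoints(series):
    """Return (UTC datetime, float) pairs for one sub-metric.

    Entries whose value is "NaN" are skipped (a choice of this sketch).
    """
    points = []
    for value, unix_time in series["datapoints"]:
        value = float(value)  # float("NaN") yields a float nan
        if math.isnan(value):
            continue
        when = datetime.fromtimestamp(unix_time, tz=timezone.utc)
        points.append((when, value))
    return points


for series in json.loads(SAMPLE):
    points = parse_datapoints(series)
    print(series["metric"], series["interval"], len(points), "usable datapoints")
    for when, value in points:
        print(" ", when.isoformat(), value)
```

Converting the UNIX time with an explicit UTC timezone avoids results that depend on the local timezone of the machine running the script.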