View Cluster Health APIs
- GET /api/v1.3/clusters/(string:id or label)/live_cluster_health
Use this API to view the latest health of a running cluster in a Qubole environment. It is supported from QDS version R57 onwards, in Cluster API v1.3, v2.0, and v2.1.
Required Role
To invoke this API, a user must belong to a group that has permission to read a cluster.
Request API Syntax
curl -i -X GET -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
"https://api.qubole.com/api/v2/clusters/<cluster_id>/live_cluster_health"
In the URL, use the Qubole environment where you have the QDS account. For example, https://api.qubole.com is a Qubole environment.
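The same request can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_health_request` and its parameters are illustrative assumptions, not part of the Qubole API. It only builds the request object, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request


def build_health_request(environment, cluster, token, api_version="v2"):
    """Build (but do not send) a GET request for live_cluster_health.

    environment: base URL of the Qubole environment, e.g. https://api.qubole.com
    cluster:     cluster ID or label
    token:       value for the X-AUTH-TOKEN header
    """
    # Percent-encode the ID or label so it is safe inside a URL path.
    cluster_part = urllib.parse.quote(str(cluster), safe="")
    url = (
        f"{environment.rstrip('/')}/api/{api_version}"
        f"/clusters/{cluster_part}/live_cluster_health"
    )
    headers = {
        "X-AUTH-TOKEN": token,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Sending the request performs a real network call, so it is commented out:
# import json
# req = build_health_request("https://api.qubole.com", 5, "<auth token>")
# with urllib.request.urlopen(req) as resp:
#     health = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before making a call.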
Note
The syntax above uses Cluster API v2, and the sample responses below are for Cluster API version 2.0.
Sample Response
Hive (as an additional cluster with HiveServer2 disabled and HiveServer2 enabled):
{
  "cluster_id": 5,
  "cluster_inst_id": 2,
  "metrics": {
    "captured_at": "2019-10-03T08:22:02Z",
    "engine": {
      "yarn": {
        "memory": "0",
        "containers": {
          "pending": "0",
          "failed": "0",
          "killed": "0"
        }
      }
    },
    "daemons": {
      "hive_metastore": {
        "responsiveness_status": "UP",
        "liveliness_status": "UP",
        "heap": {
          "usage_percent": "3.96",
          "status": "green"
        }
      },
      "resourcemanager": "UP",
      "namenode": "UP"
    },
    "system": {
      "master": {
        "cpu_usage": "8.61",
        "disk_usage": "70.7",
        "spotloss_count": 0
      }
    }
  }
}
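In the sample response above, `cpu_usage` and `disk_usage` arrive as JSON strings while `spotloss_count` is a number. A client that wants to compare them against thresholds may need to convert them first. The helper below is a minimal sketch; the name `master_system_metrics` is hypothetical, and it assumes the `metrics.system.master` layout shown in the sample.

```python
def master_system_metrics(health):
    """Return the coordinator ("master") system metrics as numbers.

    `health` is the parsed JSON body of a live_cluster_health response.
    """
    master = health["metrics"]["system"]["master"]
    return {
        # Usage values are strings in the sample, so parse them as floats.
        "cpu_usage": float(master["cpu_usage"]),
        "disk_usage": float(master["disk_usage"]),
        # int() is a no-op for a number and also accepts a numeric string.
        "spotloss_count": int(master["spotloss_count"]),
    }
```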
Hive (With HiveServer2 enabled on coordinator):
{
  "cluster_id": 1,
  "cluster_inst_id": 1,
  "metrics": {
    "captured_at": "2019-10-03T08:22:07Z",
    "engine": {
      "yarn": {
        "memory": "0",
        "containers": {
          "pending": "0",
          "failed": "0",
          "killed": "0"
        }
      }
    },
    "daemons": {
      "hive_metastore": {
        "responsiveness_status": "UP",
        "liveliness_status": "UP",
        "heap": {
          "usage_percent": "3.83",
          "status": "green"
        }
      },
      "hs2_server": {
        "responsiveness_status": "UP",
        "liveliness_status": "UP",
        "heap": {
          "usage_percent": "1.88",
          "status": "green"
        }
      },
      "resourcemanager": "UP",
      "namenode": "UP"
    },
    "system": {
      "master": {
        "cpu_usage": "10.55",
        "disk_usage": "70.7",
        "spotloss_count": 0
      }
    }
  }
}
Presto:
{
  "cluster_id": 213,
  "cluster_inst_id": 738,
  "metrics": {
    "captured_at": "2019-10-03T08:05:09Z",
    "engine": {
      "presto": {
        "status": "UP",
        "heap": {
          "usage_percent": "0.48",
          "status": "green"
        }
      }
    },
    "daemons": {
      "hive_metastore": {
        "responsiveness_status": "UP",
        "liveliness_status": "UP",
        "heap": {
          "usage_percent": "15.5",
          "status": "green"
        }
      },
      "zeppelin": {
        "status": "UP",
        "heap": {
          "usage_percent": "2.08",
          "status": "green"
        }
      }
    },
    "system": {
      "master": {
        "cpu_usage": "20.41",
        "disk_usage": "71.5",
        "spotloss_count": 0
      }
    }
  }
}
Spark:
{
  "cluster_id": 473,
  "cluster_inst_id": 737,
  "metrics": {
    "captured_at": "2019-10-03T08:00:15Z",
    "engine": {
      "yarn": {
        "memory": "0",
        "containers": {
          "pending": "0",
          "failed": "0",
          "killed": "0"
        }
      },
      "spark": {}
    },
    "daemons": {
      "hive_metastore": {
        "responsiveness_status": "UP",
        "liveliness_status": "UP",
        "heap": {
          "usage_percent": "10.07",
          "status": "green"
        }
      },
      "zeppelin": {
        "status": "UP",
        "heap": {
          "usage_percent": "3.68",
          "status": "green"
        }
      },
      "resourcemanager": "UP",
      "namenode": "UP"
    },
    "system": {
      "master": {
        "cpu_usage": "0",
        "disk_usage": "73.0",
        "spotloss_count": 0
      }
    }
  }
}
Airflow:
{
  "cluster_id": 473,
  "cluster_inst_id": 737,
  "metrics": {
    "captured_at": "2019-10-03T08:00:15Z",
    "engine": {
      "yarn": {
        "memory": "0",
        "containers": {
          "pending": "0",
          "failed": "0",
          "killed": "0"
        }
      },
      "spark": {}
    },
    "daemons": {
      "hive_metastore": {
        "responsiveness_status": "UP",
        "liveliness_status": "UP",
        "heap": {
          "usage_percent": "10.07",
          "status": "green"
        }
      },
      "zeppelin": {
        "status": "UP",
        "heap": {
          "usage_percent": "3.68",
          "status": "green"
        }
      },
      "resourcemanager": "UP",
      "namenode": "UP"
    },
    "system": {
      "master": {
        "cpu_usage": "0",
        "disk_usage": "73.0",
        "spotloss_count": 0
      }
    }
  }
}
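Across the sample responses, entries under `daemons` take two shapes: a plain string such as `"UP"` (for example `resourcemanager`), or an object carrying `status`, or `responsiveness_status` and `liveliness_status`, next to a `heap` block. The sketch below normalizes both shapes into a single boolean per daemon. The function name `daemon_states` is hypothetical, and treating every value other than `"UP"` as unhealthy is an assumption, since the samples only show `"UP"`.

```python
def daemon_states(health):
    """Map each daemon name to True when every status it reports is "UP"."""
    states = {}
    for name, value in health["metrics"]["daemons"].items():
        if isinstance(value, str):
            # Simple form, e.g. "namenode": "UP".
            states[name] = value == "UP"
        else:
            # Object form: collect the top-level status fields only; the
            # nested "heap" block has its own "status" (e.g. "green").
            statuses = [
                v for k, v in value.items()
                if k == "status" or k.endswith("_status")
            ]
            states[name] = bool(statuses) and all(s == "UP" for s in statuses)
    return states
```

A daemon object that reports no status field at all is treated as not up, which errs on the side of flagging a problem.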
Note
YARN-based metrics are only available when Ganglia is enabled on the cluster.
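Because YARN-based metrics depend on Ganglia, client code may want to read them defensively. The sketch below assumes that unavailable YARN metrics show up as a missing or empty `engine.yarn` object; the source does not state how the response looks in that case, so this is a guess. The function name `yarn_metrics` is hypothetical.

```python
def yarn_metrics(health):
    """Return YARN metrics as numbers, or None when they are not reported."""
    yarn = health.get("metrics", {}).get("engine", {}).get("yarn")
    if not yarn:
        # Assumed shape when Ganglia is disabled, or on clusters without YARN.
        return None
    containers = yarn.get("containers", {})
    return {
        # Values are strings in the sample responses, so convert them.
        "memory": float(yarn.get("memory", 0)),
        "pending": int(containers.get("pending", 0)),
        "failed": int(containers.get("failed", 0)),
        "killed": int(containers.get("killed", 0)),
    }
```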
Cluster Health Services and Metrics Information:
Metrics/Service | Available On Cluster Type |
---|---|
Binary Metrics (Services) | |
Hive Metastore | All |
Name Node | Hive, Spark |
Resource Manager | Hive, Spark |
HS2 | Hive (HS2 enabled on coordinator) |
Zeppelin | Spark, Presto |
Presto | Presto |
Bar Metrics (Float) | |
CPU Usage | All |
Coordinator Disk Usage | All |
Spot nodes lost count (Integer) | All |
Heap Information (All heap metrics are calculated from jstat command) | |
Hive Metastore Heap | All |
HS2 Heap | Hive (HS2 enabled on coordinator) |
Presto Heap | Presto |
Zeppelin Heap | Presto, Spark |