Health Checks for Clusters
This section explains the various health checks configured for the clusters.
Cluster HDFS Disk Utilization
This alert checks the free space allotted to HDFS and sends an alert if the free space is lower than a configurable limit.
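As a rough illustration of this kind of threshold check, the sketch below reads the `DFS Remaining%` line from the text printed by `hdfs dfsadmin -report` and compares it with a limit. How QDS itself measures free space is not described here, so the function names, the sample report, and the 20% limit are all assumptions made for the example.

```python
import re

# Hypothetical excerpt of `hdfs dfsadmin -report` output; the values are made up.
SAMPLE_REPORT = """\
Configured Capacity: 1000000000 (953.67 MB)
DFS Remaining: 150000000 (143.05 MB)
DFS Used%: 85.00%
DFS Remaining%: 15.00%
"""


def hdfs_remaining_pct(report_text):
    """Return the first 'DFS Remaining%' value in the report as a float.

    In a full report the first occurrence is the cluster-wide summary;
    later occurrences belong to individual DataNodes.
    """
    match = re.search(r"^DFS Remaining%:\s*([\d.]+)%", report_text, re.MULTILINE)
    if match is None:
        raise ValueError("report has no 'DFS Remaining%' line")
    return float(match.group(1))


def should_alert(report_text, min_free_pct):
    """True when free HDFS space is below the configured limit."""
    return hdfs_remaining_pct(report_text) < min_free_pct


# Assumed limit of 20% free space, chosen only for illustration.
print(should_alert(SAMPLE_REPORT, min_free_pct=20.0))
```

Comparing against a percentage rather than a byte count keeps the same limit meaningful as the cluster grows or shrinks.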
Node Disk Utilization
This alert checks the free space allotted to HDFS on each node of the cluster and sends an alert if the free space is lower than a configurable limit.
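The per-node variant can be pictured as the same comparison applied to every node. The following is a minimal sketch, assuming the free-space percentage of each node has already been collected into a dictionary; the node names, the values, and the `nodes_below_limit` helper are hypothetical.

```python
def nodes_below_limit(free_pct_by_node, min_free_pct):
    """Return the names of nodes whose free space is under the limit, sorted."""
    return sorted(
        node
        for node, free_pct in free_pct_by_node.items()
        if free_pct < min_free_pct
    )


# Hypothetical per-node free-space percentages.
free_pct_by_node = {"node-1": 42.0, "node-2": 8.5, "node-3": 19.9}
print(nodes_below_limit(free_pct_by_node, 20.0))  # ['node-2', 'node-3']
```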
Simple Hadoop Job Probe
This alert runs a simple end-to-end Hadoop job on the cluster to check the cluster's overall health.
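One way to picture a job probe is a command that must finish successfully within a time limit. The sketch below is an assumption about the general technique, not a description of the QDS probe: `probe_succeeds` is a hypothetical helper, and the Python one-liners stand in for a real Hadoop job submission so that the example runs anywhere.

```python
import subprocess
import sys


def probe_succeeds(command, timeout_s):
    """Run a probe command; return True only if it exits with status 0 in time."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung job or a missing executable both count as a failed probe.
        return False
    return result.returncode == 0


# Stand-in commands; a real probe would submit a small Hadoop job instead.
print(probe_succeeds([sys.executable, "-c", "pass"], timeout_s=30))
print(probe_succeeds([sys.executable, "-c", "raise SystemExit(1)"], timeout_s=30))
```

Treating a timeout as a failure matters here, because a cluster that accepts a job but never completes it is unhealthy even though no error is returned.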
Describing Cluster Health Data from the UI
Cluster Health data is available on AWS only, and only while the cluster is up. To have the feature enabled, create a ticket with Qubole Support.
Cluster Health data appears after Qubole Support enables the feature. Until then, the QDS UI reports that cluster health is not available and prompts you to try again later.
When the Cluster Health data appears on the QDS UI, it displays the status of the services and metrics. The services are displayed under the Service Status section as binary values (red and green). Green indicates that the service is running properly, whereas red indicates that it is not running in an optimal state. Under the Metrics section, the status of each metric is displayed as a percentage (%). The percentage bar for a CPU or Disk Usage metric turns red when that metric reaches 90% or more.
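The display rules above can be summarized in a few lines. This is only a sketch of the stated rules, with hypothetical function names; the color a bar shows below the 90% threshold is not stated here, so the second helper returns just a true/false answer.

```python
RED_THRESHOLD_PCT = 90.0  # from the text: bars turn red at 90% or more


def service_color(is_running_properly):
    """Binary service status: green when running properly, red otherwise."""
    return "green" if is_running_properly else "red"


def bar_is_red(usage_pct):
    """True when a CPU or Disk Usage bar should be shown in red."""
    return usage_pct >= RED_THRESHOLD_PCT


print(service_color(True), service_color(False))
print(bar_is_red(89.9), bar_is_red(90.0))
```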
Metrics and Services Available on Clusters
YARN-based metrics are available only when Ganglia is enabled on the cluster.
|Metrics/Service||Available On Cluster Type|
|Binary Metrics (Services)|
|Name Node||Hive, Spark|
|Resource Manager||Hive, Spark|
|HS2||Hive (HS2 enabled on master)|
|Bar Metrics (Float)|
|CPU Usage||All|
|Master Disk Usage||All|
|Spot nodes lost count (Integer)||All|
|Heap Information (All heap metrics are calculated from jstat command)|
|Hive Metastore Heap||All|
|HS2 Heap||Hive (HS2 enabled on master)|
|Zeppelin Heap||Presto, Spark|
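The table notes that heap metrics are calculated from the `jstat` command but does not say which option is used or how the figure is derived. As one plausible illustration, the sketch below assumes `jstat -gc <pid>` output from a JVM that reports the survivor, eden, and old generation columns in KB, and computes heap usage as used space over capacity. The sample output and the `heap_usage_pct` helper are hypothetical.

```python
# Hypothetical `jstat -gc <pid>` output (sizes in KB); the values are made up.
SAMPLE_JSTAT_GC = """\
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU
1024.0 1024.0  0.0   512.0  8192.0   4096.0   20480.0    10240.0  4864.0 2560.0
"""


def heap_usage_pct(jstat_gc_text):
    """Return heap usage as a percentage of current heap capacity.

    Heap is taken as survivor spaces (S0, S1) + eden (E) + old generation (O).
    Metaspace (MC/MU) is not part of the Java heap and is ignored.
    """
    lines = [line.split() for line in jstat_gc_text.strip().splitlines()]
    header, values = lines[0], lines[1]
    # Look columns up by name so that extra columns do not break the parsing.
    row = dict(zip(header, (float(v) for v in values)))
    used = row["S0U"] + row["S1U"] + row["EU"] + row["OU"]
    capacity = row["S0C"] + row["S1C"] + row["EC"] + row["OC"]
    return 100.0 * used / capacity


print(round(heap_usage_pct(SAMPLE_JSTAT_GC), 2))
```

Some garbage collectors do not report all of these columns, in which case a parser like this would need to handle missing or non-numeric values.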