Understanding the YARN and HDFS Metrics for Monitoring (AWS)
Hadoop 2 (Hive) and Spark clusters support the Datadog monitoring service.
You can configure the Datadog monitoring service at the cluster level as described in Advanced configuration: Modifying Cluster Monitoring Settings.
For more information on configuring the Datadog monitoring service at the account level in Control Panel > Account Settings, see Configuring your Access Settings using IAM Keys or Managing Roles.
Qubole also provides a default Datadog dashboard and default alerts for monitoring Hadoop 2 (Hive) clusters. Default Dashboard for YARN and HDFS Metrics describes a sample default dashboard.
To change the alert thresholds or to create alerts on other metrics, configure the alerts in Datadog. For information on how to create alerts and configure email notifications, see the Datadog Alerts description.
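As a sketch of what a custom alert can look like, the Python below builds a Datadog metric monitor on `yarn.ClusterMetrics.NumUnhealthyNMs` and prepares the request for Datadog's v1 Monitors API (`POST /api/v1/monitor`). The helper names, monitor name, threshold, time window, and email address are hypothetical placeholders. The endpoint assumes the US1 Datadog site, and the request is only built, not sent.

```python
import json
import urllib.request

# Datadog v1 Monitors API endpoint on the US1 site (an assumption; other
# Datadog sites use a different host).
DATADOG_MONITOR_URL = "https://api.datadoghq.com/api/v1/monitor"


def build_monitor(metric: str, threshold: float, window: str, email: str) -> dict:
    """Build a metric-alert monitor that fires when the metric's average over
    `window` is above `threshold`. All values passed in are placeholders."""
    # Monitor query syntax: time_aggr(time_window):space_aggr:metric{scope} op value
    query = f"avg(last_{window}):avg:{metric}{{*}} > {threshold}"
    return {
        "name": f"{metric} above {threshold}",
        "type": "metric alert",
        "query": query,
        # "@<address>" in the message routes the notification to that email.
        "message": f"{metric} is above {threshold}. @{email}",
        # The critical threshold must match the value used in the query.
        "options": {"thresholds": {"critical": threshold}},
    }


def build_request(monitor: dict, api_key: str, app_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST that would create the monitor."""
    return urllib.request.Request(
        DATADOG_MONITOR_URL,
        data=json.dumps(monitor).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


# Hypothetical example: alert when any NodeManager is reported unhealthy.
monitor = build_monitor("yarn.ClusterMetrics.NumUnhealthyNMs", 0, "5m", "ops@example.com")
request = build_request(monitor, api_key="<API_KEY>", app_key="<APP_KEY>")
# urllib.request.urlopen(request) would submit it; intentionally not called here.
```

Building the payload separately from the request keeps the alert definition easy to review or store alongside other configuration before anything is sent to Datadog.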
This section describes:
- YARN Metrics
- HDFS Metrics
- Default Dashboard for YARN and HDFS Metrics
YARN Metrics
This table describes the YARN metrics that are sent to Datadog. Log in to your Datadog account to see these metrics.
Metric | Description |
---|---|
yarn.QueueMetrics.AppsCompleted | Number of completed applications. |
yarn.QueueMetrics.AppsPending | Number of pending applications. |
yarn.QueueMetrics.AppsRunning | Number of running applications. |
yarn.QueueMetrics.AppsFailed | Number of failed applications. |
yarn.QueueMetrics.AppsKilled | Number of killed applications. |
yarn.QueueMetrics.ReservedMB | Amount of reserved memory, in mebibytes. |
yarn.QueueMetrics.AvailableMB | Amount of available memory, in mebibytes. |
yarn.QueueMetrics.AllocatedMB | Amount of allocated memory, in mebibytes. |
yarn.QueueMetrics.ReservedVCores | Number of reserved virtual cores. |
yarn.QueueMetrics.AvailableVCores | Number of available virtual cores. |
yarn.QueueMetrics.AllocatedVCores | Number of allocated virtual cores. |
yarn.NodeManagerMetrics.ContainersFailed | Number of failed containers. |
yarn.NodeManagerMetrics.ContainersRunning | Number of running containers. |
yarn.NodeManagerMetrics.ContainersKilled | Number of killed containers. |
yarn.NodeManagerMetrics.ContainersCompleted | Number of completed containers. |
yarn.QueueMetrics.AllocatedContainers | Number of allocated containers. |
yarn.QueueMetrics.ReservedContainers | Number of reserved containers. |
yarn.ClusterMetrics.NumActiveNMs | Number of active NodeManagers. |
yarn.ClusterMetrics.NumDecommissionedNMs | Number of decommissioned NodeManagers. |
yarn.ClusterMetrics.NumDecommissioningNMs | Number of NodeManagers that are being decommissioned. |
yarn.ClusterMetrics.NumLostNMs | Number of lost NodeManagers. |
yarn.ClusterMetrics.NumRebootedNMs | Number of rebooted NodeManagers. |
yarn.ClusterMetrics.NumUnhealthyNMs | Number of unhealthy NodeManagers. |
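The memory and virtual-core metrics in this table can be combined into rough utilization ratios, which are often easier to set thresholds on than raw values. The Python sketch below assumes you already have one sample of each metric (for example, exported from Datadog). The `YarnSnapshot` type and helper names are hypothetical, and treating allocated plus available as the total capacity is an approximation that ignores reserved resources.

```python
from dataclasses import dataclass


@dataclass
class YarnSnapshot:
    """One sample of selected YARN metrics (hypothetical container type)."""
    allocated_mb: int       # yarn.QueueMetrics.AllocatedMB
    available_mb: int       # yarn.QueueMetrics.AvailableMB
    allocated_vcores: int   # yarn.QueueMetrics.AllocatedVCores
    available_vcores: int   # yarn.QueueMetrics.AvailableVCores


def ratio(part: float, whole: float) -> float:
    """Return part / whole, or 0.0 when whole is zero (for example, an empty queue)."""
    return part / whole if whole else 0.0


def utilization(s: YarnSnapshot) -> dict:
    """Approximate utilization, treating allocated + available as total capacity."""
    return {
        "memory": ratio(s.allocated_mb, s.allocated_mb + s.available_mb),
        "vcores": ratio(s.allocated_vcores, s.allocated_vcores + s.available_vcores),
    }


snapshot = YarnSnapshot(allocated_mb=24576, available_mb=8192,
                        allocated_vcores=6, available_vcores=10)
print(utilization(snapshot))  # {'memory': 0.75, 'vcores': 0.375}
```

Guarding against a zero denominator matters because an idle or newly started cluster can report zero for both allocated and available resources.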
HDFS Metrics
This table describes the HDFS metrics that are sent to Datadog.
Metric | Description |
---|---|
dfs.FSNamesystem.CapacityTotal | Total disk capacity, in bytes. |
dfs.FSNamesystem.CapacityUsed | Disk space used, in bytes. |
dfs.FSNamesystem.CapacityRemaining | Remaining disk space, in bytes. |
dfs.FSNamesystem.CapacityUsedGB | Disk space used, in gigabytes. |
dfs.FSNamesystem.CapacityTotalGB | Total disk capacity, in gigabytes. |
dfs.FSNamesystem.TotalLoad | Total load on the file system, measured as the current number of connections to DataNodes. |
dfs.FSNamesystem.BlocksTotal | Total number of blocks. |
dfs.FSNamesystem.FilesTotal | Total number of files. |
dfs.FSNamesystem.MissingBlocks | Number of missing blocks. |
dfs.FSNamesystem.CorruptBlocks | Number of corrupt blocks. |
dfs.FSNamesystem.PendingReplicationBlocks | Number of blocks pending replication. |
dfs.FSNamesystem.UnderReplicatedBlocks | Number of under-replicated blocks. |
dfs.FSNamesystem.ScheduledReplicationBlocks | Number of blocks scheduled for replication. |
dfs.FSNamesystem.PendingDeletionBlocks | Number of blocks pending deletion. |
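The capacity and block metrics in this table lend themselves to simple health checks. The Python sketch below assumes the metric values have already been collected into a dictionary keyed by metric name. It computes the percentage of capacity used and flags nonzero missing, corrupt, or under-replicated block counts. The function names and the 80% threshold are hypothetical. Note that under-replicated blocks can be briefly nonzero during normal re-replication, for example after a DataNode is lost.

```python
# Block metrics where any nonzero value is worth investigating.
BLOCK_HEALTH_METRICS = (
    "dfs.FSNamesystem.MissingBlocks",
    "dfs.FSNamesystem.CorruptBlocks",
    "dfs.FSNamesystem.UnderReplicatedBlocks",
)


def used_percent(metrics: dict) -> float:
    """Share of HDFS capacity in use, from the byte-valued capacity metrics."""
    total = metrics["dfs.FSNamesystem.CapacityTotal"]
    used = metrics["dfs.FSNamesystem.CapacityUsed"]
    return 100.0 * used / total if total else 0.0


def hdfs_findings(metrics: dict, max_used_percent: float = 80.0) -> list:
    """Return readable findings; the 80% default is a hypothetical threshold."""
    findings = [f"{name} = {metrics[name]}"
                for name in BLOCK_HEALTH_METRICS if metrics.get(name, 0) > 0]
    pct = used_percent(metrics)
    if pct > max_used_percent:
        findings.append(f"capacity used {pct:.1f}% > {max_used_percent:.1f}%")
    return findings


TIB = 1024 ** 4  # bytes in one tebibyte
sample = {
    "dfs.FSNamesystem.CapacityTotal": 4 * TIB,
    "dfs.FSNamesystem.CapacityUsed": 3 * TIB,
    "dfs.FSNamesystem.MissingBlocks": 0,
    "dfs.FSNamesystem.CorruptBlocks": 2,
    "dfs.FSNamesystem.UnderReplicatedBlocks": 0,
}
print(used_percent(sample))   # 75.0
print(hdfs_findings(sample))  # ['dfs.FSNamesystem.CorruptBlocks = 2']
```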
Default Dashboard for YARN and HDFS Metrics
QDS provides a default dashboard with these metrics:
- Apps
- Containers
- DFS Used Capacity
Here is a sample default dashboard that contains YARN/HDFS metrics.
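One way to build a similar dashboard in your own Datadog account is to define it as JSON for Datadog's v1 Dashboards API (`POST /api/v1/dashboard`). The Python sketch below builds one timeseries widget for each of the three groups listed above. Which metrics belong to each group is an assumption (the QDS default dashboard may chart different series), and the dashboard title and helper names are hypothetical.

```python
import json

# Hypothetical mapping of the default dashboard's groups to metrics from the
# tables on this page; the actual QDS dashboard may use other series.
DASHBOARD_GROUPS = {
    "Apps": [
        "yarn.QueueMetrics.AppsRunning",
        "yarn.QueueMetrics.AppsPending",
        "yarn.QueueMetrics.AppsFailed",
    ],
    "Containers": [
        "yarn.QueueMetrics.AllocatedContainers",
        "yarn.QueueMetrics.ReservedContainers",
    ],
    "DFS Used Capacity": ["dfs.FSNamesystem.CapacityUsedGB"],
}


def timeseries_widget(title: str, metrics: list) -> dict:
    """One timeseries widget with a separate line per metric."""
    return {
        "definition": {
            "type": "timeseries",
            "title": title,
            "requests": [{"q": f"avg:{m}{{*}}", "display_type": "line"} for m in metrics],
        }
    }


def build_dashboard(title: str, groups: dict) -> dict:
    """Assemble an ordered-layout dashboard with one widget per group."""
    return {
        "title": title,
        "layout_type": "ordered",
        "widgets": [timeseries_widget(name, metrics) for name, metrics in groups.items()],
    }


dashboard = build_dashboard("YARN and HDFS metrics (sketch)", DASHBOARD_GROUPS)
print(json.dumps(dashboard, indent=2))
```

The printed JSON is only a starting point: you would still need to send it with your Datadog API and application keys, and adjust the metric queries (for example, to scope them to a specific cluster tag).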