Understanding the YARN and HDFS Metrics for Monitoring (AWS)
Hadoop 2 (Hive) and Spark clusters support the Datadog monitoring service.
You can configure the Datadog monitoring service at the cluster level as described in Advanced configuration: Modifying Cluster Monitoring Settings.
For more information on configuring the Datadog monitoring service at the account level in Control Panel > Account Settings, see Configuring your Access Settings using IAM Keys or Managing Roles.
Qubole also provides a default Datadog dashboard and default alerts for monitoring Hadoop 2 (Hive) clusters. Default Dashboard for YARN and HDFS Metrics describes a sample default dashboard.
You can customize the threshold values of the default alerts and create alerts on other metrics. For information on how to create alerts and configure email notifications, see the Datadog Alerts description.
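As an illustration of a custom alert, the following minimal sketch builds a query string in the general form used by Datadog metric monitors (time aggregation, window, space aggregation, metric, scope, comparison). The helper name `build_threshold_query`, the default window, and the `*` scope are assumptions made for this example, and it assumes the metric appears in Datadog under the same name as listed in this section. Verify the exact name and tags in your Datadog account before using the query.

```python
def build_threshold_query(metric, threshold, comparator=">", window="last_5m",
                          time_aggr="avg", space_aggr="avg", scope="*"):
    """Build a metric-monitor query string for a simple threshold alert.

    Hypothetical helper for illustration; it only formats a string and
    does not call any Datadog API.
    """
    if comparator not in {">", ">=", "<", "<="}:
        raise ValueError(f"unsupported comparator: {comparator}")
    # Doubled braces emit literal { and } around the scope.
    return f"{time_aggr}({window}):{space_aggr}:{metric}{{{scope}}} {comparator} {threshold}"


# Alert when any missing HDFS blocks are reported.
query = build_threshold_query("dfs.FSNamesystem.MissingBlocks", 0)
print(query)  # avg(last_5m):avg:dfs.FSNamesystem.MissingBlocks{*} > 0
```

The resulting string can be pasted into the query field when you create a metric monitor, with the threshold and scope adjusted to your cluster.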
This section describes the YARN metrics, the HDFS metrics, and the default dashboard for these metrics.
YARN Metrics
The following table describes the YARN metrics that are sent to Datadog. Log in to your Datadog account to see these metrics.
Metric | Description
yarn.QueueMetrics.AppsCompleted | It denotes the number of completed applications.
yarn.QueueMetrics.AppsPending | It denotes the number of pending applications.
yarn.QueueMetrics.AppsRunning | It denotes the number of running applications.
yarn.QueueMetrics.AppsFailed | It denotes the number of failed applications.
yarn.QueueMetrics.AppsKilled | It denotes the number of killed applications.
yarn.QueueMetrics.ReservedMB | It denotes the size of the reserved memory in Mebibytes.
yarn.QueueMetrics.AvailableMB | It denotes the size of the available memory in Mebibytes.
yarn.QueueMetrics.AllocatedMB | It denotes the size of the allocated memory in Mebibytes.
yarn.QueueMetrics.ReservedVCores | It denotes the number of reserved virtual cores.
yarn.QueueMetrics.AvailableVCores | It denotes the number of available virtual cores.
yarn.QueueMetrics.AllocatedVCores | It denotes the number of allocated virtual cores.
yarn.NodeManagerMetrics.ContainersFailed | It denotes the number of containers that have failed.
yarn.NodeManagerMetrics.ContainersRunning | It denotes the number of running containers.
yarn.NodeManagerMetrics.ContainersKilled | It denotes the number of containers that have been killed.
yarn.NodeManagerMetrics.ContainersCompleted | It denotes the number of containers that have completed.
yarn.QueueMetrics.AllocatedContainers | It denotes the number of allocated containers.
yarn.QueueMetrics.ReservedContainers | It denotes the number of reserved containers.
yarn.ClusterMetrics.NumActiveNMs | It denotes the number of active NodeManagers.
yarn.ClusterMetrics.NumDecommissionedNM | It denotes the number of decommissioned NodeManagers.
yarn.ClusterMetrics.NumDecommissioningNMs | It denotes the number of decommissioning NodeManagers.
yarn.ClusterMetrics.NumLostNMs | It denotes the number of NodeManagers that are lost.
yarn.ClusterMetrics.NumRebootedNMs | It denotes the number of rebooted NodeManagers.
yarn.ClusterMetrics.NumUnhealthyNMs | It denotes the number of unhealthy NodeManagers.
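The allocated and available values in the table can be combined into a utilization figure, which is often easier to alert on than raw counts. The sketch below assumes that allocated plus available approximates the total capacity of the queue (it ignores reserved resources). The `utilization` helper and the sample values are hypothetical, not output from a real cluster.

```python
def utilization(allocated, available):
    """Return the allocated share of (allocated + available), from 0.0 to 1.0.

    Assumption: allocated + available approximates total capacity;
    reserved resources are not counted.
    """
    total = allocated + available
    return allocated / total if total > 0 else 0.0


# Hypothetical sample values keyed by the metric names from the table.
sample = {
    "yarn.QueueMetrics.AllocatedMB": 6144,
    "yarn.QueueMetrics.AvailableMB": 2048,
    "yarn.QueueMetrics.AllocatedVCores": 3,
    "yarn.QueueMetrics.AvailableVCores": 9,
}

memory_util = utilization(sample["yarn.QueueMetrics.AllocatedMB"],
                          sample["yarn.QueueMetrics.AvailableMB"])
vcore_util = utilization(sample["yarn.QueueMetrics.AllocatedVCores"],
                         sample["yarn.QueueMetrics.AvailableVCores"])

print(f"memory: {memory_util:.0%}, vcores: {vcore_util:.0%}")  # memory: 75%, vcores: 25%
```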
HDFS Metrics
Metric | Description
dfs.FSNamesystem.CapacityTotal | It denotes the total disk capacity in bytes.
dfs.FSNamesystem.CapacityUsed | It denotes the disk usage in bytes.
dfs.FSNamesystem.CapacityRemaining | It denotes the remaining disk space in bytes.
dfs.FSNamesystem.CapacityUsedGB | It denotes the disk usage in Gigabytes.
dfs.FSNamesystem.CapacityTotalGB | It denotes the total disk capacity in Gigabytes.
dfs.FSNamesystem.TotalLoad | It denotes the total load on the file system.
dfs.FSNamesystem.BlocksTotal | It denotes the total number of blocks.
dfs.FSNamesystem.FilesTotal | It denotes the total number of files.
dfs.FSNamesystem.MissingBlocks | It denotes the number of missing blocks.
dfs.FSNamesystem.CorruptBlocks | It denotes the number of corrupt blocks.
dfs.FSNamesystem.PendingReplicationBlocks | It denotes the number of blocks pending replication.
dfs.FSNamesystem.UnderReplicatedBlocks | It denotes the number of under-replicated blocks.
dfs.FSNamesystem.ScheduledReplicationBlocks | It denotes the number of blocks scheduled for replication.
dfs.FSNamesystem.PendingDeletionBlocks | It denotes the number of blocks pending deletion.
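As a rough example of how these HDFS metrics might be interpreted together, the sketch below flags high disk usage and any missing or corrupt blocks. The function name `hdfs_warnings`, the 80% usage threshold, and the sample values are assumptions chosen for illustration; suitable thresholds depend on your cluster.

```python
def hdfs_warnings(metrics, max_used_fraction=0.8):
    """Return a list of warning strings for a snapshot of HDFS metrics.

    `metrics` maps metric names from the table to numeric values.
    The 0.8 default threshold is an illustrative assumption.
    """
    warnings = []
    total = metrics.get("dfs.FSNamesystem.CapacityTotal", 0)
    used = metrics.get("dfs.FSNamesystem.CapacityUsed", 0)
    if total > 0 and used / total > max_used_fraction:
        warnings.append(f"DFS used capacity is {used / total:.0%} of total")
    # Any missing or corrupt block is treated as worth a warning.
    for name in ("MissingBlocks", "CorruptBlocks"):
        count = metrics.get(f"dfs.FSNamesystem.{name}", 0)
        if count > 0:
            warnings.append(f"{name} = {count}")
    return warnings


# Hypothetical snapshot: 90% of capacity used and two missing blocks.
snapshot = {
    "dfs.FSNamesystem.CapacityTotal": 1000,
    "dfs.FSNamesystem.CapacityUsed": 900,
    "dfs.FSNamesystem.MissingBlocks": 2,
    "dfs.FSNamesystem.CorruptBlocks": 0,
}
print(hdfs_warnings(snapshot))
# ['DFS used capacity is 90% of total', 'MissingBlocks = 2']
```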
Default Dashboard for YARN and HDFS Metrics
QDS provides a default dashboard with these metrics:
DFS Used Capacity
Here is a sample default dashboard that contains YARN/HDFS metrics.