hdfs_full_on_cluster

Alert Name: hdfs_full_on_cluster

Alert Condition: The condition that triggers the alert is avg(last_5m):avg:dfs.FSNamesystem.CapacityUsedGB{*} by {host} / avg:dfs.FSNamesystem.CapacityTotalGB{*} by {host} > 0.9.

Alert Explanation: The alert indicates that HDFS on the cluster is almost full. It fires when the ratio of used HDFS capacity (CapacityUsedGB) to total HDFS capacity (CapacityTotalGB), averaged over the last 5 minutes, is greater than 0.9 for a host, meaning that more than 90% of the capacity is in use.
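To make the threshold concrete, the same ratio can be computed by hand. The sketch below is only an illustration: the function names usage_ratio and over_threshold are hypothetical helpers, not part of the monitoring system, and the capacity figures in the usage note are made-up values.

```shell
# Hypothetical helper: ratio of used to total HDFS capacity (both in GB),
# printed with two decimal places.
usage_ratio() {
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.2f\n", used / total }'
}

# Hypothetical helper: exits with status 0 when the ratio is strictly
# greater than the alert threshold of 0.9, and with status 1 otherwise.
over_threshold() {
  awk -v used="$1" -v total="$2" 'BEGIN { exit !(used / total > 0.9) }'
}
```

For example, with 460 GB used out of 500 GB total, usage_ratio 460 500 prints 0.92 and over_threshold 460 500 succeeds, so the alert condition would be met. With 400 GB used, the ratio is 0.80 and the condition is not met.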

Resolution:

Step 1

Log into the cluster to check disk space and trace the files and folders that consume the most space. Use the following commands to check the available free space in HDFS:

  1. hdfs dfs -df -h
  2. hdfs dfsadmin -report
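If the check needs to be scripted, the used percentage can be read from the output of the first command. This is a minimal sketch, assuming the usual column layout of hdfs dfs -df (a header line followed by one line ending in the Use% column) and assuming the command is run without -h so that the columns contain no unit suffixes. The function name df_used_percent is hypothetical.

```shell
# Hypothetical helper: reads `hdfs dfs -df` output on stdin and prints the
# used percentage as a bare integer (for example 92).
# Assumes line 1 is the header and line 2 ends with the Use% column.
df_used_percent() {
  awk 'NR == 2 { sub(/%$/, "", $NF); print $NF }'
}

# Example wiring (requires the hdfs client on PATH):
#   pct=$(hdfs dfs -df / | df_used_percent)
#   [ "$pct" -gt 90 ] && echo "HDFS usage is ${pct}%, above the 90% threshold"
```

The parsing is kept separate from the hdfs call so that it can be tried on saved command output before it is run against a live cluster.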

Step 2

View the size of the files and directories under a given path with the following command, where <URI> is the path to inspect (for example, / for the HDFS root):

  • hdfs dfs -du -h <URI>
  • hdfs dfs -du -h /
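On a large directory tree the listing can be long, so it may help to rank the entries by size. The sketch below assumes that hdfs dfs -du is run without -h, that the first column is the size in bytes and the last column is the path, and that paths contain no spaces. The function name top_consumers is hypothetical.

```shell
# Hypothetical helper: reads `hdfs dfs -du <path>` output on stdin and prints
# the N largest entries (default 10) as "<size in GiB><TAB><path>".
# Assumes column 1 is the size in bytes and the last column is the path.
top_consumers() {
  n="${1:-10}"
  sort -k1,1nr | head -n "$n" |
    awk '{ printf "%.1f GiB\t%s\n", $1 / (1024 * 1024 * 1024), $NF }'
}
```

A possible invocation is hdfs dfs -du / | top_consumers 5, repeated on the largest directory it reports until the files responsible are found.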

Step 3

After identifying the files that consume the most space in HDFS, compress and archive them, or remove them, as required.
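One way to carry out Step 3 is to copy the data out of HDFS, compress it locally, and then delete the HDFS copy. The sketch below only prints the commands so that they can be reviewed before anything is deleted. The function name plan_cleanup and the paths in the usage note are hypothetical, and the sketch assumes the local backup directory has enough free space. Note that hdfs dfs -rm without -skipTrash moves files to the trash when trash is enabled, so the space is not released until the trash is emptied (for example with hdfs dfs -expunge); -skipTrash frees the space immediately but makes the deletion irreversible.

```shell
# Hypothetical helper: prints (does not run) the commands that archive one
# HDFS path to a local backup directory and then remove it from HDFS.
#   $1 = HDFS path identified in Step 2
#   $2 = local directory with enough free space (assumption)
plan_cleanup() {
  src="$1"
  backup="$2"
  name=$(basename "$src")
  echo "hdfs dfs -get $src $backup/$name"
  echo "tar -czf $backup/$name.tar.gz -C $backup $name"
  echo "hdfs dfs -rm -r -skipTrash $src"
}
```

For example, plan_cleanup /data/old_logs /mnt/backup prints three commands: the copy out of HDFS, the tar compression, and the removal. Run them one at a time, and confirm that the archive is complete before running the removal.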