hdfs_full_on_cluster
Alert Name: hdfs_full_on_cluster
Alert Condition: The alert triggers when the following query evaluates to true: avg(last_5m):avg:dfs.FSNamesystem.CapacityUsedGB{*} by {host} / avg:dfs.FSNamesystem.CapacityTotalGB{*} by {host} > 0.9
Alert Explanation: The alert indicates that HDFS on the cluster is close to running out of free capacity. It triggers when the ratio of used HDFS capacity to total HDFS capacity, averaged over the last 5 minutes and evaluated per host, is greater than 0.9 (90%).
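As a worked illustration of the condition, the sketch below computes the used/total ratio and compares it with the 0.9 threshold. The function names usage_ratio and exceeds_threshold and the sample figures are hypothetical and are not part of the monitor itself.

```shell
# usage_ratio USED_GB TOTAL_GB
# Prints used/total to three decimals (hypothetical helper, for illustration only).
usage_ratio() {
    awk -v used="$1" -v total="$2" \
        'BEGIN { if (total <= 0) exit 1; printf "%.3f\n", used / total }'
}

# exceeds_threshold USED_GB TOTAL_GB [THRESHOLD]
# Returns success (0) when used/total is strictly greater than THRESHOLD (default 0.9).
exceeds_threshold() {
    awk -v used="$1" -v total="$2" -v t="${3:-0.9}" \
        'BEGIN { exit !(total > 0 && used / total > t) }'
}

# Example with made-up numbers: 920 GB used out of 1000 GB total.
# usage_ratio 920 1000        # prints 0.920
# exceeds_threshold 920 1000 && echo "alert would trigger"
```

With 850 GB used out of 1000 GB, the ratio is 0.850 and the condition would not be met.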
Resolution:
Step 1
Log in to the cluster to check disk space and identify the files and folders that consume the most space. Use the following commands to check the available free space in HDFS:
hdfs dfs -df -h
hdfs dfsadmin -report
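To read the cluster-wide usage figure out of the report quickly, a small filter such as the one below may help. It is a minimal sketch that assumes the report contains "DFS Used%: <value>%" lines and that the first such line belongs to the cluster summary rather than to an individual DataNode; the function name cluster_used_pct is hypothetical.

```shell
# cluster_used_pct
# Reads `hdfs dfsadmin -report` output on stdin and prints the first
# "DFS Used%" value without the trailing percent sign.
# Assumption: the first "DFS Used%" line is the cluster-wide summary.
cluster_used_pct() {
    awk -F': *' '/^DFS Used%/ { sub(/%$/, "", $2); print $2; exit }'
}

# Usage (requires access to the cluster):
# hdfs dfsadmin -report | cluster_used_pct
```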
Step 2
View the size of the files and directories in a specific directory with the following command:
hdfs dfs -du -h <URI>
hdfs dfs -du -h /
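To rank entries by size, one option is to drop the -h flag so that the first column is a plain byte count, then sort numerically. The following is a sketch under that assumption; largest_entries is a hypothetical helper name.

```shell
# largest_entries [N]
# Reads `hdfs dfs -du <path>` output (without -h) on stdin and prints the
# N largest entries (default 10), ordered by the size in the first column.
largest_entries() {
    sort -k1,1 -rn | head -n "${1:-10}"
}

# Usage (requires access to the cluster):
# hdfs dfs -du / | largest_entries 5
```

Once the largest top-level directory is known, the same command can be repeated on that directory to drill down to the files responsible.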
Step 3
After identifying the files that consume the most space in HDFS, compress and archive them, or remove them, as required.
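One possible way to compress a single large file is sketched below: the file is copied to local disk, gzipped, uploaded next to the original, and the original is removed only if every earlier step succeeded. The function name compress_hdfs_file is hypothetical, the approach assumes enough local disk space for a temporary copy, and it should be tried on non-critical data first.

```shell
# compress_hdfs_file HDFS_PATH
# Replaces HDFS_PATH with a gzip-compressed copy at HDFS_PATH.gz.
# Runs in a subshell so that variables and the cleanup trap stay local.
compress_hdfs_file() (
    src="$1"
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT          # always remove the local scratch copy

    hdfs dfs -get "$src" "$tmp/data" \
        && gzip "$tmp/data" \
        && hdfs dfs -put "$tmp/data.gz" "${src}.gz" \
        && hdfs dfs -rm -skipTrash "$src"   # -skipTrash deletes immediately
)

# Usage (requires access to the cluster; the path is a placeholder):
# compress_hdfs_file <URI>
```

Note that when the HDFS trash feature is enabled, hdfs dfs -rm without -skipTrash moves files to the trash directory, and the space is not reclaimed until the trash is emptied, for example with hdfs dfs -expunge. Omit -skipTrash if a recovery window is preferred over immediate space savings.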