hive-hs2-cms-gc-time

This runbook describes the steps to take when HiveServer2 (HS2) CMS garbage collection time is excessive.

Alert Name: HS2 CMS GC time

Alert Message: “HS2 GC time over 7 seconds”

Alert Explanation: The alert indicates that CMS garbage collection time is greater than seven seconds.

Resolution:

  • Check the HS2 Memory Usage and HS2 GC Time dashboards. Look at the trend of hive.hs2.memory.pools_CMS-Perm-Gen_usage alongside the heap usage trends. If this alert fires repeatedly, the JVM is spending most of its time on garbage collection and still cannot free enough memory.
  • Based on these trends, work with the Dev teams to reconfigure or adjust your workloads.
  • If the failures are not contained, restart the HiveServer2 process (see Restart of Process).

Logs:

  • HS2 logs are available on the coordinator node: /media/ephemeral0/logs/hive2.1.1/hive.log
  • Look for any evident errors (run a basic grep and count the matching lines).
  • Look at the dashboards named above (each title is the name of the dashboard).
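
The "basic grep and count" step can be sketched as follows. This is a minimal example, not an official procedure: the function names count_log_errors and count_oom_errors are hypothetical helpers, and the ERROR/FATAL patterns assume the log uses standard log4j level names. The java.lang.OutOfMemoryError pattern is the usual JVM exception name for memory exhaustion.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for counting error lines in an HS2 log file.
# Assumption: log lines carry log4j level names such as ERROR or FATAL.

count_log_errors() {
  local log_file="$1"
  # grep -c prints the number of matching lines. It exits with status 1
  # when the count is 0, so "|| true" keeps callers using "set -e" alive.
  grep -cE 'ERROR|FATAL' "$log_file" || true
}

count_oom_errors() {
  local log_file="$1"
  # Count lines that mention a JVM out-of-memory condition.
  grep -c 'java.lang.OutOfMemoryError' "$log_file" || true
}
```

Example usage on the coordinator node, with the log path given in this runbook (sudo may be needed depending on file permissions):

count_log_errors /media/ephemeral0/logs/hive2.1.1/hive.log
count_oom_errors /media/ephemeral0/logs/hive2.1.1/hive.log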

Restart of Process:

  • Run sudo monit summary to check the status of the process.
  • Run sudo monit stop hs2 to stop the process.
  • Run sudo monit start hs2 to start the process.
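
The three monit steps above can be chained in one small script. This is a sketch under stated assumptions: MONIT_CMD and RESTART_WAIT_SECONDS are hypothetical variables introduced here (they are not monit options), and the pause between stop and start is an assumed safety margin to give the process time to exit, not a documented requirement.

```shell
#!/usr/bin/env bash
# Sketch: run the monit steps from this runbook in order.
# MONIT_CMD and RESTART_WAIT_SECONDS are hypothetical knobs for illustration.
MONIT_CMD="${MONIT_CMD:-sudo monit}"
RESTART_WAIT_SECONDS="${RESTART_WAIT_SECONDS:-10}"

restart_hs2() {
  # MONIT_CMD is left unquoted on purpose so "sudo monit" splits into words.
  $MONIT_CMD summary || return 1     # status before the restart
  $MONIT_CMD stop hs2 || return 1    # stop the hs2 service
  sleep "$RESTART_WAIT_SECONDS"      # assumed pause so the stop can complete
  $MONIT_CMD start hs2 || return 1   # start the hs2 service
  $MONIT_CMD summary                 # status after the restart
}
```

Calling restart_hs2 runs the sequence. Check the final summary output to confirm that hs2 is reported as running before closing the alert.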