Hadoop

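The new features and key enhancements in Hadoop are:

  • Gracefully Terminating Shell CLI Commands
  • End of Life for Hadoop 1 and Hadoop 2.8

Other enhancements and bug fixes are listed in:

  • Enhancements
  • Bug Fixes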

Gracefully Terminating Shell CLI Commands

HADTWO-2522: Qubole plans to gracefully terminate Shell CLI commands if the connection to the coordinator node fails. Gradual Rollout | Cluster Restart Required

After the feature is enabled, Qubole waits for a fixed timeout of 120 seconds to connect to the cluster logs location. Qubole gracefully terminates the command only when:

  1. The connection to the cluster logs location fails.
  2. The running application is stopped.
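The wait-then-decide behavior described above can be sketched in Python. This is an illustration only, not Qubole's implementation. The names `wait_for_logs_location` and `should_terminate`, the `probe` callable, and the polling interval are hypothetical. The sketch also assumes that both listed conditions must hold before the command is terminated. Only the 120-second timeout comes from the text.

```python
import time
from typing import Callable

LOGS_CONNECT_TIMEOUT_SECONDS = 120  # fixed timeout stated in the release note


def wait_for_logs_location(
    probe: Callable[[], bool],
    timeout: float = LOGS_CONNECT_TIMEOUT_SECONDS,
    poll_interval: float = 5.0,  # assumed value, not from the release note
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if probe() succeeds before the timeout expires.

    probe is a hypothetical callable that tries to reach the cluster
    logs location and reports success (True) or failure (False).
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


def should_terminate(logs_reachable: bool, application_running: bool) -> bool:
    """Assumed reading of the note: terminate only if the logs location
    is unreachable AND the running application has stopped."""
    return (not logs_reachable) and (not application_running)
```

Passing `clock` and `sleep` as parameters keeps the timeout logic testable without actually waiting two minutes.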

End of Life for Hadoop 1 and Hadoop 2.8

  • HADTWO-2375: Qubole has completely stopped supporting Hadoop 2.8 by removing it from the cluster AMI.
  • HADTWO-2384: Qubole has completely stopped supporting Hadoop 1 clusters by removing Hadoop 1 from the cluster AMI.

Enhancements

  • HADTWO-2301: Qubole has changed the log severity level for the "Host not found in weights map" message from ERROR to WARN because the issue is not fatal. It occurs when a node is not available in the node info list. In such cases, Qubole uses a default node with its weight equal to 1. The issue can occur when nodes are manually added to or terminated from a cluster.
  • HADTWO-2352: Qubole has backported YARN-3304, which resolves the issue where CPU usage metrics displayed an inconsistent default value.
  • HADTWO-2353: ApplicationMaster/NodeManager/Container LivelinessMonitor now uses monotonic time to calculate the time period. The related open-source Jira is YARN-4403.
  • HADTWO-2501: Qubole now supports a custom classpath/JAR file for AUX services. The related open-source Jira is YARN-4577.
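
To illustrate why HADTWO-2353 matters: a wall clock can jump forward or backward when the system time is adjusted, which can make an expiry check fire too early or too late, whereas a monotonic clock never goes backward. The following Python sketch shows the idea. It is not YARN's Java code; the class `LivelinessMonitor` and its methods here are a hypothetical, simplified analogue.

```python
import time
from typing import Callable, Dict, List


class LivelinessMonitor:
    """Tracks heartbeats and reports entries that have gone silent.

    Hypothetical simplified analogue of a liveliness monitor. Elapsed
    time is measured with a monotonic clock, so system clock
    adjustments do not affect expiry decisions.
    """

    def __init__(self, expiry_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry = expiry_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def ping(self, name: str) -> None:
        """Record a heartbeat for the given entry."""
        self._last_seen[name] = self._clock()

    def expired(self) -> List[str]:
        """Return entries whose last heartbeat is older than the expiry."""
        now = self._clock()
        return [name for name, seen in self._last_seen.items()
                if now - seen > self._expiry]
```

Replacing `time.monotonic` with `time.time` in this sketch would reintroduce the sensitivity to clock adjustments.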

Bug Fixes

  • HADTWO-2365: Qubole marks a Hadoop worker node as unhealthy if its root disk becomes full. This unhealthy node is eventually removed from the cluster.
  • HADTWO-2458: The issue in Hadoop 3 that caused slowness in container acquisition is fixed now. The related open-source Jira is YARN-8326.
  • HADTWO-2490: The issue where Spark/Shell commands failed with the following error while a cluster was being downscaled is fixed: Unable to close file because the last block does not have enough number of replicas.
  • HADTWO-2506: The race condition that caused data to be written to decommissioned HDFS DataNodes is fixed now.
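
As a rough illustration of the root-disk check in HADTWO-2365, the Python sketch below tests whether a filesystem is effectively full. It is not Qubole's health-check code. The function name `root_disk_is_full` and the 95% threshold are assumptions, because the release note does not say how "full" is determined.

```python
import shutil
from typing import Callable

FULL_THRESHOLD = 0.95  # assumed threshold; the release note does not give one


def root_disk_is_full(path: str = "/",
                      threshold: float = FULL_THRESHOLD,
                      usage_fn: Callable = shutil.disk_usage) -> bool:
    """Return True if the used fraction of the filesystem at path
    is at or above the threshold.

    usage_fn must return an object with total and used attributes,
    as shutil.disk_usage does.
    """
    usage = usage_fn(path)
    return usage.total > 0 and usage.used / usage.total >= threshold
```

A health script could call `root_disk_is_full()` periodically and report the node as unhealthy when it returns True.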

For a list of bug fixes between versions R58 and R59, see Changelog for api.qubole.com.