Hadoop 2

Bug Fixes

  • HADTWO-1177: Network issues could prevent a cluster node from transferring data to other nodes, so tasks that tried to read HDFS data from that node failed. Data blocks are no longer written to such nodes.
  • HADTWO-1217: Removed the explicit restart of services that was causing command and job failures.
  • HADTWO-1250: Corrected the misleading error message shown when a container is released on an unhealthy or unreachable node.
  • HADTWO-1286: Resolved an issue in which Fair Scheduler could sometimes schedule containers on nodes that were being gracefully decommissioned or were about to be lost.
  • HADTWO-1306: Fixed an issue in which some Hive queries failed because of a bug in Snappy compression. The fix ports HADOOP-8151 from open source.
  • HADTWO-1307: Resolved an issue in which Hadoop 2 jobs failed with execution errors. The failures were caused mainly by a delay in launching containers on the Node Manager.
  • HADTWO-1357: Fixed an issue in which DistCp, when used with the dynamic strategy, updated chunkFilePath and other static variables only for the first job. The problem appeared when DistCp::run() was called more than once: a single copy succeeded, but subsequent jobs reported success without actually copying any data.
  • HADTWO-1455: To use the optimized list prefix call feature, set fs.s3a.list-prefix.version=2 as a Hadoop override.
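
The stale-state pattern described in HADTWO-1357 can be illustrated with a minimal Python sketch. It is not the DistCp source: the class names, the init_for_job method, and the /tmp/distcp paths are hypothetical, and a class-level attribute stands in for a Java static variable. It only shows why state that is initialized once per process makes every job after the first reuse the first job's chunk path.

```python
class BuggyChunkState:
    """Hypothetical stand-in for per-job state held in static variables."""

    chunk_file_path = None  # shared by all jobs in the process, like a Java static

    @classmethod
    def init_for_job(cls, job_id):
        # Initializes only when unset, so later jobs keep the first job's path.
        if cls.chunk_file_path is None:
            cls.chunk_file_path = f"/tmp/distcp/{job_id}/chunkDir"
        return cls.chunk_file_path


class FixedChunkState:
    """Same hypothetical state, but refreshed at the start of every job."""

    chunk_file_path = None

    @classmethod
    def init_for_job(cls, job_id):
        # Always recomputes the path from the current job's identifier.
        cls.chunk_file_path = f"/tmp/distcp/{job_id}/chunkDir"
        return cls.chunk_file_path
```

Calling init_for_job("job_1") and then init_for_job("job_2") on BuggyChunkState returns the job_1 path both times, which mirrors a second job that looks at the wrong chunk location and finishes without copying. FixedChunkState returns a path specific to each job.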

Enhancements

  • HADTWO-1021: QDS now supports the Multipart File Output Committer. Currently, it is available only for Hadoop jobs when the s3a file system is enabled.
  • HADTWO-1191: Currently, the ResourceManager (RM) marks a node as lost if it does not receive a heartbeat from that node for 10 minutes. For spot nodes, QDS knows the exact time of their loss. With this enhancement, RM expires such spot nodes immediately after they are gone instead of waiting for an additional 10 minutes.
  • HADTWO-1225: Qubole has upgraded the AWS SDK for Java version used by the s3a file system from 1.11.160 to 1.11.241.
  • HADTWO-1235: Adding a new EBS volume can sometimes be slow and fail to complete before the disk fills up, which can cause write failures. To avoid this, the default upscaling thresholds have changed as follows:
    • If the volume is 75% full (previously 85%), an additional EBS volume is attached. This applies only to new clusters; existing clusters continue to run with their currently configured value.
    • If the volume is estimated to become full within 10 minutes (previously 2 minutes), an additional EBS volume is attached. This also applies to existing clusters once they are restarted.
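
The two upscaling conditions in HADTWO-1235 can be sketched as a small decision function. This is an illustration under stated assumptions, not the QDS implementation: the VolumeSample type, the should_attach_volume function, and the linear extrapolation used to estimate the time until the volume is full are all hypothetical. Only the threshold values (75% and 10 minutes, previously 85% and 2 minutes) come from the note above.

```python
from dataclasses import dataclass

# Default thresholds from the release note.
USED_FRACTION_THRESHOLD = 0.75          # previously 0.85
TIME_TO_FULL_THRESHOLD_SEC = 10 * 60    # previously 2 * 60


@dataclass
class VolumeSample:
    """Hypothetical snapshot of one EBS volume."""
    used_bytes: float
    capacity_bytes: float
    fill_rate_bytes_per_sec: float  # assumed recent write rate


def should_attach_volume(sample,
                         used_fraction_threshold=USED_FRACTION_THRESHOLD,
                         time_to_full_threshold_sec=TIME_TO_FULL_THRESHOLD_SEC):
    """Return True if either upscaling condition is met."""
    # Condition 1: the volume is already at or above the usage threshold.
    used_fraction = sample.used_bytes / sample.capacity_bytes
    if used_fraction >= used_fraction_threshold:
        return True

    # Condition 2: at the current write rate, the volume would fill up
    # within the time threshold (assumption: simple linear extrapolation).
    if sample.fill_rate_bytes_per_sec > 0:
        remaining_bytes = sample.capacity_bytes - sample.used_bytes
        seconds_to_full = remaining_bytes / sample.fill_rate_bytes_per_sec
        return seconds_to_full <= time_to_full_threshold_sec

    return False
```

For example, a volume that is 80% full with no ongoing writes triggers an attach under the new defaults but would not have done so with the earlier 85% and 2 minute values, which can be passed explicitly as should_attach_volume(sample, 0.85, 120).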