Network Performance Metrics
HADTWO-1625: To help detect network performance issues on worker nodes, the following two metrics are sent to the Ganglia server:
ping.packet.loss: The percentage (%) of packets lost when a ping command is run from a worker node to the master node.
ping.time: The round-trip time (RTT) taken by a ping command that sends 1000 packets to the master node.
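As a rough illustration of what these two metrics measure, the sketch below parses the summary that Linux iputils `ping` prints after a run. It is not the QDS collector: the function name, the dictionary keys, the sample host, and the sample numbers are all hypothetical, and the note does not say whether ping.time corresponds to the total run time or to the average RTT, so the sketch extracts both.

```python
import re

# Sample summary in the format printed by Linux iputils `ping -c 1000 <host>`.
# The host name and all numbers are made up for illustration.
SAMPLE_OUTPUT = """\
--- master.example.internal ping statistics ---
1000 packets transmitted, 998 received, 0.2% packet loss, time 999412ms
rtt min/avg/max/mdev = 0.210/0.354/1.902/0.087 ms
"""


def parse_ping_summary(output):
    """Return packet loss (%), total run time (ms) and average RTT (ms)."""
    loss = re.search(r"([\d.]+)% packet loss", output)
    total = re.search(r"time (\d+)ms", output)
    rtt = re.search(r"min/avg/max/\w+ = [\d.]+/([\d.]+)/", output)
    if loss is None:
        raise ValueError("no packet-loss summary found in ping output")
    return {
        "packet_loss_pct": float(loss.group(1)),
        # Assumption: either of the next two values could back ping.time.
        "total_time_ms": int(total.group(1)) if total else None,
        "rtt_avg_ms": float(rtt.group(1)) if rtt else None,
    }


print(parse_ping_summary(SAMPLE_OUTPUT))
```

A sustained non-zero packet loss or a rising RTT on one worker, compared with its peers, is the kind of signal these metrics are meant to surface in Ganglia.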
HADTWO-728: Qubole skips Spot requests for instance families that have seen Spot node losses at the cluster level within a specific time interval. The default interval is the last 15 minutes. If Spot losses were seen for the configured instance families, QDS tries to provision instances synchronously and, if Spot instances are unavailable, finally falls back to On-Demand instances (if configured). Via Support
Qubole recommends configuring instances of multiple families to maximize the Spot instances’ availability.
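The family-skipping rule can be pictured with a minimal sketch, assuming the cluster keeps a record of the most recent Spot loss per instance family. The function name, the data structure, and the family names are hypothetical; the sketch covers only the filtering step, not synchronous provisioning or the On-Demand fallback.

```python
from datetime import datetime, timedelta

# Default interval stated in the release note.
LOSS_WINDOW = timedelta(minutes=15)


def families_to_request(configured, last_loss_at, now, window=LOSS_WINDOW):
    """Return the configured families with no Spot loss inside the window."""
    eligible = []
    for family in configured:
        lost = last_loss_at.get(family)
        # Keep the family if it was never lost or the loss is older than the window.
        if lost is None or now - lost > window:
            eligible.append(family)
    return eligible


now = datetime(2017, 1, 1, 12, 0)
last_loss_at = {
    "m4": now - timedelta(minutes=5),   # recent loss, skipped
    "r4": now - timedelta(minutes=40),  # old loss, eligible again
}
print(families_to_request(["m4", "r4", "c4"], last_loss_at, now))
# prints ['r4', 'c4']
```

With a single configured family, one recent loss leaves nothing to request from Spot, which is why configuring several families helps.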
HADTWO-1162: The default values of the following HDFS options have been modified to speed up decommissioning of unused nodes:
dfs.namenode.replication.max-streams: Its default value is increased from 2 to 3.
dfs.namenode.replication.work.multiplier.per.iteration: Its default value is increased from 2 to 4.
dfs.namenode.decommission.interval: Its default value is reduced from 30 to 20 seconds.
dfs.namenode.decommission.nodes.per.interval: Its default value is increased from 5 to 20.
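For comparison against a cluster's effective configuration, the new defaults can be rendered in the standard Hadoop `hdfs-site.xml` property format. This is a sketch only: the helper name is hypothetical, and the note does not describe how QDS applies these values internally.

```python
import xml.etree.ElementTree as ET

# New default values as listed in the release note.
NEW_DEFAULTS = {
    "dfs.namenode.replication.max-streams": "3",
    "dfs.namenode.replication.work.multiplier.per.iteration": "4",
    "dfs.namenode.decommission.interval": "20",
    "dfs.namenode.decommission.nodes.per.interval": "20",
}


def to_hdfs_site_xml(overrides):
    """Render name/value pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(to_hdfs_site_xml(NEW_DEFAULTS))
```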
HADTWO-1745: QDS now supports running Hadoop 2 clusters on Java 8. Via Support
HADTWO-1903: In a cluster made up entirely of Spot nodes, if the master node goes down because of a Spot loss event, the entire cluster is terminated immediately.
- HADTWO-1797: With custom DNS servers, applications could sometimes get stuck or be killed on timeout when DNS became a bottleneck. This fix prevents that by removing the reverse DNS lookup.
- HADTWO-1780: The open-source change HDFS-3384 has been ported to resolve the DFSClient bug that threw java.io.EOFException: Premature EOF: no length prefix available.