Reserving Space for ApplicationMaster on On-Demand Nodes

Qubole supports reserving buffer space on On-Demand nodes to run ApplicationMaster (AM) containers. Because the buffer space is reserved for AMs, no task containers can be allocated in it. This enhancement is supported on Hadoop (Hive) and Spark clusters.

AMs live longer than task containers, so if an AM is scheduled on a spot node, a spot loss can cause the job to fail or, in some cases, force multiple AM retries. Reserving buffer space avoids this by helping schedule AMs on On-Demand nodes, which makes running jobs more stable.

Note

This enhancement does not apply to all-spot clusters, that is, clusters whose worker nodes are 100% spot.

Configuring Space Reservation for ApplicationMaster

To enable space reservation for AMs on On-Demand nodes, pass yarn.scheduler.am-reservation.enabled=true through Override Hadoop Configuration Variables under Hadoop Cluster Settings in the cluster UI’s Advanced Configuration. After you enable it, space for 4 AMs is reserved on the cluster by default.

The following configuration properties determine the default size of an AM:

  • yarn.app.am.default-size.mb: It is equal to the maximum of (yarn.app.mapreduce.am.resource.mb, tez.am.resource.memory.mb, spark.yarn.am.memory, spark.driver.memory, yarn.app.am.default-size.mb) as set at the cluster level.
  • yarn.app.am.default-size.vcores: Its value is 1 by default.
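The maximum rule for the default AM memory size can be illustrated with a minimal Python sketch. It is not Qubole's implementation: the function name, the dictionary layout, and the example values are hypothetical, and the sketch assumes every setting has already been converted to a plain number of MB (Spark memory settings are often written with unit suffixes, which this sketch does not parse).

```python
# Properties that feed into the default AM memory size, as listed above.
AM_MEMORY_KEYS = (
    "yarn.app.mapreduce.am.resource.mb",
    "tez.am.resource.memory.mb",
    "spark.yarn.am.memory",
    "spark.driver.memory",
    "yarn.app.am.default-size.mb",
)


def default_am_size_mb(cluster_conf):
    """Return the largest AM memory setting (in MB) found in cluster_conf.

    Assumption: cluster_conf maps property names to integers in MB.
    """
    values = [cluster_conf[key] for key in AM_MEMORY_KEYS if key in cluster_conf]
    if not values:
        raise ValueError("no AM memory setting found in cluster_conf")
    return max(values)


# Hypothetical cluster-level settings, already normalized to MB.
example_conf = {
    "yarn.app.mapreduce.am.resource.mb": 1536,
    "tez.am.resource.memory.mb": 2048,
    "spark.driver.memory": 4096,
}

default_size = default_am_size_mb(example_conf)
print(default_size)  # 4096
```

With these example values, the Spark driver memory is the largest setting, so it determines the default AM size.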

You can use these two parameters to adjust the default AM size; space for 4 AMs is then reserved accordingly.

You can also override the amount of memory and vcores to be reserved for the AMs in absolute terms by using these configuration properties:

  • yarn.scheduler.am-reservation.capacity.mb: It denotes the amount of memory in MB to reserve. Its default value is 4 * (default AM size).
  • yarn.scheduler.am-reservation.capacity.vcores: It denotes the number of vCPUs to reserve. Its default value is 4.
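The arithmetic behind the reserved capacity can be sketched as follows. This is an illustration only: the function and argument names are hypothetical, and the sketch assumes the vcore default scales the same way as memory (4 AMs x 1 vcore each gives the documented default of 4). An absolute override, when given, simply replaces the computed default.

```python
DEFAULT_AM_COUNT = 4  # number of AMs reserved by default, per the text above


def am_reservation(default_am_size_mb, default_am_vcores=1,
                   capacity_mb=None, capacity_vcores=None):
    """Return (memory in MB, vcores) reserved for AMs.

    capacity_mb / capacity_vcores stand in for
    yarn.scheduler.am-reservation.capacity.mb and
    yarn.scheduler.am-reservation.capacity.vcores; None means "not set".
    """
    if capacity_mb is None:
        capacity_mb = DEFAULT_AM_COUNT * default_am_size_mb
    if capacity_vcores is None:
        # Assumption: vcores scale with the AM count, like memory does.
        capacity_vcores = DEFAULT_AM_COUNT * default_am_vcores
    return capacity_mb, capacity_vcores


# Defaults only: a 2048 MB AM with 1 vcore.
print(am_reservation(2048))  # (8192, 4)

# Absolute memory override; vcores keep their default.
print(am_reservation(2048, capacity_mb=12288))  # (12288, 4)
```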

By default, the cluster does not bring up On-Demand nodes to meet the space requirement for AMs when On-Demand space is lacking. If there is insufficient space on On-Demand nodes to reserve for AMs, you can configure the cluster to bring up an On-Demand node explicitly by passing yarn.scheduler.am-reservation.get_stable_node=true as a Hadoop override. Note that the cluster's spot percentage can fall below the configured value in such cases.
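As a rough illustration, the overrides discussed on this page could be assembled into text for the Override Hadoop Configuration Variables field with a small Python helper. The helper name is hypothetical, and the one-key=value-per-line layout is an assumption about the field's format; only the property names come from this page.

```python
def render_overrides(settings):
    """Render a dict of Hadoop properties as key=value lines.

    Assumption: the override field accepts one key=value pair per line.
    Booleans are written in lowercase, as in the examples on this page.
    """
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines)


overrides = render_overrides({
    "yarn.scheduler.am-reservation.enabled": True,
    "yarn.scheduler.am-reservation.get_stable_node": True,
})
print(overrides)
```

The first property enables the reservation; the second lets the cluster bring up an On-Demand node when the existing On-Demand space is insufficient.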