Hive Tuning

OutOfMemory Issues

OutOfMemory issues are sometimes caused by too many files being included in split computation. To resolve this problem, increase the Application Master (AM) memory by setting the following parameters:

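set tez.am.resource.memory.mb=<Size in MB>;
set tez.am.launch.cmd-opts=-Xmx<Size in MB>m;

The -Xmx value needs a unit suffix (m for megabytes) and should be lower than tez.am.resource.memory.mb to leave room for non-heap memory. The default value for tez.am.resource.memory.mb is 1536 MB.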
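
As a minimal sketch of how the two settings relate, the following Python helper derives both `set` statements from a single container size. The helper name `tez_am_settings` and the 0.8 heap fraction are assumptions made for illustration (a common rule of thumb, not a value taken from this guide); adjust the fraction to suit your cluster.

```python
def tez_am_settings(container_mb, heap_fraction=0.8):
    """Return Hive `set` statements sizing the Tez AM container and its JVM heap.

    heap_fraction is an assumed rule of thumb, not a value from this guide.
    """
    if container_mb <= 0 or not 0 < heap_fraction < 1:
        raise ValueError("container_mb must be positive and heap_fraction in (0, 1)")
    # Keep the heap below the container size to leave headroom for non-heap memory.
    heap_mb = int(container_mb * heap_fraction)
    return [
        f"set tez.am.resource.memory.mb={container_mb};",
        f"set tez.am.launch.cmd-opts=-Xmx{heap_mb}m;",
    ]


# Example: print the statements for a 4096 MB AM container.
for line in tez_am_settings(4096):
    print(line)
```

The printed lines can be pasted into a Hive session before running the query that failed.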

Block & Split Tuning

The HDFS block size governs how data is stored in the cluster, and the split size determines how that data is read for processing by MapReduce. Make sure that the block size and the Mapper minimum and maximum split sizes do not cause an unnecessarily large number of files to be created.

Parameter                 Description
dfs.blocksize             Sets the HDFS block size for storage; defaults to 128 MB
mapred.min.split.size     Sets the minimum split size; defaults to dfs.blocksize
mapred.max.split.size     Sets the maximum split size; defaults to dfs.blocksize

Configuring the split size boundaries for MapReduce may have cascading effects on the number of mappers created and the number of files each Mapper will access.

Quantity                      Formula
Blocks Required               Dataset Size / dfs.blocksize
Maximum Mappers Required      Dataset Size / mapred.min.split.size
Minimum Mappers Required      Dataset Size / mapred.max.split.size
Maximum Mappers per Block     Maximum Mappers Required / Blocks Required
Maximum Blocks per Mapper     Blocks Required / Minimum Mappers Required
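
The formulas above can be worked through with a small Python sketch. The function name `split_estimates` and the sample sizes (a 10 GB dataset, 128 MB blocks, 64 MB minimum split, 256 MB maximum split) are hypothetical values chosen for illustration. Rounding up to whole blocks and mappers is also an assumption, made because a partially filled final block or split still needs its own block or mapper.

```python
from math import ceil


def split_estimates(dataset_mb, block_mb, min_split_mb, max_split_mb):
    """Apply the block and mapper formulas to sizes given in MB."""
    blocks = ceil(dataset_mb / block_mb)            # Blocks Required
    max_mappers = ceil(dataset_mb / min_split_mb)   # Maximum Mappers Required
    min_mappers = ceil(dataset_mb / max_split_mb)   # Minimum Mappers Required
    return {
        "blocks_required": blocks,
        "max_mappers_required": max_mappers,
        "min_mappers_required": min_mappers,
        "max_mappers_per_block": max_mappers / blocks,
        "max_blocks_per_mapper": blocks / min_mappers,
    }


# Hypothetical example: 10 GB dataset, 128 MB blocks, 64 MB to 256 MB splits.
estimates = split_estimates(10240, 128, 64, 256)
for name, value in estimates.items():
    print(f"{name}: {value}")
```

With these sample values the dataset occupies 80 blocks and needs between 40 and 160 mappers, so a single block may be shared by up to 2 mappers and a single mapper may read up to 2 blocks.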

Parallelism Tuning

The number of tasks configured for worker nodes determines the parallelism of the cluster for processing Mappers and Reducers. As the slots get used by MapReduce jobs, there may be job delays due to constrained resources if the number of slots was not appropriately configured. Try to set maximums rather than constants, so that Hive is bounded but not handcuffed to a fixed number of tasks.

Parameter                                   Description
mapred.tasktracker.map.tasks.maximum        Maximum number of map tasks
mapred.tasktracker.reduce.tasks.maximum     Maximum number of reduce tasks

Memory Tuning

If analysis of the tasks reveals that memory utilization is low, consider modifying the memory allocation for the Hadoop cluster. Reducing the memory allocated to the tasks frees up space on the cluster and allows for an increase in the number of Mappers or Reducers.

Parameter                         Description
mapred.map.child.java.opts        Java heap memory setting for the map tasks
mapred.reduce.child.java.opts     Java heap memory setting for the reduce tasks
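
To illustrate why a smaller per-task allocation allows more Mappers or Reducers, here is a rough Python sketch. The function name `tasks_per_node` and the memory figures are hypothetical. The calculation considers memory only and ignores CPU limits and any per-task overhead beyond the stated allocation.

```python
def tasks_per_node(node_task_memory_mb, task_memory_mb):
    """Return how many tasks of a given size fit in a node's task memory."""
    if node_task_memory_mb < 0 or task_memory_mb <= 0:
        raise ValueError("memory sizes must be positive")
    return node_task_memory_mb // task_memory_mb


# Hypothetical node with 48 GB (49152 MB) available for tasks.
before = tasks_per_node(49152, 2048)  # 2048 MB allocated per task
after = tasks_per_node(49152, 1536)   # allocation reduced to 1536 MB per task
print(f"tasks per node before: {before}, after: {after}")
```

In this example, lowering the per-task allocation from 2048 MB to 1536 MB raises the number of concurrent tasks per node from 24 to 32.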