Pig in Qubole
The Qubole Pig distribution is derived from Apache Pig versions 0.11, 0.15, and 0.17 (beta). It currently runs only on AWS.
Pig is a platform for analyzing large data sets. It provides a high-level language for expressing data analysis programs. Pig’s infrastructure layer contains a compiler that generates sequences of MapReduce programs, for which large-scale parallel implementations already exist. Pig’s language layer consists of a textual language called Pig Latin, which has the following important properties:
- Ease of programming - Complex tasks consisting of multiple related data transformations are explicitly encoded as data flow sequences, which makes them easy to write, understand, and maintain.
- Optimization opportunities - The way in which tasks are encoded lets the system optimize their execution automatically, so you can focus on semantics rather than efficiency.
- Extensibility - You can create your own functions to do special-purpose processing.
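To make the data-flow idea concrete, here is a minimal Python sketch of a word count written as a sequence of named transformation steps. It is an analogy only: it is not Pig Latin and does not reflect how Pig or Qubole executes a script. The comments show roughly equivalent Pig Latin statements. The function name `word_count` and the sample input are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def word_count(lines):
    """Count words as a chain of explicit steps, one per Pig-style relation."""
    # Roughly: words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    # (Assumption: splitting on whitespace stands in for Pig's TOKENIZE.)
    words = [word for line in lines for word in line.split()]

    # Roughly: grouped = GROUP words BY word;
    #          counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
    counts = Counter(words)

    # Roughly: ordered = ORDER counts BY n DESC, word ASC;
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ordered


if __name__ == "__main__":
    # Hypothetical sample input; a real Pig script would LOAD data from storage.
    sample = ["pig latin is a data flow language", "pig runs on hadoop"]
    for word, n in word_count(sample):
        print(word, n)
```

Each intermediate name (`words`, `counts`, `ordered`) plays the role of a Pig relation. Because every step is stated explicitly, the flow is easy to read, and an engine that sees the whole sequence is free to decide how to execute it.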
Qubole supports Pig versions 0.11 (Pig11), 0.15 (Pig15), and 0.17 (beta) (Pig17) on Hadoop 2 clusters. Pig 0.11 is the default version on Hadoop 2 clusters. Pig 0.15 is supported in shell commands on Hadoop 2 clusters. When you select Pig 0.17 (beta), you can also choose between MapReduce and Tez as the execution engine.
Qubole supports HCatalog and Pig integration. However, only Pig11 and later versions support HCatalog integration. See Pig HCatalog Integration for more information.