Using Pig with HDInsight

  • Vinit Yadav
Chapter

Abstract

Apache Pig is a platform to analyze large data sets using a procedural language known as Pig Latin. One of the challenges with MapReduce is that to represent complex processing, you have to create multiple MapReduce operations and then chain them together to achieve the desired result, which is not easy or maintainable when requirements change very often. Instead, you can use Pig, which represents transformations as a data flow. You can write different transformations, one after another, to achieve the desired result. Apache Pig is mainly used in data manipulation operations, because it is easier to write in Pig Latin than to write basic MapReduce jobs in Java. Pig Latin is the language used by Pig to write procedures to do transformations. Pig Latin procedures usually consist of one or more operations, such as loading data from a file system, manipulating it, and storing the output for processing or dumping it on a screen.

Keywords

Average Delay Execution Engine Departure Delay Logical Plan Procedural Language 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

©  Vinit Yadav 2017

Authors and Affiliations

  • Vinit Yadav
    • 1
  1. 1.AhmedabadIndia

Personalised recommendations