Using Pig with HDInsight
Apache Pig is a platform for analyzing large data sets using a procedural language known as Pig Latin. One of the challenges with MapReduce is that representing complex processing requires you to write multiple MapReduce operations and chain them together, which is hard to write and hard to maintain when requirements change often. Pig instead represents transformations as a data flow: you write transformations one after another, each operating on the result of an earlier step, until you reach the desired result. Apache Pig is mainly used for data manipulation, because writing Pig Latin is easier than writing the equivalent MapReduce jobs in Java. A Pig Latin procedure usually consists of one or more operations, such as loading data from a file system, manipulating it, and then storing the output for further processing or dumping it to the screen.
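To make the data-flow idea concrete, here is a minimal sketch in plain Python rather than Pig Latin. It only mimics the shape of a Pig procedure (load, filter, group, aggregate, dump) on a small in-memory list. The sample records and all function names are hypothetical and chosen for illustration; the comments name the Pig Latin operator that each step loosely resembles.

```python
from collections import defaultdict

# Hypothetical sample input standing in for a file on the cluster.
RAW_LINES = [
    "2014-01-01,ERROR,disk full",
    "2014-01-01,INFO,job started",
    "2014-01-02,ERROR,timeout",
    "2014-01-02,ERROR,disk full",
]


def load(lines):
    """Parse comma-separated lines into (date, level, message) tuples.
    Loosely resembles LOAD in Pig Latin."""
    return [tuple(line.split(",", 2)) for line in lines]


def filter_errors(records):
    """Keep only records whose level is ERROR. Loosely resembles FILTER."""
    return [r for r in records if r[1] == "ERROR"]


def group_by_date(records):
    """Collect records under their date. Loosely resembles GROUP ... BY."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return dict(groups)


def count_per_group(groups):
    """Count records per date, sorted by date.
    Loosely resembles FOREACH ... GENERATE with COUNT."""
    return sorted((date, len(rows)) for date, rows in groups.items())


def run_flow(lines):
    """Chain the steps one after another, each using the previous result."""
    records = load(lines)
    errors = filter_errors(records)
    grouped = group_by_date(errors)
    return count_per_group(grouped)


if __name__ == "__main__":
    # Printing the result loosely resembles DUMP; writing it to a file
    # would correspond to STORE.
    for row in run_flow(RAW_LINES):
        print(row)
```

Each step is a small named transformation, so a changed requirement (for example, filtering on a different level) touches one step instead of a chain of hand-written MapReduce jobs. In Pig itself, the platform, not your code, turns such a flow into MapReduce work.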