For in-depth information on various Big Data technologies, check out my free e-book “Introduction to Big Data”.
MapReduce is a good tool for offline, ad hoc analytics, which often involve multiple successive jobs. A single MapReduce job essentially performs a group-by aggregation in a massively parallel way. However, its programming model is very low-level: custom code has to be written even for simple operations like projection and filtering, and implementing common relational operators such as join is even more tedious and verbose. Several efforts have been devoted to simplifying the development of MapReduce programs by providing high-level DSLs that can be translated to native MapReduce code. Unlike many other projects that bring SQL to Hadoop, Pig is special in that it provides a procedural (data flow) programming language, Pig Latin, because it was designed for experienced programmers.
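To make the "group-by aggregation" point concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It is not Hadoop code and does not use any Hadoop API; the record layout, the threshold, and the names `map_fn`, `reduce_fn`, and `run_job` are all assumptions made up for illustration. It only shows that filtering and projection have to be hand-written inside the mapper.

```python
from collections import defaultdict

# Hypothetical input records: (user, country, amount)
records = [
    ("alice", "US", 30),
    ("bob", "DE", 15),
    ("carol", "US", 20),
    ("dave", "FR", 5),
    ("erin", "DE", 25),
]


def map_fn(record):
    """Filter and project by hand, then emit (key, value) pairs."""
    user, country, amount = record
    if amount >= 10:           # filtering: drop small amounts
        yield country, amount  # projection: keep only two fields


def reduce_fn(key, values):
    """Aggregate all values that share a key."""
    return key, sum(values)


def run_job(data, mapper, reducer):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for record in data:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


totals = run_job(records, map_fn, reduce_fn)
print(totals)
```

In a data flow language such as Pig Latin, a pipeline like this is usually written as a handful of statements built from operators such as `FILTER`, `GROUP`, and `FOREACH ... GENERATE`, rather than as custom mapper and reducer code.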