Big Data/Pig

Apache Pig provides a high-level dataflow language for Hadoop MapReduce. Its style sits between declarative SQL and hand-written, procedural MapReduce programs.

Pig's language is called Pig Latin. A Pig Latin script specifies a sequence of steps, each of which defines a single high-level data transformation. When a script is executed, Pig first translates it into a logical plan that describes the execution. This plan is then compiled into one or more MapReduce jobs that run on the Hadoop cluster.
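The step-by-step style can be illustrated with a small sketch. The Python below is not Pig itself; it mimics, in memory, what each step of a short Pig Latin script would compute. The relation names, the file name visits.txt and the sample rows are made up for illustration, and the Pig Latin statement that each Python statement stands in for is given in the comment above it.

```python
from collections import defaultdict

# visits = LOAD 'visits.txt' AS (user:chararray, url:chararray, seconds:int);
# (hypothetical file; an in-memory list stands in for the loaded relation)
visits = [
    ("alice", "a.example", 30),
    ("bob", "b.example", 5),
    ("alice", "c.example", 45),
    ("carol", "a.example", 12),
]

# long_visits = FILTER visits BY seconds >= 10;
long_visits = [v for v in visits if v[2] >= 10]

# by_user = GROUP long_visits BY user;
by_user = defaultdict(list)
for v in long_visits:
    by_user[v[0]].append(v)

# totals = FOREACH by_user GENERATE group, SUM(long_visits.seconds);
# (sorted only to make the output deterministic; Pig needs ORDER for that)
totals = sorted((user, sum(v[2] for v in rows)) for user, rows in by_user.items())

print(totals)
```

Each statement consumes the relation produced by an earlier one, which is the property Pig relies on when it builds the logical plan.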

Additional features:

user-defined functions (UDFs) as first-class citizens
arbitrary input and output file formats
nested data model
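As a rough illustration of the nested data model and of user-defined functions, the following Python sketch models a Pig tuple that contains an atom, a bag (a collection of tuples) and a map, and applies a function to the nested bag. The function longest_visit and the field names are hypothetical; a real Pig UDF would be registered with Pig and written against its UDF interface, which is not shown here.

```python
# A tuple holding an atom, a bag (modelled as a list of tuples) and a map (dict).
record = ("alice", [("a.example", 30), ("c.example", 45)], {"country": "DE"})


def longest_visit(bag):
    """Hypothetical UDF: return the tuple in the bag with the largest second field."""
    return max(bag, key=lambda t: t[1])


# Roughly corresponds to: FOREACH records GENERATE user, longest_visit(visits);
result = (record[0], longest_visit(record[1]))

print(result)
```

The point of the sketch is that a field may itself be a collection, and that a function can take such a collection as its argument like any built-in operator.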

Main operations:

LOAD
FOREACH
FILTER
COGROUP
GROUP
JOIN
UNION
CROSS
ORDER
DISTINCT
STORE
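COGROUP and JOIN are closely related: an inner equi-join can be viewed as a COGROUP followed by flattening the cross product of the bags in each group. The Python sketch below models that relationship in memory. The helper names cogroup and join, the key functions and the sample data are assumptions made for illustration; they are not part of Pig.

```python
from itertools import product


def cogroup(left, right, key_left, key_right):
    """Group both inputs by key; each key maps to one bag per input."""
    groups = {}
    for row in left:
        groups.setdefault(key_left(row), ([], []))[0].append(row)
    for row in right:
        groups.setdefault(key_right(row), ([], []))[1].append(row)
    return groups


def join(left, right, key_left, key_right):
    """Inner equi-join: flatten the cross product of the two bags of each group."""
    groups = cogroup(left, right, key_left, key_right)
    out = []
    for key in sorted(groups):  # sorted only for deterministic output
        left_bag, right_bag = groups[key]
        # An empty bag on either side contributes no rows (inner join).
        out.extend(l + r for l, r in product(left_bag, right_bag))
    return out


users = [("alice", "DE"), ("bob", "FR")]
visits = [("alice", "a.example"), ("alice", "c.example"), ("carol", "a.example")]

grouped = cogroup(users, visits, lambda u: u[0], lambda v: v[0])
joined = join(users, visits, lambda u: u[0], lambda v: v[0])

print(grouped)
print(joined)
```

COGROUP keeps keys that occur in only one input (here bob and carol, each with one empty bag), whereas the join drops them.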

References

Apache Pig - official web site
Wikipedia Article - Apache Pig
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. "Pig Latin: A Not-So-Foreign Language for Data Processing". Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), pages 1099-1110. ACM, New York, NY, USA, 2008.
