Big Data/Pig

Apache Pig provides a high-level dataflow language for writing data analysis programs that run on Hadoop MapReduce.

Pig provides the query language Pig Latin. A Pig Latin script specifies a sequence of steps, each of which defines a single high-level data transformation. When a script is executed, it is first translated into a logical plan that describes its execution. This plan is then compiled into one or more MapReduce jobs, which run on the Hadoop cluster.
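The step-by-step style can be illustrated with a minimal Python sketch that mimics a short Pig Latin script on in-memory data. The relation name `visits`, its fields, and the sample rows are hypothetical, and the code only imitates the semantics of each step. It does not show how Pig builds a plan or runs MapReduce jobs.

```python
from collections import defaultdict

# Roughly corresponds to this (hypothetical) Pig Latin script:
#   visits  = LOAD 'visits.tsv' AS (user:chararray, url:chararray, time:long);
#   recent  = FILTER visits BY time > 100;
#   grouped = GROUP recent BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(recent);

# Hypothetical in-memory rows standing in for the file read by LOAD.
visits = [
    ("alice", "/home", 120),
    ("bob", "/home", 90),
    ("alice", "/cart", 150),
    ("carol", "/home", 200),
]

# Step 1 (FILTER): keep only rows whose time exceeds 100.
recent = [row for row in visits if row[2] > 100]

# Step 2 (GROUP): collect the remaining rows into one bag per user.
grouped = defaultdict(list)
for row in recent:
    grouped[row[0]].append(row)

# Step 3 (FOREACH ... GENERATE): emit one (user, count) row per group.
counts = sorted((user, len(bag)) for user, bag in grouped.items())

print(counts)  # [('alice', 2), ('carol', 1)]
```

Each assignment produces a new named intermediate result from exactly one transformation, which is the property that lets Pig translate a script into a plan one step at a time.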

Additional features:

  • user defined functions as first-class citizens
  • arbitrary input and output file formats
  • nested data model
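As a rough illustration of the nested data model and of user defined functions, the following Python sketch represents one record as a tuple that contains a bag (a collection of tuples) and a map, and applies an ordinary function to the bag. The record layout and the function `latest_visit` are hypothetical and are not part of Pig itself.

```python
# A nested record: (user, bag of (url, time) tuples, map of properties).
# In Pig's data model a field may itself be a tuple, a bag, or a map.
record = (
    "alice",
    [("/home", 120), ("/cart", 150)],  # bag of tuples
    {"country": "DE"},                 # map
)


def latest_visit(bag):
    """Hypothetical user defined function: URL with the largest timestamp."""
    return max(bag, key=lambda visit: visit[1])[0]


# The function is applied to a nested field like any built-in operation,
# similar to calling a UDF inside FOREACH ... GENERATE.
result = (record[0], latest_visit(record[1]), record[2]["country"])

print(result)  # ('alice', '/cart', 'DE')
```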

Main operations:

  • LOAD
  • FOREACH
  • FILTER
  • COGROUP
  • GROUP
  • JOIN
  • UNION
  • CROSS
  • ORDER
  • DISTINCT
  • STORE
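COGROUP and JOIN are closely related: COGROUP collects the rows of two relations into one bag per relation for each key, and an inner JOIN corresponds to pairing up the rows of those bags. The Python sketch below imitates this on in-memory lists. The helper names `cogroup` and `join` and the sample relations are hypothetical, and the first field of each row is assumed to be the key.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right):
    """Return (key, bag of left rows, bag of right rows) for every key."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[0]][0].append(row)
    for row in right:
        groups[row[0]][1].append(row)
    return [(key, lb, rb) for key, (lb, rb) in sorted(groups.items())]


def join(left, right):
    """Inner join: cross product of the two bags within each cogroup."""
    return [
        l + r
        for _, left_bag, right_bag in cogroup(left, right)
        for l, r in product(left_bag, right_bag)
    ]


users = [("alice", 30), ("bob", 25)]
visits = [("alice", "/home"), ("alice", "/cart"), ("carol", "/home")]

for group in cogroup(users, visits):
    print(group)
print(join(users, visits))
```

Keys that occur in only one relation still appear in the COGROUP result with an empty bag on the other side, whereas the inner join drops them because the cross product with an empty bag is empty.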

