Web Science/Part2: Emerging Web Properties/Advanced statistical descriptive models for the Web/The Zipf law for text

From Wikiversity

The Zipf law for text

Learning goals

  1. Be able to name some fundamental properties of how word frequencies in texts are distributed
  2. Be more cautious about visual impressions when looking at log-log plots
  3. Know both formulations of Zipf’s law

Video

Script

Find the slide deck at File:Questioning_the_Zipf_law.pdf

Quiz

What do you know about Zipf's law?

A plot of the rank of words against their frequency appears as a straight line
The rank of a word multiplied by its frequency is supposed to be roughly constant
On the Simple English Wikipedia dataset the law only seems to hold for the top-ranked words
Zipf's law has been falsified for many years and is only taught for historical reasons
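
To experiment with the statements above, the following minimal Python sketch builds a rank-frequency table from a text, computes rank multiplied by frequency for each word, and estimates the slope of the log-log plot with a least-squares fit. The function names (rank_frequency, rank_times_frequency, loglog_slope), the simple tokenizer and the toy sentence are assumptions made for illustration only; they are not taken from the slide deck or from the Simple English Wikipedia dataset used in the lesson.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    # Assumption: a word is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending frequency.
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


def rank_times_frequency(table):
    """Rank multiplied by frequency; roughly constant if Zipf's law holds."""
    return [rank * freq for rank, _, freq in table]


def loglog_slope(table):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 is what the rank-frequency form of Zipf's law
    predicts. Needs at least two rows in the table.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Toy text, far too small to say anything about real language.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequency(sample)
for rank, word, freq in table:
    print(rank, word, freq, rank * freq)
print("log-log slope:", loglog_slope(table))
```

On a real corpus it is worth looking at the rank times frequency column for low-ranked and high-ranked words separately instead of trusting a single fitted slope, since a line that looks straight on a log-log plot can hide large deviations.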


Further reading

  1. tba

Discussion