From Wikiversity

The Zipf law for text

Learning goals

  1. Be able to name some fundamental properties of how word frequencies in texts are distributed
  2. Be a little bit more cautious about visual impressions when looking at log-log plots
  3. Know both formulations of Zipf’s law

Script

Find the slide deck at File:Questioning_the_Zipf_law.pdf

Quiz

What do you know about Zipf's law?

Plotting the rank of words against their frequency produces a straight line
The rank of a word multiplied by its frequency is supposed to be roughly constant
On the Simple English Wikipedia dataset, the law only seems to hold for the top-ranked words
Zipf's law was falsified many years ago and is taught only for historical reasons
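
The rank-times-frequency statement in the quiz can be explored on any text. The following is only a minimal sketch, not the course's own code: the helper names `rank_frequency` and `loglog_slope`, the regex tokenizer, and the toy sentence are all assumptions made for illustration. It ranks words by frequency, prints rank × frequency for each word, and estimates the slope of log(frequency) against log(rank) by ordinary least squares.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    # Assumption: a word is a run of lowercase letters or apostrophes.
    # Real corpora need a more careful tokenizer.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency; ties are broken alphabetically,
    # which is an arbitrary choice.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


def loglog_slope(table):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy input, far too small to say anything about Zipf's law itself.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequency(sample)
for rank, word, freq in table:
    print(rank, word, freq, rank * freq)
print("log-log slope:", round(loglog_slope(table), 2))
```

On a real corpus, `sample` would be replaced by the full text. A fitted slope or a line that looks straight on a log-log plot is only a visual aid and does not by itself show that the law holds, so it is worth inspecting the rank × frequency column separately for high and low ranks.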

Further reading

  1. tba

Discussion