
From Wikiversity

Number of words needed to understand most of Wikipedia

Learning goals

  1. Understand what a log-log plot is
  2. Improve your skills in reading and interpreting diagrams
  3. Know about the word rank / frequency plot
  4. Be able to transform a histogram or curve into a cumulative distribution function
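
The third and fourth goals can be tried out on a small scale. The following is a minimal sketch, assuming a made-up toy sentence rather than the actual Simple English Wikipedia data; the function names `rank_frequency` and `cumulative_share` are illustrative and not taken from the course material. It builds the word rank / frequency list and then turns it into a cumulative distribution by taking running sums and dividing by the total number of word occurrences.

```python
from collections import Counter
from itertools import accumulate


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def cumulative_share(frequencies):
    """Turn a frequency list into a cumulative distribution.

    Entry k is the share of all word occurrences covered by the
    k + 1 most frequent words.
    """
    total = sum(frequencies)
    return [running / total for running in accumulate(frequencies)]


# Toy text (an assumption for illustration, not Wikipedia data).
tokens = "the cat saw the dog and the dog saw a bird".split()

freqs = rank_frequency(tokens)
cdf = cumulative_share(freqs)

for rank, (freq, share) in enumerate(zip(freqs, cdf), start=1):
    print(f"rank {rank}: frequency {freq}, cumulative share {share:.2f}")
```

Plotting rank against frequency with both axes on a logarithmic scale gives the log-log rank / frequency plot; plotting rank against the cumulative share shows how many distinct words are needed to cover a given fraction of the text.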

Video

Quiz

We saw that more than half of the unique word tokens on Simple English Wikipedia occurred only once. Which of the following statements are true?

When picking a random word from the Simple English Wikipedia, the chance is higher than 50% that it occurred only once.
When picking 100 random words from Wikipedia, we expect more than 50 of them to occur only once.
When picking a random word, the chance of getting a word that occurs only once is less than 10%.


Discussion