Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web/Counting Words And Documents

From Wikiversity

Counting Words And Documents

Learning goals

  1. Understand why we selected the Simple English Wikipedia as a toy example for modeling the web
  2. Understand that even a task as simple as counting words involves modeling choices
  3. Be familiar with the term “unique word token”
  4. Know some basic tools to count words and documents
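
To make goals 2 and 3 concrete, here is a minimal Python sketch of counting words and unique word tokens. The function name `tokenize`, its two parameters and the sample sentence are assumptions made for illustration and are not taken from the course material. The sketch treats case folding and punctuation removal as explicit modeling choices, so that their effect on the counts becomes visible.

```python
import re
from collections import Counter


def tokenize(text, lowercase=True, strip_punctuation=True):
    """Split text into word tokens under two explicit modeling choices.

    Hypothetical helper for illustration only.
    """
    if lowercase:
        # Modeling choice 1: "The" and "the" count as the same token.
        text = text.lower()
    if strip_punctuation:
        # Modeling choice 2: replace every character that is not a word
        # character, whitespace or an apostrophe with a space, so that
        # "cat." and "cat" count as the same token.
        text = re.sub(r"[^\w\s']", " ", text)
    # Split on any run of whitespace.
    return text.split()


sample = "The cat saw the other cat. The end!"

# With both normalizations switched on.
tokens = tokenize(sample)
counts = Counter(tokens)
print("words:", len(tokens))               # 8
print("unique word tokens:", len(counts))  # 5

# With both normalizations switched off, "The"/"the" and "cat"/"cat."
# are treated as different tokens.
raw_tokens = tokenize(sample, lowercase=False, strip_punctuation=False)
print("unique word tokens (raw):", len(set(raw_tokens)))  # 7
```

The number of words is the same in both runs, but the number of unique word tokens changes from 5 to 7 depending on the normalization. Neither result is the correct one; each follows from a different modeling choice.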

Video

Script

Find the slide deck at File:Counting_Words_And_Documents.pdf

Quiz

1 How many words are in the following sentence? "It is really really difficult to count words."

6
7
8
9

2 How many unique word tokens are in the following sentence? "It is really really difficult to count words."

6
7
8
9

3 Which of the following command-line tools can be used to count words?

cat
ls
wc
tr


Further reading

  1. tba

Discussion