Counting Words And Documents

Learning goals

  1. Understand why we selected the Simple English Wikipedia as a toy example for modeling the web
  2. Understand that even a task as simple as counting words involves modeling choices
  3. Be familiar with the term “unique word token”
  4. Know some basic tools for counting words and documents

Video

Script

Find the slide deck at File:Counting_Words_And_Documents.pdf
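
The learning goals mention that counting words involves modeling choices. The following Python snippet is a minimal sketch of that idea, not the method used in the slides: the function name `tokenize`, its two options, and the sample sentence are assumptions made for illustration. It counts all tokens and the distinct tokens (which this sketch takes to be the “unique word tokens” of the learning goals) under two different sets of choices.

```python
import re
from collections import Counter


def tokenize(text, lowercase=True, strip_punctuation=True):
    """Split text into word tokens under explicit modeling choices (hypothetical helper)."""
    if lowercase:
        # Choice 1: treat "The" and "the" as the same word.
        text = text.lower()
    if strip_punctuation:
        # Choice 2: keep word characters (letters, digits, underscore) and
        # apostrophes; every other run of characters becomes a single space.
        text = re.sub(r"[^\w']+", " ", text)
    # Whitespace splitting is itself a modeling choice.
    return text.split()


sentence = "The cat saw the other cat. The end."

# Normalized counting: case folded, punctuation removed.
tokens = tokenize(sentence)
counts = Counter(tokens)
total_words = len(tokens)    # all tokens, repetitions included
unique_words = len(counts)   # distinct tokens only

# Raw counting: split on whitespace and change nothing else.
raw_tokens = tokenize(sentence, lowercase=False, strip_punctuation=False)
raw_unique = len(set(raw_tokens))

print(total_words, unique_words, raw_unique)
```

Both variants see the same number of tokens in this sentence, but they disagree on the number of distinct tokens, because the raw variant treats “The” and “the”, and “cat” and “cat.”, as different words.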

Quiz

1 How many words are in the following sentence? “It is really really difficult to count words.”

6
7
8
9

2 How many unique word tokens are in the following sentence? “It is really really difficult to count words.”

6
7
8
9

3 Which of the following command-line tools can be used to count words?

cat
ls
wc
tr

Further reading

  1. tba

Discussion