From Wikiversity

Jaccard Similarity for Sets

Learning goals

  1. Understand how text documents can be modeled as sets
  2. Know the Jaccard coefficient as a similarity measure on sets
  3. Know a trick for remembering the formula
  4. Be aware of the range of values the Jaccard index can take
  5. As always, be able to criticize your model
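
The set model and the coefficient named in the learning goals above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: `word_set` and `jaccard` are hypothetical helper names, tokenization is assumed to be plain whitespace splitting (no lowercasing or punctuation handling), and returning 1.0 for two empty sets is an assumed convention. The coefficient is J(A, B) = |A ∩ B| / |A ∪ B|, so it always lies between 0 (no shared words) and 1 (identical word sets).

```python
def word_set(document: str) -> set[str]:
    """Model a document as the set of its words.

    Assumption: words are whitespace-separated tokens, compared case-sensitively.
    """
    return set(document.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|.

    Assumed convention: two empty sets are treated as identical (1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


d1 = word_set("the cat sat")
d2 = word_set("the cat ran")
# shared words {the, cat} = 2, all words {the, cat, sat, ran} = 4
print(jaccard(d1, d2))  # 0.5
```

Because each document becomes a set, repeated words count only once: `word_set("to be or not to be")` is just {"to", "be", "or", "not"}. Losing word frequency and word order in this way is one point worth raising when criticizing the model.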

Video

Script

The slides can be found at File:Jaccard-Similarity-for-Sets.pdf

Quiz

1

Given D1 = a a a b and D2 = b b b a, what is the Jaccard coefficient of the corresponding word sets?

1/1
2/4
2/8
2/6

2

Given D1 = a b c d e and D2 = e f g h, what is the Jaccard coefficient of the corresponding word sets?

1/7
1/8
1/9
2/8

Further reading

  1. tba

Discussion