Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Comparing Results of Similarity Measures

From Wikiversity

Comparing Results of Similarity Measures


Learning goals

  1. Understand that different modelling choices can produce very different similarity results.
  2. Have a sense of how you could statistically compare the differences between the models.
  3. Know how you could extract keywords from documents with the TF-IDF approach.
  4. Be able to argue which model you prefer in a given scenario.
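
As an illustration of the third learning goal, the following is a minimal sketch of keyword extraction with TF-IDF. It is not taken from the lesson itself: the function names `tokenize` and `tfidf_keywords`, the toy documents, the whitespace tokenization, and the unsmoothed weighting tf * log(N / df) are all assumptions made for this example. Other TF-IDF variants exist and may rank words differently.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document.

    Assumed weighting: relative term frequency times log(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    keywords = []
    for tokens in tokenized:
        if not tokens:
            keywords.append([])  # an empty document has no keywords
            continue
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_k])
    return keywords


# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

for doc, words in zip(docs, tfidf_keywords(docs, top_k=1)):
    print(f"{doc!r}: {words}")
```

In this toy corpus the word "the" occurs in every document, so its inverse document frequency is log(1) = 0 and it never ranks as a keyword, even though it is the most frequent term. Words that occur in only one document receive the highest scores.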

Video


Script


Quiz

1

Which method is best suited to finding the characteristic words of a text?

Jaccard
TF-IDF
TF
Language Model
Smoothed Language Model

2

Which method works well in an information retrieval setting?

Jaccard
TF-IDF
Language Model
Smoothed Language Model

3

Which method should be used when no element occurs more than once?

Jaccard
TF-IDF
Language Model
Smoothed Language Model



Further reading

  1. tba

Discussion