Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Comparing Results of Similarity Measures


Comparing Results of Similarity Measures

Learning goals

  1. Understand that different modeling choices can produce very different results.
  2. Have an intuition for how to compare the differences between the models statistically.
  3. Know how to extract keywords from documents with the TF-IDF approach.
  4. Be able to argue which model is best suited to a given scenario.
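
As an illustration of the third learning goal, the following is a minimal sketch of keyword extraction with TF-IDF. It rests on several assumptions that are not taken from this lesson: the function name `tfidf_keywords` is hypothetical, tokens are obtained by lowercasing and splitting on whitespace, and the weight is the raw term frequency multiplied by log(N / df), which is only one of several common TF-IDF variants. The three example documents are made up for the demonstration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document.

    Assumed weighting: raw term frequency * log(N / document frequency).
    """
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # A term that occurs in every document gets log(1) = 0.
        scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


# Made-up toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

for doc, kws in zip(docs, tfidf_keywords(docs)):
    print(doc, "->", kws)
```

In this toy corpus the word "the" is the most frequent term in every document, yet it scores zero because it occurs in all of them. Words that occur in only one document, such as "mat" or "chased", rank highest, which is the sense in which TF-IDF picks out words that are characteristic of a text.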

Video

Script

Quiz

1 Which method is best suited to finding characteristic words of a text?

Jaccard
TF-IDF
TF
Language Model
Smoothed Language Model

2 Which method works well in an information retrieval setting?

Jaccard
TF-IDF
Language Model
Smoothed Language Model

3 Which method should be used when no element occurs more than once?

Jaccard
TF-IDF
Language Model
Smoothed Language Model


Further reading

  1. tba

Discussion