Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Probabilistic Similarity Measures Kullback Leibler Divergence

From Wikiversity

Probabilistic Similarity Measures Kullback Leibler Divergence


Learning goals

  1. Be aware of the unigram language model
  2. Know Laplacian (aka +1) smoothing
  3. Know the query likelihood model
  4. Know the Kullback-Leibler divergence
  5. See how a similarity measure can be derived from the Kullback-Leibler divergence
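The first three goals can be made concrete with a short sketch: a unigram language model with Laplace (+1) smoothing, and the query likelihood P(q|d) it induces. This is a minimal illustration; the toy document, vocabulary, and function names are assumptions for the example, not part of the course material.

```python
from collections import Counter

def unigram_lm(doc_tokens, vocab):
    """Unigram language model over vocab with Laplace (+1) smoothing."""
    counts = Counter(doc_tokens)
    # Adding 1 to every count guarantees P(w|d) > 0 for every w in vocab
    return {w: (counts[w] + 1) / (len(doc_tokens) + len(vocab)) for w in vocab}

def query_likelihood(query_tokens, lm):
    """Query likelihood model: P(q|d) as the product of P(w|d) over query terms."""
    p = 1.0
    for w in query_tokens:
        p *= lm[w]
    return p

doc = "the cat sat on the mat".split()   # toy document (assumption)
vocab = set(doc) | {"dog"}               # "dog" never occurs in doc
lm = unigram_lm(doc, vocab)
print(lm["dog"])                         # nonzero only because of smoothing
print(query_likelihood("the cat".split(), lm))
```

Without the +1 smoothing, any query containing "dog" would get likelihood 0 for this document, which is the sparsity problem the quiz below asks about.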

Video


Quiz

Smoothing is needed

  1. to make sure the probability function will not take the value 0
  2. because this will always yield more accurate results
  3. because otherwise the query likelihood model would have sparse results for many queries
  4. all of the above
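The last learning goal — deriving a similarity measure from the Kullback-Leibler divergence — can be sketched by comparing the smoothed unigram models of two documents. The mapping exp(−D_KL) used below is one assumed way to turn a divergence into a similarity score in (0, 1]; the toy documents are likewise illustrative.

```python
import math
from collections import Counter

def smoothed_dist(tokens, vocab):
    """Laplace-smoothed unigram distribution (keeps every Q(w) > 0)."""
    c = Counter(tokens)
    return {w: (c[w] + 1) / (len(tokens) + len(vocab)) for w in vocab}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_w P(w) log(P(w)/Q(w)); >= 0, and 0 iff P == Q."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

d1 = "the cat sat on the mat".split()   # toy documents (assumptions)
d2 = "the dog sat on the log".split()
vocab = set(d1) | set(d2)
p, q = smoothed_dist(d1, vocab), smoothed_dist(d2, vocab)

# Smoothing matters here too: an unsmoothed Q with Q(w) = 0 would make
# D_KL(P || Q) infinite whenever P(w) > 0.
similarity = math.exp(-kl_divergence(p, q))
print(similarity)
```

Note that D_KL is not symmetric (D_KL(P‖Q) ≠ D_KL(Q‖P) in general), so a measure built from it this way is a similarity score rather than a metric.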



Further reading

  1. tba

Discussion