Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Probabilistic Similarity Measures Kullback Leibler Divergence



Learning goals

  1. Be aware of the unigram language model
  2. Know Laplace (a.k.a. +1) smoothing
  3. Know the query likelihood model
  4. Know the Kullback–Leibler divergence
  5. See how a similarity measure can be derived from the Kullback–Leibler divergence
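The learning goals above can be sketched together in one small example: a hypothetical Python snippet (all names illustrative, not from the lecture) that builds Laplace-smoothed unigram language models over a shared vocabulary and turns the Kullback–Leibler divergence between two document models into a similarity score by negating it.

```python
from collections import Counter
import math

def unigram_lm(tokens, vocab, alpha=1.0):
    """Laplace (+alpha) smoothed unigram language model P(w | d).

    Smoothing guarantees every vocabulary word gets probability > 0,
    so the KL divergence below is always defined.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_w P(w) * log(P(w) / Q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def kl_similarity(doc_a, doc_b):
    """Similarity derived from KL divergence: higher (closer to 0) is more similar.

    Note that KL divergence is asymmetric, so this 'similarity' is too.
    """
    vocab = set(doc_a) | set(doc_b)
    p = unigram_lm(doc_a, vocab)
    q = unigram_lm(doc_b, vocab)
    return -kl_divergence(p, q)

doc1 = "the quick brown fox".split()
doc2 = "the quick red fox".split()
doc3 = "completely different words here".split()

# doc1 should be judged more similar to doc2 than to doc3
print(kl_similarity(doc1, doc2) > kl_similarity(doc1, doc3))  # True
```

Negating the divergence is only one way to obtain a similarity; the key point is that a smaller divergence between the two language models corresponds to a larger similarity.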

Video

Quiz

Smoothing is needed …

  - to make sure the probability function will not take the value 0
  - because this will always yield more accurate results
  - because otherwise the query likelihood model would have sparse results for many queries
  - all of the above


Further reading

  1. tba

Discussion