Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text

From Wikiversity

Modelling Similarity of Text


Learning goals


Associated units

  1. Know the properties of a similarity measure
  2. Be able to relate similarity and distance measures
  3. Know of two applications for modelling similarity
  • Understand how text documents can be modeled as sets
  • Know the Jaccard coefficient as a similarity measure on sets
  • Know a trick for remembering the formula
  • Be aware of the possible outcomes of the Jaccard index
  • As always, be able to criticize your model
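
The goals above can be made concrete with a minimal sketch. It assumes a deliberately naive tokenizer (lowercasing and splitting on whitespace); the function names `tokens_as_set` and `jaccard` are hypothetical and chosen only for this illustration. Treating two empty sets as identical is likewise an assumption, since the formula itself is undefined in that case.

```python
def tokens_as_set(text):
    """Model a document as the set of its lowercase, whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection divided by size of the union."""
    if not a and not b:
        # Assumption: two empty documents count as identical.
        return 1.0
    return len(a & b) / len(a | b)


d1 = tokens_as_set("the web is a graph")
d2 = tokens_as_set("the web is a network")

print(jaccard(d1, d2))  # shared words divided by all distinct words
print(jaccard(d1, d1))  # identical sets give the maximum value
print(jaccard({"a"}, {"b"}))  # disjoint sets give the minimum value
```

The result always lies between 0 (no common words) and 1 (identical sets). One point of criticism follows directly from the model: a set records neither how often a word occurs nor in which order, so "web web web" and "web" are indistinguishable.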
  • Be familiar with the vector space model for text documents
  • Be aware of term frequency and (inverse) document frequency
  • Have reviewed the definitions of basis and dimension
  • Realize that the angle between two vectors can be seen as a similarity measure
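
As a rough illustration of these goals, the sketch below represents each document as a sparse vector with one dimension per term and compares two vectors by the cosine of the angle between them. The weighting used here, term frequency times log(N / document frequency), is one common tf-idf variant and is an assumption of this example; the unit itself may define the weights differently. The names `tf_idf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        # Assumed weighting: tf * log(N / df)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


docs = ["web science is fun", "web science is hard", "cats sleep all day"]
vectors = tf_idf_vectors(docs)

similarity = cosine(vectors[0], vectors[1])
angle = math.degrees(math.acos(min(1.0, similarity)))
print(similarity, angle)
```

A small angle (cosine close to 1) means the documents point in a similar direction in term space; documents without any common term are orthogonal and get a cosine of 0.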
  • Be aware of the unigram language model
  • Know Laplace (aka add-one or +1) smoothing
  • Know the query likelihood model
  • Know the Kullback-Leibler divergence
  • See how a similarity measure can be derived from the Kullback-Leibler divergence
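
The following sketch shows how these pieces might fit together. It estimates a unigram model with add-one smoothing over a shared vocabulary, scores a query by its log-likelihood under a document model, and compares two models with the Kullback-Leibler divergence. All function names are hypothetical. Symmetrizing the divergence by adding both directions is only one possible way to obtain a comparison score and is an assumption of this example, not necessarily the derivation used in the unit.

```python
import math
from collections import Counter


def unigram_model(text, vocabulary):
    """Unigram probabilities with add-one (Laplace) smoothing."""
    counts = Counter(text.lower().split())
    # Every vocabulary term gets one extra count, so no probability is zero.
    total = sum(counts[t] for t in vocabulary) + len(vocabulary)
    return {t: (counts[t] + 1) / total for t in vocabulary}


def query_log_likelihood(query, model):
    """Log-probability of generating the query terms from the model."""
    return sum(math.log(model[t]) for t in query.lower().split())


def kl_divergence(p, q):
    """D(p || q) for two distributions over the same vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def symmetric_kl(p, q):
    """Assumed symmetrization: sum of both directions (lower = more similar)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


d1 = "the web is a graph"
d2 = "the web is a network"
query = "web graph"

# Shared vocabulary so that both models cover the same terms.
vocabulary = set((d1 + " " + d2 + " " + query).lower().split())
p = unigram_model(d1, vocabulary)
q = unigram_model(d2, vocabulary)

print(query_log_likelihood(query, p), query_log_likelihood(query, q))
print(kl_divergence(p, q), kl_divergence(q, p), symmetric_kl(p, q))
```

Without smoothing, a term that is missing from one document would have probability zero there, which makes both the query likelihood and the divergence break down. Note also that the divergence itself is not symmetric in general, which is why some extra step is needed before it can serve as a similarity or distance measure.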
  • Understand that different modeling choices can produce very different results.
  • Have a feeling for how you could statistically compare the differences between the models.
  • Know how you could extract keywords from documents with the tf-idf approach.
  • Try to argue which model you like best in a certain scenario.