Information retrieval/Mathematical Foundations

From Wikiversity

Introduction

The mathematical foundations should help the learner represent search requests and matching results in mathematical terms, using, for example, set theory, graph theory, probability theory, and fuzzy set theory.

Learning Activities

  • Assume a user wants to repair a bicycle and enters a search request containing the words "repair bicycle". How can you represent the search results as the intersection of two sets?
  • Identify words that have multiple meanings (see Disambiguation) and explain how probability theory is involved in interpreting a search request that contains them.
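
As a starting point for the first activity, the following minimal sketch shows one way to view the result of the query "repair bicycle" as the intersection of two sets: the set of documents containing "repair" and the set of documents containing "bicycle". The toy corpus, the document ids, and the helper names `build_inverted_index` and `search_all_terms` are assumptions made for illustration only.

```python
# Toy corpus: document ids and texts are invented for this illustration.
documents = {
    1: "how to repair a flat bicycle tire",
    2: "bicycle routes along the river",
    3: "repair manual for washing machines",
    4: "repair your bicycle chain at home",
}


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the ids of documents containing every query word."""
    term_sets = [index.get(word, set()) for word in query.lower().split()]
    if not term_sets:
        return set()
    # The result is the intersection of the per-term document sets.
    return set.intersection(*term_sets)


index = build_inverted_index(documents)
repair_docs = index["repair"]      # R = {1, 3, 4}
bicycle_docs = index["bicycle"]    # B = {1, 2, 4}
results = search_all_terms(index, "repair bicycle")  # R ∩ B = {1, 4}
print(sorted(results))
```

Replacing the intersection with a union would instead return every document that mentions at least one of the two words, which usually yields more but less specific results.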

Mathematical Basis of Information Retrieval

Model types

Categorization of IR-models (translated from German entry, original source Dominik Kuropka)

To retrieve relevant documents effectively, IR strategies typically transform the documents into a suitable representation. Each retrieval strategy incorporates a specific model for representing documents. The accompanying figure illustrates the relationships among some common models, categorized along two dimensions: the mathematical basis and the properties of the model.

Properties of the model

  • Models without term interdependencies treat different terms/words as independent. In vector space models this is usually expressed by assuming that term vectors are orthogonal; in probabilistic models, by assuming that term variables are independent.
  • Models with immanent term interdependencies allow interdependencies between terms to be represented. However, the degree of interdependency between two terms is defined by the model itself. It is usually derived, directly or indirectly (e.g. by dimensionality reduction), from the co-occurrence of those terms in the whole set of documents.
  • Models with transcendent term interdependencies also allow interdependencies between terms to be represented, but they do not specify how the interdependency between two terms is defined. They rely on an external source for the degree of interdependency between two terms (for example, a human or a sophisticated algorithm).
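
The difference between the first and the third property can be sketched with a small vector space example. Under the orthogonality assumption, the terms "bicycle" and "bike" contribute nothing to each other's similarity. With an externally supplied term-term similarity matrix (the transcendent case), they do. The vocabulary, the vectors, and the similarity value 0.8 below are invented for illustration, and the function `cosine` is a hypothetical helper, not part of any particular IR library.

```python
import math

# Assumed toy vocabulary; vector positions follow this order.
terms = ["bicycle", "bike", "repair"]


def cosine(u, v, S=None):
    """Cosine similarity of u and v under a term-term similarity matrix S.

    With S omitted, the identity matrix is used, which corresponds to the
    orthogonality assumption (no term interdependencies).
    """
    n = len(u)
    if S is None:
        S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def inner(a, b):
        # Generalized inner product: sum over i, j of a[i] * S[i][j] * b[j].
        return sum(a[i] * S[i][j] * b[j] for i in range(n) for j in range(n))

    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))


query = [1.0, 0.0, 1.0]  # "bicycle repair"
doc = [0.0, 1.0, 1.0]    # "bike repair"

# Independent terms: only "repair" matches.
independent = cosine(query, doc)

# Externally supplied interdependency: "bicycle" and "bike" are declared
# 0.8 similar (an invented value standing in for a human judgment).
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
dependent = cosine(query, doc, S)

print(round(independent, 3), round(dependent, 3))
```

In a model with immanent term interdependencies, the matrix `S` would not be supplied from outside but derived by the model itself, for example from term co-occurrence counts over the whole document collection.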

Performance and correctness measures

The evaluation of an information retrieval system is the process of assessing how well the system meets the information needs of its users. In general, measurement considers a collection of documents to be searched and a search query. Traditional evaluation metrics, designed for Boolean retrieval or top-k retrieval, include precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved). All such measures assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a particular query. In practice, queries may be ill-posed and there may be different shades of relevance.
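
The two traditional metrics can be computed directly from the set of retrieved documents and the ground-truth set of relevant documents. The sketch below is illustrative only: the document ids are invented, the function names `precision_recall` and `precision_at_k` are hypothetical, and returning 0.0 for an empty set is an assumed convention rather than a standard.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    Empty denominators yield 0.0 (an assumed convention).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranking, relevant, k):
    """Precision over the first k entries of a ranked result list."""
    top_k = ranking[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k


# Invented ground truth and system output for a single query.
relevant = {1, 4, 7, 9}
retrieved = [1, 2, 4, 5, 6]  # ranked, best first

p, r = precision_recall(retrieved, relevant)  # 2 hits: p = 2/5, r = 2/4
p_at_3 = precision_at_k(retrieved, relevant, 3)  # 2 of the top 3 are relevant
print(p, r, p_at_3)
```

Because both functions treat relevance as binary, they inherit the ground-truth assumption described above and cannot express different shades of relevance.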

Page Information

This page was based on the following wikipedia-source page: