Divergence-from-randomness model
In the field of information retrieval, divergence from randomness (DFR) is a type of probabilistic model. It is used to measure the amount of information carried in documents. The underlying idea is that 'informative' terms in a document diverge more, statistically, from a random term distribution than 'non-informative' terms do. It is based on Harter's 2-Poisson indexing model.
Definition
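The divergence-from-randomness model is based on the idea that the more the within-document frequency of a term diverges from its frequency within the collection, the more information the term carries in that document.[1] The weight of a term t in a document d is therefore inversely related to the probability of the term's within-document frequency under a model of randomness M. When a term does not occur in a document, it has approximately zero probability of being 'informative' for that document. The notation is as follows: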
- M is the model of randomness employed to compute the probability.
- d is the total number of words in the document.
- t is the number of occurrences of a specific word in d.
- k is defined by M.
Different urn models can be used to choose the model of randomness M.
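The weighting idea in this section can be illustrated with a minimal sketch. It assumes one particular model of randomness, the binomial model from Amati's DFR framework: the F occurrences of a term in a collection of N documents are thrown independently, each landing in a given document with probability 1/N. The informative content of observing tf occurrences in one document is then taken as the negative base-2 logarithm of that probability. The function names and the toy figures are assumptions for illustration, and the normalisation steps of complete DFR weighting schemes are left out.

```python
import math


def binomial_prob(tf: int, F: int, N: int) -> float:
    """Probability that exactly tf of the F occurrences of a term fall into
    one given document, when each occurrence picks one of N documents
    uniformly at random (assumed binomial model of randomness)."""
    p = 1.0 / N
    return math.comb(F, tf) * p**tf * (1.0 - p) ** (F - tf)


def informative_content(tf: int, F: int, N: int) -> float:
    """Informative content in bits: -log2 of the probability of seeing tf
    occurrences by chance. Rare outcomes under randomness score higher."""
    return -math.log2(binomial_prob(tf, F, N))


# Toy figures (assumed): a term with 100 occurrences in 1000 documents.
for tf in (0, 1, 5):
    print(tf, informative_content(tf, F=100, N=1000))
```

With these toy figures, a term that is absent from the document (tf = 0) receives about 0.14 bits, close to zero, while higher within-document frequencies are increasingly unlikely under randomness and so receive larger scores.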
Probability space
Utility-theoretic indexing, developed by Cooper and Maron, is a theory of indexing based on utility theory. Index terms are assigned to documents so as to reflect the value that users expect the documents to have. A probability distribution assigns probabilities to all sets of terms in the vocabulary.
In information retrieval, the term 'experiment' refers to the notion that a document can be treated as a sequence of outcomes, or simply as a sample of items. Each word occurrence is a 'trial'. The trials can be assumed to be independent of one another, and the probability distribution over the vocabulary is the same for each trial.
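The view of a document as a sequence of independent trials can be sketched as follows. Under the two assumptions stated above (independent word occurrences, one fixed distribution over the vocabulary), the probability of a particular word sequence is the product of the per-word probabilities, and the probability of the resulting term-frequency counts is multinomial. The three-word vocabulary, its probabilities, and the function names are hypothetical values chosen for the example.

```python
import math
from collections import Counter


def sequence_log_prob(words, vocab_probs):
    """Log-probability of the document as an ordered sequence of
    independent draws from the same vocabulary distribution."""
    return sum(math.log(vocab_probs[w]) for w in words)


def counts_log_prob(words, vocab_probs):
    """Log-probability of the term-frequency counts alone (multinomial),
    i.e. summed over every ordering that yields the same counts."""
    counts = Counter(words)
    n = len(words)
    # log of n! / (c_1! * c_2! * ...), computed with log-gamma
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts.values())
    return log_coeff + sum(c * math.log(vocab_probs[w]) for w, c in counts.items())


# Hypothetical vocabulary distribution and a three-word document.
vocab_probs = {"retrieval": 0.5, "model": 0.3, "term": 0.2}
doc = ["retrieval", "model", "retrieval"]

print(math.exp(sequence_log_prob(doc, vocab_probs)))
print(math.exp(counts_log_prob(doc, vocab_probs)))
```

Working in log space avoids numerical underflow for longer documents, where the product of many small probabilities would otherwise round to zero.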
References
- [1] "Divergence From Randomness (DFR) Framework". Terrier Team, University of Glasgow.
General references
- Amati, Giambattista (2003). Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness. University of Glasgow.