
Extended Boolean model

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Sonicat (talk | contribs) at 22:50, 1 December 2009 (Created page with ''''Extended Boolean Model''' was introduced in 1983 by Salton, Fox, and Wu. The goal of '''Extended Boolean Model''' is to overcome the drawback of Boolean model us...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Extended Boolean model was introduced in 1983 by Salton, Fox, and Wu. Its goal is to overcome the drawbacks of the Boolean model used in information retrieval. The Boolean model does not consider term weights in queries, and the result set of a Boolean query is often either too small or too large. The idea of the extended model is to use partial matching and term weights, as in the vector space model. It combines the characteristics of the vector space model with the properties of Boolean algebra and ranks documents by their similarity to the query.[1]

Definitions

In the Extended Boolean model, a document is represented as a vector (as in the vector space model). Each dimension corresponds to the weight of a separate term associated with the document. The weight of term x associated with document j can be defined as:

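$$w_{x,j} = f_{x,j} \cdot \frac{\mathrm{Idf}_x}{\max_i \mathrm{Idf}_i}$$

where $\mathrm{Idf}_x$ is the inverse document frequency of term x and $f_{x,j}$ is the normalized frequency of term x in document j.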


The weight vector associated with document j can be represented as:

$$\mathbf{v}_{d_j} = [w_{1,j}, w_{2,j}, \ldots, w_{t,j}]$$
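As a rough illustration of the definition above, the following sketch computes the weight vector of one document, assuming the weight has the form $w_{x,j} = f_{x,j} \cdot \mathrm{Idf}_x / \max_i \mathrm{Idf}_i$. The function name `term_weights`, the choice $\mathrm{Idf}_x = \log(N / n_x)$, and the normalization of $f_{x,j}$ by the largest count in the document are assumptions made for this example, not something the model prescribes.

```python
import math

def term_weights(term_counts, doc_freqs, n_docs):
    """Return {term: weight} for a single document (hypothetical helper).

    term_counts: raw counts of each term in the document
    doc_freqs:   number of documents containing each term (whole vocabulary)
    n_docs:      total number of documents in the collection
    """
    # Assumption: Idf_x = log(N / n_x); other idf variants exist.
    idf = {term: math.log(n_docs / df) for term, df in doc_freqs.items()}
    max_idf = max(idf.values())  # zero if every term occurs in every document
    # Assumption: f_{x,j} is the raw count divided by the largest count in the document.
    max_count = max(term_counts.values())
    return {
        term: (count / max_count) * (idf[term] / max_idf)
        for term, count in term_counts.items()
    }

weights = term_weights(
    {"boolean": 4, "model": 2},
    {"boolean": 10, "model": 50, "the": 100},
    n_docs=100,
)
```

With these normalizations every weight falls in the interval [0, 1], which the similarity formulas of the model rely on.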

The two-dimensional example

Figure 1: The similarities of $q = (K_x \lor K_y)$ with documents $d_j$ and $d_{j+1}$.


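Consider the space composed of only two terms $K_x$ and $K_y$, with corresponding term weights $w_1$ and $w_2$.[2]

For the query $q_{or} = (K_x \lor K_y)$, the point (0, 0) is the least desirable one, so a document is ranked higher the farther its point $(w_1, w_2)$ lies from the origin. The similarity can be calculated as:

$$sim(q_{or}, d) = \sqrt{\frac{w_1^2 + w_2^2}{2}}$$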

Figure 2: The similarities of $q = (K_x \land K_y)$ with documents $d_j$ and $d_{j+1}$.

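Consider again the space composed of only two terms $K_x$ and $K_y$, with corresponding term weights $w_1$ and $w_2$. For the query $q_{and} = (K_x \land K_y)$, the point (1, 1) is the most desirable one, so a document is ranked higher the closer its point $(w_1, w_2)$ lies to (1, 1). The similarity can be calculated as:

$$sim(q_{and}, d) = 1 - \sqrt{\frac{(1 - w_1)^2 + (1 - w_2)^2}{2}}$$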
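The two-term case can be sketched in a few lines of Python, assuming the usual forms $\sqrt{(w_1^2 + w_2^2)/2}$ for the OR query and $1 - \sqrt{((1 - w_1)^2 + (1 - w_2)^2)/2}$ for the AND query. The function names `sim_or_2d` and `sim_and_2d` are hypothetical, chosen only for this illustration.

```python
import math

def sim_or_2d(w1, w2):
    # Normalized Euclidean distance of (w1, w2) from the worst point (0, 0).
    return math.sqrt((w1 ** 2 + w2 ** 2) / 2)

def sim_and_2d(w1, w2):
    # One minus the normalized Euclidean distance from the best point (1, 1).
    return 1 - math.sqrt(((1 - w1) ** 2 + (1 - w2) ** 2) / 2)
```

A document that contains only one of the two terms with full weight, such as `(1.0, 0.0)`, receives a higher score from `sim_or_2d` than from `sim_and_2d`, yet neither score is 0 or 1. This is the partial matching that the strict Boolean model lacks.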

Generalizing the idea and P-norms

The previous two-dimensional example can be generalized to a t-dimensional space by generalizing the Euclidean distance.

This can be done using p-norms, which extend the notion of distance to include p-distances, where $1 \le p \le \infty$ is a new parameter.

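  • A generalized disjunctive query is given by:

    $$q_{or} = k_1 \lor^p k_2 \lor^p \cdots \lor^p k_t$$

  • The similarity of $q_{or}$ and $d_j$ can be defined as:

    $$sim(q_{or}, d_j) = \sqrt[p]{\frac{w_1^p + w_2^p + \cdots + w_t^p}{t}}$$

  • A generalized conjunctive query is given by:

    $$q_{and} = k_1 \land^p k_2 \land^p \cdots \land^p k_t$$

  • The similarity of $q_{and}$ and $d_j$ can be defined as:

    $$sim(q_{and}, d_j) = 1 - \sqrt[p]{\frac{(1 - w_1)^p + (1 - w_2)^p + \cdots + (1 - w_t)^p}{t}}$$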
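A minimal sketch of the p-norm similarities for t terms follows, assuming the OR similarity is $((w_1^p + \cdots + w_t^p)/t)^{1/p}$ and the AND similarity is $1 - (((1 - w_1)^p + \cdots + (1 - w_t)^p)/t)^{1/p}$. The function names `sim_or` and `sim_and` are hypothetical, and handling $p = \infty$ through its limits (maximum and minimum of the weights) is a choice made for this example.

```python
import math

def sim_or(weights, p=2.0):
    """p-norm similarity of a disjunctive query to a document (sketch)."""
    if math.isinf(p):
        return max(weights)  # limit of the p-norm as p grows without bound
    t = len(weights)
    return (sum(w ** p for w in weights) / t) ** (1 / p)

def sim_and(weights, p=2.0):
    """p-norm similarity of a conjunctive query to a document (sketch)."""
    if math.isinf(p):
        return min(weights)  # limit of the p-norm as p grows without bound
    t = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / t) ** (1 / p)
```

With `p=1` both functions reduce to the plain average of the weights, so AND and OR become indistinguishable. With `p=math.inf` they return the maximum and minimum weight respectively, and `p=2` reproduces the Euclidean case.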

Examples

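Consider the query $q = (K_1 \land K_2) \lor K_3$. The similarity between the query q and a document d can be computed with the following formula:

$$sim(q, d) = \sqrt[p]{\frac{\left(1 - \sqrt[p]{\frac{(1 - w_1)^p + (1 - w_2)^p}{2}}\right)^p + w_3^p}{2}}$$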
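Nested queries such as this one can be scored by applying the AND and OR similarities recursively, from the innermost operator outward. The sketch below assumes the p-norm forms of the two similarities. The tuple-based query representation, the function name `evaluate`, and the sample weights are all hypothetical and serve only to illustrate the idea.

```python
def evaluate(query, weights, p=2.0):
    """Score a query tree against one document (illustrative sketch).

    query:   a term name, or a tuple (op, sub1, sub2, ...) with op "and" or "or"
    weights: {term: weight in [0, 1]} for the document; missing terms count as 0
    """
    if isinstance(query, str):
        return weights.get(query, 0.0)
    op, *operands = query
    # Score each operand first, then combine the scores with the p-norm.
    scores = [evaluate(sub, weights, p) for sub in operands]
    t = len(scores)
    if op == "or":
        return (sum(s ** p for s in scores) / t) ** (1 / p)
    if op == "and":
        return 1 - (sum((1 - s) ** p for s in scores) / t) ** (1 / p)
    raise ValueError(f"unknown operator: {op!r}")

# Hypothetical document weights for the three terms of the query.
doc = {"k1": 0.9, "k2": 0.4, "k3": 0.1}
score = evaluate(("or", ("and", "k1", "k2"), "k3"), doc, p=2.0)
```

The inner AND score takes the place of a term weight in the outer OR, which mirrors the nested radical in the formula for this query.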
