Jump to content

Extended Boolean model

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 198.82.29.107 (talk) at 20:36, 2 December 2009. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Extended Boolean Model was introduced in 1983 by Gerard Salton, Edward A. Fox, Harry Wu. The goal of the Extended Boolean Model is to overcome the drawbacks of Boolean model used in information retrieval. The Boolean model doesn't consider term weight while doing query, and the result set of a boolean query is often either too small or too big. The idea of the extended model is to make use of partial matching and term weight as in the vector space model. It combines the characteristics of the Vector Space Model with the properties of Boolean algebra and ranks the similarity between queries and documents.[1]

Definitions

In the Extended Boolean model, a document is represented as a vector (similarly to in the vector model). Each dimension corresponds to a separate term weight associated with the document.

The weight of term Kx associated with document dj can be defined as:

where Idfx is inverse document frequency.

The weight vector associated with document dj can be represented as:

The 2 Dimensions Example

Figure 1
Figure 1: The similarities of q = (KxKy) and document dj, dj+1.
Figure 2
Figure 2: The similarities of q = (KxKy) and document dj, dj+1.

Considering the space composed of two terms Kx and Ky only, the corresponding term weights are w1 and w2[2]. Thus, for query qor = (KxKy), we can calculate the similarity with the following formula:

For query qand = (KxKy), we can use:

Generalizing the idea and P-norms

We can generalize the previous 2D extended boolean model example to higher t-dimension of Euclidean distances.

This can be done using P-norms which extend the notion of distance to include p-distances, where 1 ≤ p ≤ ∞ is a new parameter.

  • A generalized conjunctive query is given by:
  • The similarity of and can be define:
  • A generalized disjunctive query is given by:
  • The similarity of and can be define:

Examples

Consider the query q = (K1K2) ∨ K3, The similarity between query q and document d can be the following formula:

Further reading

See also

References