Extended Boolean model
The Extended Boolean model was introduced in 1983 by Salton, Fox, and Wu. Its goal is to overcome the drawbacks of the Boolean model used in information retrieval. The Boolean model does not consider term weights when evaluating a query, and the result set of a Boolean query is often either too small or too big. The idea is to make use of partial matching and term weights, as in the vector space model. The extended model combines the characteristics of the vector space model with the properties of Boolean algebra and ranks documents by their similarity to the query.[1]
Definitions
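In the Extended Boolean model, a document is represented as a vector (as in the vector space model). Each dimension corresponds to the weight of a separate term associated with the document. The weight of term $K_x$ in document $d_j$ can be defined as:
$$w_{x,j} = f_{x,j} \cdot \frac{Idf_x}{\max_i Idf_i}$$
where $f_{x,j}$ is the normalized frequency of term $K_x$ in document $d_j$ and $Idf_x$ is the inverse document frequency of $K_x$.
The weight vector associated with document $d_j$ can be represented as:
$$\vec{d_j} = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$$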
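The weighting step can be sketched in Python. This is only an illustration under stated assumptions: the function name `term_weights` is hypothetical, the term frequency is normalized by the most frequent term in the same document, and the inverse document frequency is taken as log(N / n_x), where N is the number of documents and n_x the number of documents containing the term. The idf is then divided by the largest idf in the collection so that every weight falls in [0, 1]. Other tf and idf variants would work equally well.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return one {term: weight} dict per document, each weight in [0, 1].

    `docs` is a list of token lists. The tf and idf variants used here
    are assumptions made for this sketch, not the only valid choice.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / n_x).
    idf = {term: math.log(n_docs / n) for term, n in df.items()}
    # Largest idf in the collection; fall back to 1.0 if it is zero
    # (every term occurs in every document) or the collection is empty.
    max_idf = max(idf.values(), default=0.0) or 1.0

    weights = []
    for doc in docs:
        counts = Counter(doc)
        if not counts:
            weights.append({})
            continue
        max_count = max(counts.values())
        # Normalized term frequency times idf scaled by the maximum idf.
        weights.append(
            {t: (c / max_count) * (idf[t] / max_idf) for t, c in counts.items()}
        )
    return weights
```

A term that occurs in every document receives weight 0 under this idf variant, which matches the intuition that it does not help discriminate between documents.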
The two-dimensional example
Figure 1: The similarities of $q = (K_x \lor K_y)$ with documents $d_j$ and $d_{j+1}$.
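Consider the space composed of only two terms, $K_x$ and $K_y$, with corresponding term weights $w_1$ and $w_2$.[2] For the query $q_{or} = (K_x \lor K_y)$, the point $(0, 0)$, where neither term is present, is the least desirable, so the similarity is taken as the normalized Euclidean distance from that point:
$$sim(q_{or}, d) = \sqrt{\frac{w_1^2 + w_2^2}{2}}$$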
Figure 2: The similarities of $q = (K_x \land K_y)$ with documents $d_j$ and $d_{j+1}$.
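In the same two-term space with term weights $w_1$ and $w_2$, consider the query $q_{and} = (K_x \land K_y)$. Here the point $(1, 1)$, where both terms are fully present, is the most desirable, so the similarity is one minus the normalized Euclidean distance from that point:
$$sim(q_{and}, d) = 1 - \sqrt{\frac{(1 - w_1)^2 + (1 - w_2)^2}{2}}$$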
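The two-term case can be checked numerically with a short Python sketch. The function names `sim_or_2d` and `sim_and_2d` are hypothetical, and both weights are assumed to lie in [0, 1]. The OR similarity is the distance from (0, 0) and the AND similarity is one minus the distance from (1, 1), each divided by the square root of 2 so that the result stays in [0, 1].

```python
import math


def sim_or_2d(w1, w2):
    """Similarity for (Kx OR Ky): normalized distance from the point (0, 0)."""
    return math.sqrt((w1 ** 2 + w2 ** 2) / 2)


def sim_and_2d(w1, w2):
    """Similarity for (Kx AND Ky): one minus normalized distance from (1, 1)."""
    return 1 - math.sqrt(((1 - w1) ** 2 + (1 - w2) ** 2) / 2)


if __name__ == "__main__":
    # A document containing only one of the two terms still gets a
    # non-zero score for both queries, but OR rewards it more than AND.
    print(sim_or_2d(1.0, 0.0), sim_and_2d(1.0, 0.0))
```

Unlike the strict Boolean model, a document that matches only one term is neither fully accepted by OR nor fully rejected by AND.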
Generalizing the idea and P-norms
We can generalize the previous 2D extended Boolean model example to a higher, t-dimensional space using Euclidean distances.
This can be done using P-norms, which extend the notion of distance to include p-distances, where $1 \le p \le \infty$ is a new parameter.
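- A generalized conjunctive query is given by: $q_{and} = k_1 \land^p k_2 \land^p \ldots \land^p k_t$
- The similarity of $q_{and}$ and $d_j$ can be defined as: $sim(q_{and}, d_j) = 1 - \left( \frac{(1 - w_1)^p + (1 - w_2)^p + \ldots + (1 - w_t)^p}{t} \right)^{1/p}$
- A generalized disjunctive query is given by: $q_{or} = k_1 \lor^p k_2 \lor^p \ldots \lor^p k_t$
- The similarity of $q_{or}$ and $d_j$ can be defined as: $sim(q_{or}, d_j) = \left( \frac{w_1^p + w_2^p + \ldots + w_t^p}{t} \right)^{1/p}$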
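The p-norm similarities for t terms can be sketched as follows. The names `sim_or` and `sim_and` are hypothetical, the weights are assumed to lie in [0, 1], and p is assumed to satisfy 1 <= p <= infinity. The infinite case is handled separately using the limits of the two expressions, which are the maximum weight for OR and the minimum weight for AND.

```python
import math


def sim_or(weights, p=2.0):
    """p-norm similarity of a document to k_1 OR ... OR k_t."""
    if math.isinf(p):
        # Limit as p grows: only the largest weight matters.
        return max(weights)
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)


def sim_and(weights, p=2.0):
    """p-norm similarity of a document to k_1 AND ... AND k_t."""
    if math.isinf(p):
        # Limit as p grows: only the smallest weight matters.
        return min(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)


if __name__ == "__main__":
    w = [0.2, 0.8]
    for p in (1.0, 2.0, 10.0, math.inf):
        print(p, sim_or(w, p), sim_and(w, p))
```

With p = 1 both operators reduce to the average of the weights, so AND and OR are indistinguishable, as in a pure vector space ranking. As p grows the scores move toward the maximum (OR) and minimum (AND) weight, which approaches strict Boolean behaviour.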
Examples
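Consider the query $q = (K_1 \land K_2) \lor K_3$. The similarity between query $q$ and document $d$ can be computed by applying the conjunctive formula to the inner AND and then the disjunctive formula to the outer OR:
$$sim(q, d) = \left( \frac{\left( 1 - \left( \frac{(1 - w_1)^p + (1 - w_2)^p}{2} \right)^{1/p} \right)^p + w_3^p}{2} \right)^{1/p}$$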
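Nested queries can be scored by evaluating the query tree from the inside out. The sketch below makes several assumptions: the function `similarity` and the tuple-based query representation are hypothetical, p is finite, and a term missing from the document is given weight 0. The query (K1 AND K2) OR K3 is used only as an illustration.

```python
def similarity(query, weights, p=2.0):
    """Score a document against a nested Boolean query with p-norms.

    `query` is either a term name (str) or a tuple whose first element is
    "and" or "or" and whose remaining elements are sub-queries.
    `weights` maps term names to document weights in [0, 1].
    """
    if isinstance(query, str):
        # Leaf: the similarity to a single term is its weight in the document.
        return weights.get(query, 0.0)

    op, *operands = query
    # Score every operand first, then combine the scores with the p-norm.
    scores = [similarity(sub, weights, p) for sub in operands]
    t = len(scores)
    if op == "or":
        return (sum(s ** p for s in scores) / t) ** (1 / p)
    if op == "and":
        return 1 - (sum((1 - s) ** p for s in scores) / t) ** (1 / p)
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    q = ("or", ("and", "k1", "k2"), "k3")
    doc = {"k1": 0.9, "k2": 0.4, "k3": 0.1}
    print(similarity(q, doc, p=2.0))
```

Because each operator returns a value in [0, 1], the result of an inner sub-query can be treated as if it were a term weight by the enclosing operator.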
Further reading
- Adaptive Feedback Methods in an Extended Boolean Model by Dr. Jongpill Choi
- Interpolation of the extended Boolean retrieval model