Linear belief function
A linear belief function (LBF) is an extension of the Dempster–Shafer theory of belief functions to the case in which the variables of interest are continuous. Examples of such variables include financial asset prices, portfolio performance, and other antecedent and consequent variables. The theory was originally proposed by Arthur P. Dempster[1] in the context of Kalman filters and was later elaborated, refined, and applied by Liping Liu.
Concept
A linear belief function intends to represent our belief regarding the location of the true value as follows: we are certain that the truth lies on a so-called certainty hyperplane, but we do not know its exact location; along some dimensions of the certainty hyperplane, we believe the true value could be anywhere from –∞ to +∞, with the probability of being at a particular location described by a normal distribution; along other dimensions, our knowledge is vacuous, i.e., the true value is somewhere between –∞ and +∞ but the associated probability is unknown. In general, a belief function is defined by a mass function over a class of focal elements, which may have nonempty intersections. A linear belief function is a special type of belief function in the sense that its focal elements are exclusive, parallel sub-hyperplanes of the certainty hyperplane and its mass function is a normal distribution across the sub-hyperplanes.
Based on the above geometrical description, Shafer[2] and Liu[3] propose two mathematical representations of an LBF: as a wide-sense inner product and a linear functional in the variable space, and as their duals over a hyperplane in the sample space. Monney[4] proposes yet another structure, called Gaussian hints. Although these representations are mathematically neat, they tend to be unsuitable for knowledge representation in expert systems.
Knowledge Representation
A linear belief function can represent both logical and probabilistic knowledge for three types of variables: deterministic variables, such as observables or controllables; random variables, whose distributions are normal; and vacuous variables, on which no knowledge bears. Logical knowledge is represented by linear equations, or geometrically, a certainty hyperplane. Probabilistic knowledge is represented by a normal distribution across all parallel focal elements.
In general, assume X is a vector of multiple normal variables with mean μ and covariance Σ. Then the multivariate normal distribution can be equivalently represented as a moment matrix:

$$M(X) = \begin{bmatrix} \mu \\ \Sigma \end{bmatrix}.$$
If the distribution is non-degenerate, i.e., Σ has full rank and its inverse exists, the moment matrix can be fully swept:

$$\overleftarrow{M}(X) = \begin{bmatrix} \mu\Sigma^{-1} \\ -\Sigma^{-1} \end{bmatrix}.$$
Except for a normalization constant, the above equation completely determines the normal density function for X. Therefore, $\overleftarrow{M}(X)$ represents the probability distribution of X in potential form.
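To make the two representations concrete, here is a minimal Python sketch; the helper names `moment_matrix` and `full_sweep` are illustrative choices, not part of the cited sources:

```python
import numpy as np

def moment_matrix(mu, sigma):
    """Stack the mean vector on top of the covariance matrix: [mu; Sigma]."""
    return np.vstack([np.atleast_2d(mu), sigma])

def full_sweep(m):
    """Fully sweep a moment matrix: [mu; Sigma] -> [mu Sigma^{-1}; -Sigma^{-1}].

    Requires a non-degenerate (full-rank) Sigma, as stated above.
    """
    mu, sigma = m[:1, :], m[1:, :]
    sigma_inv = np.linalg.inv(sigma)
    return np.vstack([mu @ sigma_inv, -sigma_inv])

# Example: a bivariate normal with full-rank covariance.
M = moment_matrix([1.0, 2.0], np.array([[4.0, 1.0],
                                        [1.0, 3.0]]))
print(full_sweep(M))  # the distribution of X in potential form
```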
These two simple matrices allow us to represent three special cases of linear belief functions. First, an ordinary normal probability distribution is represented by its moment matrix M(X). Second, suppose one makes a direct observation on X and obtains a value μ. In this case, since there is no uncertainty, both variance and covariance vanish, i.e., Σ = 0. Thus, a direct observation can be represented as:

$$M(X) = \begin{bmatrix} \mu \\ 0 \end{bmatrix}.$$
Third, suppose one is completely ignorant about X. This is a thorny case in Bayesian statistics, since the density function does not exist. Using the fully swept moment matrix, we represent the vacuous linear belief function as a zero matrix in swept form as follows:

$$\overleftarrow{M}(X) = 0.$$
One way to understand this representation is to imagine complete ignorance as the limiting case in which the variance of X approaches ∞; one can then show that Σ⁻¹ = 0 and hence $\overleftarrow{M}(X)$ vanishes. However, the above equation is not the same as an improper prior or a normal distribution with infinite variance. In fact, it does not correspond to any unique probability distribution. For this reason, a better way is to understand the vacuous linear belief function as the neutral element for combination (see later).
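For a one-dimensional X with variance σ², for instance, the limit is immediate:

$$\overleftarrow{M}(X) = \begin{bmatrix} \mu/\sigma^{2} \\ -1/\sigma^{2} \end{bmatrix} \longrightarrow \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{as } \sigma^{2} \to \infty.$$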
To represent the remaining three special cases, we need the concept of partial sweeping. Unlike a full sweeping, a partial sweeping is a transformation on a subset of variables. Suppose X and Y are two vectors of normal variables with the joint moment matrix:

$$M(X, Y) = \begin{bmatrix} \mu_X & \mu_Y \\ \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}.$$
Then M(X, Y) may be partially swept. For example, we can define the partial sweeping on X as follows:

$$\overleftarrow{M}(X, Y) = \begin{bmatrix} \mu_X \Sigma_{XX}^{-1} & \mu_Y - \mu_X \Sigma_{XX}^{-1}\Sigma_{XY} \\ -\Sigma_{XX}^{-1} & \Sigma_{XX}^{-1}\Sigma_{XY} \\ \Sigma_{YX} \Sigma_{XX}^{-1} & \Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} \end{bmatrix}.$$
If X is one-dimensional, a partial sweeping replaces the variance of X by its negative inverse and multiplies the other related elements by the inverse. If X is multidimensional, the operation involves the inverse of the covariance matrix of X and other multiplications. A swept matrix obtained from a partial sweeping on a subset of variables can be equivalently obtained by a sequence of partial sweepings on each individual variable in the subset, and the order of the sequence does not matter. Similarly, a fully swept matrix is the result of partial sweepings on all variables.
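The following Python sketch implements the partial sweeping defined by the block formulas above and illustrates that the order of sweepings does not matter; the helper name `partial_sweep` and the test values are illustrative assumptions:

```python
import numpy as np

def partial_sweep(m, idx):
    """Partially sweep the moment matrix m = [mu; Sigma] on the variables idx.

    Implements the block formulas above: the swept covariance block becomes
    its negative inverse, the cross blocks are multiplied by the inverse,
    and the remaining block becomes the conditional covariance.
    """
    mu, sigma = m[0, :], m[1:, :]
    rest = [j for j in range(sigma.shape[0]) if j not in idx]
    a_inv = np.linalg.inv(sigma[np.ix_(idx, idx)])
    b = sigma[np.ix_(idx, rest)]
    out_mu, out_sigma = mu.copy(), sigma.copy()
    out_mu[idx] = mu[idx] @ a_inv
    out_mu[rest] = mu[rest] - mu[idx] @ a_inv @ b
    out_sigma[np.ix_(idx, idx)] = -a_inv
    out_sigma[np.ix_(idx, rest)] = a_inv @ b
    out_sigma[np.ix_(rest, idx)] = b.T @ a_inv
    out_sigma[np.ix_(rest, rest)] = sigma[np.ix_(rest, rest)] - b.T @ a_inv @ b
    return np.vstack([out_mu, out_sigma])

# The order of partial sweepings does not matter:
M = np.vstack([np.array([1.0, 2.0, 3.0]),
               np.array([[4.0, 1.0, 0.5],
                         [1.0, 3.0, 0.2],
                         [0.5, 0.2, 2.0]])])
one_by_one = partial_sweep(partial_sweep(M, [0]), [1])
as_a_block = partial_sweep(M, [0, 1])
print(np.allclose(one_by_one, as_a_block))  # True
```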
We can make two observations. First, after the partial sweeping on X, the mean vector and covariance matrix of X are respectively $\mu_X \Sigma_{XX}^{-1}$ and $-\Sigma_{XX}^{-1}$, which are the same as those of a full sweeping of the marginal moment matrix of X. Thus, the elements corresponding to X in the above partial sweeping equation represent the marginal distribution of X in potential form. Second, according to statistics, $\mu_Y - \mu_X \Sigma_{XX}^{-1}\Sigma_{XY}$ is the conditional mean of Y given X = 0; $\Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$ is the conditional covariance matrix of Y given X = 0; and $\Sigma_{XX}^{-1}\Sigma_{XY}$ is the slope of the regression model of Y on X. Therefore, the elements corresponding to the Y indices and to the intersection of X and Y in $\overleftarrow{M}(X, Y)$ represent the conditional distribution of Y given X = 0.
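Numerically, one can check the swept blocks against the standard conditional-distribution formulas; continuing the sketch above (with illustrative values):

```python
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
S = partial_sweep(np.vstack([mu, sigma]), [0])  # sweep on X (variable 0)

slope = np.linalg.inv(sigma[:1, :1]) @ sigma[:1, 1:]  # regression slope of Y on X
cond_mean = mu[1:] - mu[:1] @ slope                   # E[Y | X = 0]
cond_cov = sigma[1:, 1:] - sigma[1:, :1] @ slope      # Cov[Y | X = 0]

print(np.allclose(S[0, 1:], cond_mean))  # True
print(np.allclose(S[1, 1:], slope))      # True
print(np.allclose(S[2:, 1:], cond_cov))  # True
```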
We may use an audit problem to illustrate the three types of variables as follows. Suppose we want to audit the ending balance of accounts receivable (E). As we saw earlier, E is equal to the beginning balance (B) plus the sales (S) for the period minus the cash receipts (C) on the sales plus a residual (R) that represents insignificant sales returns and cash discounts. Thus, we can represent the logical relation as a linear equation:

$$E = B + S - C + R.$$
Furthermore, if the auditor believes E and B are 100 thousand dollars on average with a standard deviation of 5 and a covariance of 15, we can represent the belief as a multivariate normal distribution. If historical data indicate that the residual R is zero on average with a standard deviation of 0.5 thousand dollars, we can summarize the historical data by the normal distribution R ~ N(0, 0.5²). If there is a direct observation on cash receipts, we can represent the evidence as an equation, say, C = 50 (thousand dollars). If the auditor knows nothing about the beginning balance of accounts receivable, we can represent his or her ignorance by a vacuous LBF. Finally, if historical data suggest that, given cash receipts C, the sales S is on average 8C + 4 with a standard deviation of 4 thousand dollars, we can represent the knowledge as a linear regression model S ~ N(4 + 8C, 16).
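These pieces of evidence can be written down as (possibly partially swept) moment matrices. The sketch below is one plausible encoding under the conventions above; the variable ordering and, in particular, the swept-matrix encoding of the regression model are assumptions for illustration, not taken from the cited sources:

```python
import numpy as np

# Joint normal belief on (E, B): means 100, standard deviations 5,
# covariance 15 (all in thousands of dollars).
M_EB = np.vstack([np.array([100.0, 100.0]),
                  np.array([[25.0, 15.0],
                            [15.0, 25.0]])])

# Direct observation C = 50: the covariance block vanishes.
M_C = np.vstack([np.array([50.0]), np.array([[0.0]])])

# Historical data on the residual: R ~ N(0, 0.5**2).
M_R = np.vstack([np.array([0.0]), np.array([[0.25]])])

# Total ignorance of B: the zero matrix in fully swept form.
M_B_vacuous = np.zeros((2, 1))

# Regression S ~ N(4 + 8C, 16), encoded (assumption) as a moment matrix
# on (C, S) partially swept on C: vacuous on C, slope 8, intercept 4,
# conditional variance 16.
M_CS_swept_on_C = np.vstack([np.array([0.0, 4.0]),
                             np.array([[0.0, 8.0],
                                       [8.0, 16.0]])])
```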
References
- ^ A. P. Dempster, "Normal belief functions and the Kalman filter," in Data Analysis from Statistical Foundations, A. K. M. E. Saleh, Ed. Nova Science Publishers, 2001, pp. 65–84.
- ^ G. Shafer, "A note on Dempster's Gaussian belief functions," School of Business, University of Kansas, Lawrence, KS, Technical Report, 1992.
- ^ L. Liu, "A theory of Gaussian belief functions," International Journal of Approximate Reasoning, vol. 14, pp. 95–126, 1996.
- ^ P. A. Monney, A Mathematical Theory of Arguments for Statistical Evidence. New York, NY: Springer, 2003.