Coefficient of multiple correlation

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 199.168.95.52 (talk) at 00:12, 2 September 2014 (References). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It equals the square root of the coefficient of determination, but only under the particular assumptions that an intercept is included and that the best possible linear predictors are used; the coefficient of determination itself is defined for more general cases, including nonlinear prediction and cases in which the predicted values have not been derived from a model-fitting procedure. The coefficient of multiple correlation takes values between zero and one. A higher value indicates better predictability of the dependent variable from the independent variables: a value of one indicates that the predictions are exactly correct, and a value of zero indicates that no linear combination of the independent variables predicts better than the fixed mean of the dependent variable.[1]

Definition

The coefficient of multiple correlation, denoted R, is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept.
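As an illustration of this definition, the following is a minimal sketch in Python, assuming NumPy is available. The function name `multiple_correlation` and the toy data are hypothetical and chosen only for this example; the sketch fits a least-squares model with an intercept and then takes the Pearson correlation between the actual and fitted values.

```python
import numpy as np

def multiple_correlation(X, y):
    """Return R: the Pearson correlation between y and its least-squares fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the fitted model includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    # Pearson correlation between the actual and the predicted values.
    return float(np.corrcoef(y, y_hat)[0, 1])

# Hypothetical toy data: six observations of two predictors.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 11.1, 10.9])

R = multiple_correlation(X, y)
print(R)
```

If the fitted values are constant (the case R = 0), the Pearson correlation is undefined and this sketch would return NaN; a practical implementation would need to treat that case separately.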

Computation

The square of the coefficient of multiple correlation can be computed using the vector $c$ of cross-correlations between the predictor variables (independent variables) and the target variable (dependent variable), and the correlation matrix $R_{xx}$ of inter-correlations between predictor variables. It is given by

$$R^2 = c^\top R_{xx}^{-1} c,$$

where $c^\top$ is the transpose of $c$, and $R_{xx}^{-1}$ is the inverse of the matrix $R_{xx}$.

If all the predictor variables are uncorrelated, the matrix $R_{xx}$ is the identity matrix and $R^2$ simply equals $c^\top c$, the sum of the squared cross-correlations with the dependent variable. If the predictor variables are correlated with one another, the inverse of the correlation matrix $R_{xx}$ accounts for this.
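The correlation-matrix formula can be sketched as follows, again assuming NumPy. The function name `r_squared_from_correlations` and the data are hypothetical. The two predictor columns are constructed to be exactly uncorrelated, so the result should also agree with the sum of the squared cross-correlations.

```python
import numpy as np

def r_squared_from_correlations(X, y):
    """Return R^2 computed from the cross-correlations c and the matrix R_xx."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    # Correlation matrix of [x_1, ..., x_k, y]; each column is one variable.
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    R_xx = corr[:k, :k]  # inter-correlations among the predictors
    c = corr[:k, k]      # cross-correlations of each predictor with y
    # Solve R_xx w = c rather than forming the inverse explicitly.
    return float(c @ np.linalg.solve(R_xx, c))

# Hypothetical data with two exactly uncorrelated predictors.
X = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 4.0, 7.0])

r2 = r_squared_from_correlations(X, y)

# With uncorrelated predictors, R^2 reduces to the sum of squared cross-correlations.
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
print(r2, float(c @ c))
```

Using a linear solve instead of an explicit matrix inverse is a design choice of this sketch, not part of the formula; both give the same quantity when the predictor correlation matrix is invertible.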

The squared coefficient of multiple correlation can also be computed as the fraction of variance of the dependent variable that is explained by the independent variables, which in turn is 1 minus the unexplained fraction. The unexplained fraction can be computed as the sum of squared residuals (that is, the sum of the squares of the prediction errors) divided by the sum of the squared deviations of the values of the dependent variable from its sample mean.
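The explained-variance computation can be sketched in the same way, assuming NumPy and a least-squares fit with an intercept. The name `r_squared_from_residuals` and the data are hypothetical.

```python
import numpy as np

def r_squared_from_residuals(X, y):
    """Return R^2 as 1 minus the unexplained fraction of variation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares fit with an intercept column.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)        # sum of squared residuals
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # squared deviations from the sample mean
    return 1.0 - ss_res / ss_tot

# Hypothetical data: four observations of two predictors.
X = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 4.0, 7.0])

r2 = r_squared_from_residuals(X, y)
print(r2)
```

For this data the sum of squared residuals is 1 and the total sum of squares is 21, so the result is 20/21.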

Properties

With more than two variables being related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of y on x and z will in general have a different R than will a regression of z on x and y. For example, suppose that in a particular sample the variable z is uncorrelated with both x and y, while x and y are linearly related to each other. Then a regression of z on y and x will yield an R of zero, while a regression of y on x and z will yield a strictly positive R. This follows since the correlation of y with the best predictor based on x and z is in all cases at least as large as the correlation of y with the best predictor based on x alone, and in this case with z providing no explanatory power it will be exactly as large.
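The example can be checked numerically with a small sketch, assuming NumPy. The helper `multiple_r` and the four-point sample are hypothetical: x and y are correlated with each other, while z is constructed to be exactly uncorrelated with both. R is computed here as the square root of 1 minus the unexplained fraction, because the fitted values are constant when R is zero and the Pearson correlation with them would be undefined.

```python
import numpy as np

def multiple_r(predictors, target):
    """R for a least-squares fit with intercept, computed as sqrt(1 - SSR/SST)."""
    target = np.asarray(target, dtype=float)
    design = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1.0 - (residuals @ residuals) / ss_tot
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(r2, 0.0)))

# Hypothetical sample: x and y are correlated; z is uncorrelated with both.
x = np.array([-1.0, -1.0, 1.0, 1.0])
y = np.array([-1.5, -0.5, 0.5, 1.5])
z = np.array([1.0, -1.0, -1.0, 1.0])

R_z = multiple_r([x, y], z)   # regression of z on x and y
R_y = multiple_r([x, z], y)   # regression of y on x and z
R_y_x = multiple_r([x], y)    # regression of y on x alone

print(R_z, R_y, R_y_x)
```

In this sample the regression of z on x and y gives an R of zero up to rounding, while the regression of y on x and z gives the same positive R as the regression of y on x alone, since z adds no explanatory power.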

References

  • Allison, Paul D. (1998) Multiple Regression: A Primer. Sage Publications. ISBN 9780761985334
  • Cohen, Jacob, et al. (2002) Applied Multiple Regression: Correlation Analysis for the Behavioral Sciences ISBN 0805822232
  • Crown, William H. (1998) Statistical Models for the Social and Behavioral Sciences: Multiple Regression and Limited-Dependent Variable Models ISBN 0275953165
  • Edwards, Allen Louis (1985) Multiple Regression and the Analysis of Variance and Covariance ISBN 0716710811
  • Keith, Timothy (2006) Multiple Regression and Beyond. Boston, Mass.: Pearson Education.
  • Kerlinger, Fred N.; Pedhazur, Elazar J. (1973) Multiple Regression in Behavioral Research.
  • Stanton, Jeffrey M. (2001) "Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors", Journal of Statistics Education, 9 (3)