Given the linear regression hypothesis and the Gauss-Markov hypothesis, we can find an explicit form for the function which lies closest to the dependent variable $Y$.
As $X$ and $Y$ are random variables, we only have concrete realizations $x$ and $y$ of them. Based on these numbers, we can only find an estimate $\hat{\beta}$ of $\beta$.
Therefore, we want an estimate of $\beta$. Under the Gauss-Markov assumptions, there exists an optimal solution. We can see the unknown function $X\beta$ as the projection of $Y$ on the subspace of $\mathbb{R}^n$ generated by $X_1, \dots, X_p$,
$$\operatorname{span}(X_1, \dots, X_p) = \{Xb : b \in \mathbb{R}^p\},$$
where $X$ is the matrix whose columns are $X_1, \dots, X_p$.
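For intuition, a toy sketch (the data here is invented for illustration, not taken from the text): any vector of the form $Xb$ is a linear combination of the columns of $X$, so $\{Xb : b \in \mathbb{R}^p\}$ is exactly the subspace they generate.

    import numpy as np

    X = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])  # n = 3 observations, p = 2 columns X_1, X_2
    b = np.array([2.0, -1.0])

    # X b equals the column combination b_1 X_1 + b_2 X_2
    print(np.allclose(X @ b, 2.0 * X[:, 0] - 1.0 * X[:, 1]))  # True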
If we define the scalar product $\langle \cdot, \cdot \rangle$ by $\langle u, v \rangle = u^{T} v$ and write $\| \cdot \|$ for the induced norm, the metric $d$ can be written $d(Y, X\beta) = \| Y - X\beta \|$. Minimizing this norm is equivalent to projecting $Y$ orthogonally on the subspace induced by $X$ with the projection $P_X(Y) = X\hat{\beta}$.
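As a numerical sanity check (a minimal sketch with made-up data, seed, and coefficients): minimizing $\| y - Xb \|^2$ with a generic optimizer recovers the same coefficients as the orthogonal least-squares projection.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                       # design matrix, full column rank
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

    # Generic minimization of the squared norm ||y - X b||^2
    res = minimize(lambda b: np.sum((y - X @ b) ** 2), x0=np.zeros(3))

    # Orthogonal projection of y on the column space of X (least squares)
    b_proj, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(np.allclose(res.x, b_proj, atol=1e-5))       # True: same minimizer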
The projection being orthogonal, $Y - X\hat{\beta}$ is orthogonal to the subspace generated by $X_1, \dots, X_p$. Therefore, $X^{T}(Y - X\hat{\beta}) = 0$. As $X^{T}(Y - X\hat{\beta}) = X^{T}Y - X^{T}X\hat{\beta}$, this equation yields $X^{T}X\hat{\beta} = X^{T}Y$.
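The orthogonality condition can be verified numerically; a short sketch with the same assumed data as above: the residual $y - x\hat{\beta}$ is orthogonal to every column of the design matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares estimate
    residual = y - X @ beta_hat
    print(np.allclose(X.T @ residual, 0.0, atol=1e-10))   # X^T (y - X beta_hat) = 0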
If $X$ is of full rank, then so is $X^{T}X$. In that case, $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$. Given the realizations $x$ and $y$ of $X$ and $Y$, we choose $\hat{\beta} = (x^{T}x)^{-1}x^{T}y$ and $\hat{y} = x\hat{\beta}$.
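A sketch of the closed form with the same assumed data (in practice one solves the normal equations $x^{T}x\,\hat{\beta} = x^{T}y$ rather than forming the inverse of $x^{T}x$ explicitly, which is better conditioned):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 3))                       # realization of the design matrix
    y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

    # beta_hat = (x^T x)^{-1} x^T y, computed by solving x^T x beta = x^T y
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    y_hat = x @ beta_hat                               # fitted values x beta_hat

    # Agrees with the orthogonal projection computed by least squares
    b_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)
    print(np.allclose(beta_hat, b_lstsq))              # True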