Talk:Multivariate normal distribution
WikiProject Statistics (C-class, Top-importance)
Conditional Distribution
I think where the conditional mean μ̄ and covariance Σ̄ come from in the conditional distribution section is a little confusing. I suggest the following update.
If x ~ N(μ, Σ) is partitioned such that
- x = [x1, x2], where
- x1 with size q × 1
- x2 with size (N − q) × 1,
and μ and Σ are partitioned accordingly, then the distribution of x1 conditional on x2 = a is multivariate normal with mean
- μ̄ = μ1 + Σ12 Σ22^(−1) (a − μ2)
and covariance matrix
- Σ̄ = Σ11 − Σ12 Σ22^(−1) Σ21.
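For editors who want a concrete check of the update above, here is a minimal numerical sketch. The 3-dimensional example, the partition size q = 1, and all values in mu, Sigma, and a are made up for illustration:

```python
import numpy as np

# Hypothetical 3-dimensional example, partitioned into x1 (first q entries)
# and x2 (the rest), conditioning on x2 = a.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
q = 1  # size of the first block

mu1, mu2 = mu[:q], mu[q:]
S11 = Sigma[:q, :q]
S12 = Sigma[:q, q:]
S21 = Sigma[q:, :q]
S22 = Sigma[q:, q:]

a = np.array([1.5, 2.5])  # observed value of x2

# Conditional distribution of x1 given x2 = a:
#   mean:       mu1 + S12 S22^(-1) (a - mu2)
#   covariance: S11 - S12 S22^(-1) S21
S22_inv = np.linalg.inv(S22)
cond_mean = mu1 + S12 @ S22_inv @ (a - mu2)
cond_cov = S11 - S12 @ S22_inv @ S21
```

Note that conditioning can only shrink the variance of x1, which gives a quick sanity check on the formula.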
possible Kullback-Leibler Divergence error
I believe you have the KL divergence backwards. It should be from N1 to N0 based on your formula and the language on the KL Divergence page http://en.wikipedia.org/wiki/Kullback-Leibler_divergence. It is easy to check in the case where observations are independent, see Bozdogan 1987 for such a case. Troutinthemilk 20:05, 7 June 2007 (UTC)
Just another remark: wouldn't it be better to write Id instead of N for the identity matrix? That is, if N means the identity matrix; otherwise it would be a good idea to make clear what it stands for. —Preceding unsigned comment added by 129.69.61.54 (talk) 07:20, 16 June 2010 (UTC)
Hi. There is an error in the formula for the KL. This is related to the comment above also.
The order of the means in the quadratic form should be reversed. It should read:
- (m_0 − m_1)′ S_1^(−1) (m_0 − m_1)
I will edit the formula if it is not corrected in few days, but since this page is (presumably) somebody else's baby, I thought I should leave a comment first. Thanks and great work, this page is very useful. Peter Halpin, PhD, University of Amsterdam —Preceding unsigned comment added by 145.18.152.249 (talk) 13:02, 12 October 2010 (UTC)
- Peter, really, (a − b)2 = (b − a)2, so there is no error there... // stpasha » 21:09, 12 October 2010 (UTC)
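To make the direction of the divergence in this thread easy to check, here is a sketch of the standard closed form for KL(N0 ‖ N1) between two multivariate normals. The helper name kl_mvn and the example numbers are made up; this is the usual textbook formula, not necessarily the exact form the article used at the time:

```python
import numpy as np

def kl_mvn(m0, S0, m1, S1):
    """KL(N0 || N1) for multivariate normals, standard closed form:
    0.5 * [tr(S1^-1 S0) + (m1-m0)' S1^-1 (m1-m0) - N + ln(det S1 / det S0)]."""
    N = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - N
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Made-up example: KL is asymmetric, so the order of the arguments matters,
# which is exactly why the "from N1 to N0" question above is worth settling.
m0, S0 = np.zeros(2), np.eye(2)
m1, S1 = np.array([1.0, 0.0]), 2.0 * np.eye(2)
forward = kl_mvn(m0, S0, m1, S1)
reverse = kl_mvn(m1, S1, m0, S0)
```

Note that stpasha's point about (a − b)² = (b − a)² only concerns the quadratic term; the trace and log-determinant terms do change when the order is swapped.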
error on the normalization
the normalization in the N-dimensional case should be:
(det(2πΓ))^(−1/2)
see also Gaussian_blur and Difference of Gaussians
- That is the same as (2π)^(−N/2) det(Γ)^(−1/2). --Zerotalk 12:17, 24 January 2007 (UTC)
---
Note as of 2010/12/24 eqn shows: (2π)^(−N/2) det(Γ)^(1/2) (due to det(Γ) already being in denominator, so neg power cancels out). Fixing. SimonFunk (talk) 05:41, 25 December 2010 (UTC)
No, I put the minus sign on purpose. Move the whole determinant to the numerator, if you prefer. In December, I checked with Mathematica and noticed that the minus was missing. And if I am not completely confused, it is wrong now in the way that you left it. (I think when Zero talks about the normalisation factor (above), he means the denominator.) I am not undoing your undo now, as I am too lazy to double-check a second time that I am really right here, but if you haven't done so before removing my minus, please check again. Simon A. (talk) 23:03, 10 January 2011 (UTC)
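The two ways of writing the constant can be checked numerically. This sketch (all numbers invented for illustration) confirms that (det(2πΓ))^(−1/2) and (2π)^(−N/2) det(Γ)^(−1/2) agree, and that the density built with that constant integrates to 1:

```python
import numpy as np

# Made-up 2-dimensional covariance matrix and zero mean.
Gamma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
mu = np.zeros(2)
N = 2

# The same normalization constant, written the two ways discussed above.
c1 = np.linalg.det(2 * np.pi * Gamma) ** -0.5
c2 = (2 * np.pi) ** (-N / 2) * np.linalg.det(Gamma) ** -0.5

# Crude grid integration of the density over a large box; should be ~1.
xs = np.linspace(-8.0, 8.0, 321)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
pts = np.stack([X - mu[0], Y - mu[1]], axis=-1)
Ginv = np.linalg.inv(Gamma)
quad = np.einsum('...i,ij,...j->...', pts, Ginv, pts)
total = (c1 * np.exp(-0.5 * quad)).sum() * dx * dx
```

The identity behind c1 == c2 is det(cA) = c^N det(A) for an N×N matrix, which is where the (2π)^(−N/2) factor comes from.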
Question on Bivariate Normal
If X is normally distributed and Y is normally distributed, and z = X * Y, is z bivariate normally distributed?
Thanks
I moved the following condition from the main page:
- there is a vector μ and a symmetric, positive definite matrix Γ such that the characteristic function of X is
- φ_X(z) = exp(iμ^T z − ½ u^T Γ u).
Is u the same as μ here? AxelBoldt
The vector u is not the expected vector. The characteristic functional of X is the expectation value of exp[i(u1X1+...+unXn)]. I write φ_X(u) = E[e^(iu·X)]. The expected vector is the gradient of φ at u = 0. I made a mistake: the correct characterization is
- φ_X(u) = exp(iμ^T u − ½ u^T Γ u).
This characterization is necessary as a technical step in the proof of equivalence. As far as I know, it is the only way to show that, if every linear combination of the Xi is Gaussian, then the Xi are jointly Gaussian. -- Miguel
- ok, I'll put it back in. AxelBoldt
The motivation of this page is that it is a prerequisite to defining a "Gaussian stochastic process". The best way to do this is to say that every linear functional of the random function is a gaussian random variable. -- Miguel
The "characterization"
- the Xi are independent normally distributed random variables
is actually not correct because it implies that the various Xi are uncorrelated.
- yup, you're right; I'll mention that in the article. AxelBoldt
Also, it seems that Gaussian is capitalized because it is the name of a person. -- Miguel
- There is also a program used in computational chemistry whose name is Gaussian (with the G) -- people used to this program may expect lower-case g for other uses of "gaussian". And according to the ACS Style Guide, the trend is toward lower-casing surnames that are used as units. But then again, this is math, not chemistry... -- Marj Tiefert, Monday, May 6, 2002
- Oh, didn't know that "Gaussian" is based on a surname -- that's probably why I was not able to find many examples of the term lowercased on Google. A redirect with the term in lowercase already points here, so I think that is enough. --maveric149
- I agree that units should be lower-case. For example, the "gauss" is a unit of magnetic field strength, in honour of Gauss' work on magnetism.
- Now that we're at it and maveric is in this discussion, maybe we should agree on the best way to disambiguate gauss (unit) from Gauss (prince of mathematicians), and gaussian (computer program) from Gaussian (random variable) -- Miguel
Hum... Interesting quandary. First let's start with the personality since that is the easiest. It really isn't necessary to list Gauss the man on a disambiguation page located at Gauss since nobody with half a brain would simply link to Gauss and expect that link to go directly to an article about Carl Friedrich Gauss (which is named correctly BTW). The same would be true about Smith and Adam Smith -- it is a misuse of disambiguation pages to list people who had X for a last name unless they were primarily only known by their last name and other things are also known by that name (wikipedia is not a name directory). A good example of this would be Seneca which is both the name of a first century philosopher and the name of a Native American tribe (some disambiguation is needed at Seneca I see...).
- However, Gauss is already redirected to Carl Friedrich Gauss, which is not surprising since mathematicians (and, more generally, scientists) are almost universally known by a single surname. This leads to confusion: there were five Bernoullis, two Banachs, two Pearsons... To confuse matters more, there is not only "gauss (unit)", which is a unit of magnetic field intensity, but also "Gauss units", which is a particular choice of normalization of the Maxwell equations and the elementary charge (there are also "Heaviside units") -- Miguel
As for the "gaussian" issue: I wasn't sure about this one since I didn't know there was a computer program with the same name, so I did a little Googling. Found out that <gaussian> got 2/3rds of a million hits and <gaussian "computer program"> got less than 1% that number of hits.
- Searching for "Gaussian computational chemistry" gives 16200 hits :-) (Marj)
This tells me that Gaussian the computer program is far less widely known than gaussian the variable -- thus confirming my first reaction. Since one usage is far more widely known and expected than the other we should have an article titled gaussian that is only about the mathematics term. A link to either Gaussian computer program or Gaussian (computer program) can then be placed at the bottom of that page (in the same way as Paris, Texas is linked at the bottom of the Paris entry -- which is about Paris, France BTW). This is what I like to call 'weak disambiguation'.
Not sure what the name of the article about the computer program should be... Would it sound odd to use "Gaussian computer program" in a sentence talking about Gaussian the computer program? Or is this computer program almost always referred to simply as "Gaussian"?
- Gaussian is produced by Gaussian, Inc (http://www.gaussian.com/) who refer to it as simply "Gaussian". Their website looks like they have more of an academic than a big-corporation mindset, however - like, I didn't notice whether they'd trademarked this use of "Gaussian" (if they in fact could have). Among computational chemists, I've always heard it referred to as "Gaussian", but there wasn't any ambiguity, since they were talking about computational chemistry. Probably the program makes use of the mathematical species of "Gaussian", or "gaussian". ;-) -- Marj Tiefert, Wednesday, May 8, 2002
- You can see "the mathematical species of Gaussian" in gaussian.com's logo. -- Miguel
This is important since a major part of our naming conventions deals with easy-linking and whenever a disambiguation issue like this arises, we first really should look for alternatives that are also widely used yet less ambiguous. Who wants to have to write [[Gaussian (computer program)|Gaussian]] each time they link to that article? However, if the use of the term "Gaussian computer program" makes for contrived and odd sounding sentences then we might just as well place that article at [[Gaussian (computer program)]] so as not to needlessly imply that "computer program" is part of its name. The use of parentheses in disambiguation is what I like to call 'strong disambiguation' and is something to be used only as a last resort. Hope this helps.
BTW, I'm still not sure about a general rule for capitalizing units that are derived from surnames... As it is, I am beginning to lean in favor of making them lowercase. However, we might want to explore whether there might be any exceptions where a capitalized term would be used. For example, the unit newton is commonly expressed in lowercase form, but then Celsius is usually shown with a capital 'C' (along with the other two common temperature scales)... Any other thoughts?--maveric149
- Celsius and Fahrenheit obey the rule because they are strictly 'degrees Celsius' and 'degrees Fahrenheit', where the first word of the unit is not capitalised. When written explicitly, something like 'ten kelvin' should be written with a lowercase 'k'.
Miguel, I don't think the first and the second condition given in the article are equivalent. Take for instance X=(X1,X2) where X1 is standard normal and X2 is uniform on [0,1]. Then the first condition is not satisfied, but the second is, using the matrix A = (1 0). I claim A needs to be square (and will then automatically be invertible.) --AxelBoldt
- You're right. Thanks for pointing that out. I reversed the relation between X and Z. The result, with a rectangular A, is correct. The reason the original Z=A(X-μ) doesn't work is that the covariance matrix of Z doesn't have the right rank. If Z=A X and the covariance matrix of X is Γ, then the covariance matrix of Z must be AΓAT. But the rank of this is at most the rank of Γ and we are requiring the components of Z to be independent N[0,1]. That's why Z needs to have a smaller dimension. But, as you point out, this doesn't work either.
- As far as the current statement goes, the number of components of Z could be arbitrarily large, but not smaller than the rank of Γ. Miguel
We still have serious problems with the definition here. First, do we consider a variable that's constant 0 to be normally distributed? If not, then the first two statements are not equivalent. Also, in the third statement, should we go to a positive semidefinite Γ? AxelBoldt 06:13 Jan 24, 2003 (UTC)
We definitely need to consider a constant (not only 0) to be normally distributed (with variance 0, of course), and we need to eliminate the words "unless all ai are 0". The reason is that we need to allow singular variance matrices, and once that happens we have some nonzero linear combinations of non-degenerate normals adding up to a constant. Example: the residuals (which are not independent, and must not be confused with the errors, which are independent) from the simplest sort of ordinary linear regression are constrained to lie within a space of codimension 2. That vector of residuals has a singular variance matrix. The distribution of its sum of squares is chi-square with n-2 degrees of freedom. The whole discussion leading to that conclusion would be horribly complicated if we're forbidden to speak of normal distributions whose variance is a singular matrix. Michael Hardy 17:19 Jan 24, 2003 (UTC)
Shouldn't
- there is a vector μ=(μ1,...,μn) and a symmetric, positive semidefinite matrix Γ such that X has density
- f_X(x1,...,xn) dx1...dxn = (det(2πΓ))^(−1/2) exp(−½ (x−μ)^T Γ^(−1) (x−μ)) dx1...dxn
be
- there is a vector μ=(μ1,...,μn) and a symmetric, positive definite matrix Γ such that X has density
- f_X(x1,...,xn) dx1...dxn = (det(2πΓ))^(−n/2) exp(−½ (x−μ)^T Γ^(−1) (x−μ)) dx1...dxn
(semidefinite -> definite, 1 -> n) or should I stick to things I know something about? — user:192.38.66.188
- positive semidefinite means that we are allowing zero variance (i.e., a random variable that always takes the same value). See the discussion just above your question.
- The determinant of Γ takes into account the variances and covariances of all variables, and so it need not be raised to the nth power.
- Last but not least, if you know enough to ask these questions, you actually "know something about" this ;-) — Miguel 17:44, 2004 Feb 24 (UTC)
- I agree with the non-logged-in user's criticism. Multivariate normal distributions exist in which the variance is a positive semi-definite matrix of determinant zero. In a coordinate system in which the components are independent, one or more components has variance zero. But: such a distribution has no density with respect to the usual n-dimensional Lebesgue measure; no density function should be attributed to such distributions unless it is with respect to a measure on a space of lower dimension. Michael Hardy 21:07, 24 Feb 2004 (UTC)
- You're completely right, as usual :-) Miguel 21:24, 2004 Feb 24 (UTC)
proposed rearranged first section
I propose the following rearrangement and partial rewrite of the intro section of this article. The main motivation is that the general case can be understood at an informal level without the need to be familiar with characteristic functions. Comments, please. --Zero 12:37, 15 Sep 2004 (UTC)
- Since the definition you single out applies only to non-degenerate multivariate normals, you need to mention degeneracy explicitly in the following paragraph.
- I would call X a "random vector", not a "random variable".
- Make the paragraphs after "A formal definition" the first section of the body of the article, called "Formal definition".
- IMHO, the most intuituvely compelling and informally understandable definition is the one that says every linear combination of the coordinates is normally distributed.
- —Miguel 19:59, 2004 Sep 15 (UTC)
In probability theory and statistics, a multivariate normal distribution, also sometimes called a multivariate Gaussian distribution in honor of Carl Friedrich Gauss, is a generalization of the normal distribution to several dimensions.
In the case of a random variable X with a non-degenerate multivariate normal distribution, there is a vector μ and a symmetric, positive definite matrix Γ such that X has density
f_X(x) = (det(2πΓ))^(−1/2) exp(−½ (x−μ)^T Γ^(−1) (x−μ)),
where det(2πΓ) is the determinant of 2πΓ. Note how the equation above reduces to that of the univariate normal distribution if Γ is a 1×1 matrix (i.e. a real number).
More generally, a multivariate normal distribution in n dimensions consists of a non-degenerate multivariate normal distribution sitting inside some k-dimensional affine subspace (a linear subspace possibly shifted from the origin) for some k ≤ n. For example, if Z is a 1-dimensional normal distribution, then the vector (Z,Z) whose components are equal has a multivariate normal distribution which sits inside the subspace {(z,z) : z ∈ R}.
A formal definition is that an n-dimensional random variable X= X1, ... , Xn has a multivariate normal distribution, if it satisfies the following equivalent conditions:
- every linear combination Y=a1X1 + ... + anXn is normally distributed;
- there is a random vector Z=(Z1, ..., Zm), whose components are independent standard normal random variables, a vector μ = (μ1, ..., μn) and an n×m matrix A such that X = A Z + μ.
- there is a vector μ and a symmetric, positive semi-definite matrix Γ such that the characteristic function of X is
- φ_X(u) = exp(iμ^T u − ½ u^T Γ u).
The vector μ in these conditions is the expected value of X and the matrix Γ is the covariance matrix of the components Xi.
Note that the Xi are in general not independent; they can be seen as the result of applying the linear transformation A to a collection of independent Gaussian variables Z.
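Condition 2 in the list above can be illustrated with a quick simulation. The particular A, μ, and sample size are arbitrary, chosen so that the covariance AAᵀ is singular (rank 2 in 3 dimensions), i.e. a degenerate multivariate normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: n = 3 components built from m = 2 independent standard
# normals, so the covariance A A^T has rank 2 (a degenerate case).
A = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [1.5, 1.0]])          # n x m
mu = np.array([1.0, -2.0, 0.0])

Z = rng.standard_normal((2, 100_000))   # independent N(0,1) components
X = A @ Z + mu[:, None]                 # samples of the n-vector X = A Z + mu

Sigma = A @ A.T                         # implied (singular) covariance matrix
sample_cov = np.cov(X)                  # should approximate Sigma
```

This matches the remark above: the Xi are not independent, but they are each linear combinations of the independent Zj.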
What is the N at the end of the Kullback–Leibler distance? It would make sense to add what the value N signifies in the formulae. And where does the formula come from? Any references would help too.
- The N is the dimension of both multivariate normal distributions, as defined above. But I will make it more clear. Unfortunately, I have not found any reference yet confirming whether the formula is correct.
A counterexample
Would it not make the article clearer to merge "A counterexample" and "correlation and independence" into one section? — ciphergoth 07:09, 2005 Apr 29 (UTC)
I second this suggestion. The subsection A Counterexample does not have any context. A counterexample to what? I suggest moving "A counterexample" into Correlations and independence. 192.31.106.35 (talk) 18:07, 21 August 2008 (UTC)
That section states explicitly what it's a counterexample to. Read the first sentence in the section. It's there. Michael Hardy (talk) 15:03, 4 May 2009 (UTC)
Are we sure X and Y are uncorrelated if Y = X or Y = −X in this way? —Preceding unsigned comment added by 74.74.172.43 (talk) 04:47, 25 February 2010 (UTC)
Singular variance case
I don't think it is important to include the singular variance case as part of the definition here. The fact that it is useful in certain circumstances does not provide adequate justification. It will be, I believe, unimportant to nearly all people reading this wiki article. As an analogy, the fact that adding the point at infinity to the complex plane simplifies much of the discussion of complex analysis does not justify defining complex numbers themselves as including this point. Or consider the discussion on Dirac_delta; nowhere is it suggested that Dirac_delta is a type of normal distribution. Further, the more general definition is difficult to understand due to the lack of adequate back-up in other wikipedia pages. Consider Probability_distribution, where it says "Additionally, some authors define a distribution generally as the probability measure induced by a random variable X on its range - the probability of a set B is P(X^{-1}(B))." This makes it seem that X = AZ + \mu is a random variable on the range of X, which is not all of R^N for the singular variance case. Is X not a random variable on R^N? I understand that you can describe a measure so that the singular case makes sense, but this can be done for many distributions, e.g. the Cauchy distribution. Should the (for example univariate) Cauchy distribution be defined as a random variable on R^N having characteristic function \phi(u; x0, \gamma, v)=exp(i x_0 <v,u> - \gamma |<v,u>|) for some unit vector v? No login, Fri Jan 12 11:50:13 EST 2007.
Transposed?
Confused by the Transposed on this page. If lambda is a row vector (1xn) then lambda^T Sigma^{-1} does not seem to be defined as it would be (nx1)(nxn). 158.64.77.88 09:53, 14 March 2007 (UTC)Ulrich
- The usual convention is that vectors are columns unless otherwise specified. --Zerotalk 10:49, 14 March 2007 (UTC)
- Shouldn't then lambda=(lambda_1, ...)^T, X=...^T etc... (Is this Nitpicking??) 158.64.77.88 12:52, 14 March 2007 (UTC)Ulrich
- You seem to be right. Anyone disagree? --Zerotalk 13:21, 14 March 2007 (UTC)
- I've fixed the transposing errors in the "General case" section. Please check if there are others to be corrected! Oli Filth 19:38, 16 March 2007 (UTC)
Online calculator malfunction
The Online real-time Bivariate Normal Distribution Calculator, by Razvan Pascalau, Univ. of Alabama, as referenced at the end of this article doesn't seem to work fully.
For instance, enter x=−4, y=2, ρ=0 and the probability comes out as p=−0.01. 163.156.240.17 15:10, 13 July 2007 (UTC)
The curse of ...
The dimension is denoted differently in different places. It should be consistent throughout the article. Does anyone have a preference on which one to use? Steve8675309 02:38, 24 July 2007 (UTC)
recent revert
Hello. Sorry to revert a good-faith edit, but the original was very clearly non-gaussian and the newer version was indeed bivariate Gaussian, albeit degenerate. Best wishes, Robinh 07:41, 17 September 2007 (UTC)
Question 2 on Bivariate Normal
What happens to joint distribution if correlation coefficient ρxy is 1 ? Thanks to everybody helping Abayirli1 03:44, 31 October 2007 (UTC)
Then you have a degenerate case: the covariance matrix is singular and you don't have a regular density, but you need a Dirac delta if you want to write a density functional.
Specifically, if the correlation coefficient is 1 then you can show there is an affine linear combination ax + by = c. This is related to the Cauchy–Schwarz inequality, where if the inequality is saturated the two vectors are proportional to each other. But a linear combination of random variables is a random variable; it just so happens that ax + by is a constant random variable, with zero variance, and doesn't have a pdf (unless you allow a Dirac delta). The "orthogonal" direction ay − bx carries all the variance of this singular bivariate normal.
I hope that makes sense. Miguel 10:29, 8 November 2007 (UTC)
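A minimal numerical sketch of the point made above, with made-up standard deviations (sd_x = 1, sd_y = 2, ρ = 1): the covariance matrix is singular, and one linear combination of x and y is a constant random variable:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlation 1 with sd_x = 1, sd_y = 2 gives a singular covariance matrix:
# cov(x, y) = rho * sd_x * sd_y = 2, and det(Sigma) = 1*4 - 2*2 = 0.
Sigma = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
det = np.linalg.det(Sigma)   # 0: no density w.r.t. Lebesgue measure on R^2

# All the mass lies on the line y = 2x; sample it directly.
x = rng.standard_normal(50_000)
y = 2.0 * x

combo = 2.0 * x - y          # the degenerate combination: constant (zero variance)
```

Here a = 2, b = −1 in the notation above, so ax + by is identically 0, while the "orthogonal" direction carries all the variance.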
Script N
I think that the notation for the normal distribution
should be standardized within this article and that for the normal distribution. It should probably be the latter N because it seems more common. 192.91.171.42 (talk) 23:43, 21 August 2008 (UTC)
or just Normal(blah, blah) could be used. Given the wide range of people who read this, it is usually best to err on the conservative side (even if it means writing it out more). I would say the same for other distributions such as Gamma(blah) and Beta(blah), etc. —Preceding unsigned comment added by 159.53.78.144 (talk) 14:21, 18 December 2008 (UTC)
Question 3 on Bivariate Normal
If two variables X and Y are normal, and they are correlated, does that imply they are bivariate normal? —Preceding unsigned comment added by Humble2000 (talk • contribs) 05:36, 5 December 2008 (UTC)
No. Only if infinitely countable linear combinations of X and Y are normal then [X,Y] is bivariate normal. By definition. Unless they are independent, of course. Omrit (talk) 09:50, 19 April 2009 (UTC)
- Omrit, I think your answer is confused. "Infinitely countable linear combinations" sounds like "countably infinite linear combinations", which means things like a1X1 + a2X2 + a3X3 + ⋯ with infinitely many terms.
- Maybe you meant only if every linear combination of the two is normal. That idea certainly does not involve any mention of countability. Michael Hardy (talk) 15:01, 4 May 2009 (UTC)
Question 4 on Bivariate normal
If the sum of two normally distributed random variables is still normal, does that imply the two random variables are bivariate normal? —Preceding unsigned comment added by Humble2000 (talk • contribs) 06:26, 5 December 2008 (UTC)
Of course not. Only if infinitely countable linear combinations of the two variables is normal then they are bivariate normal. By definition. Omrit (talk) 09:42, 19 April 2009 (UTC)
- It is not true that the sum of two normally distributed random variables is in every case normal.
- What is true is that if the pair is bivariate normal, then the sum is normal. The converse—that if the two are separately normal, then the pair is bivariate normal—is false. A counterexample can be found at normally distributed and uncorrelated does not imply independent.
- Omrit, I think your answer is confused. "Infinitely countable linear combinations" sounds like "countably infinite linear combinations", which means things like a1X1 + a2X2 + a3X3 + ⋯ with infinitely many terms.
- Maybe you meant only if every linear combination of the two is normal. That idea certainly does not involve any mention of countability. Michael Hardy (talk) 14:59, 4 May 2009 (UTC)
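The counterexample mentioned above can be illustrated numerically. This sketch uses one standard construction (W = S·X with an independent random sign S), which may differ in detail from the one on the linked page: both marginals are standard normal, yet X + W has an atom at zero, so the sum is not normal and the pair cannot be bivariate normal:

```python
import numpy as np

rng = np.random.default_rng(2)

# X ~ N(0,1), S = +/-1 an independent fair coin, W = S * X.
# W is also standard normal, but X + W equals 0 exactly whenever S = -1,
# i.e. with probability 1/2 -- so X + W is not normally distributed.
n = 100_000
X = rng.standard_normal(n)
S = rng.choice([-1.0, 1.0], size=n)
W = S * X

frac_zero = np.mean(X + W == 0.0)   # about one half: an atom at zero
```

Since a genuinely bivariate normal pair would make every linear combination (including X + W) normal, the atom at zero rules out joint normality despite both marginals being normal.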
Minor nitpick on "Drawing values from the distribution"
Quoting the article, "Compute the Cholesky decomposition of Σ, that is, find the unique lower triangular matrix such that . Any other matrix for which this equation holds is also feasible." Since the Cholesky decomposition is indeed unique, the second sentence seems redundant. Ryg (talk) 11:22, 11 March 2009 (UTC)
- Now rephrased. Melcombe (talk) 10:48, 12 March 2009 (UTC)
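For reference, the sampling procedure discussed in that section looks roughly like this in outline (the example mean, covariance, and sample size are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Cholesky factor: the lower triangular L with L L^T = Sigma.
L = np.linalg.cholesky(Sigma)

# Draw z with independent N(0,1) components, then x = mu + L z
# has the desired multivariate normal distribution.
z = rng.standard_normal((2, 200_000))
x = mu[:, None] + L @ z

sample_cov = np.cov(x)   # should approximate Sigma
```

Any other matrix square root A with A Aᵀ = Σ (e.g. from an eigendecomposition) works just as well, which is what the rephrased sentence in the article is getting at; Cholesky is simply the cheap, conventional choice.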
Definition of "Jointly Normal"?
Does the phrase "Jointly Normal" mean the same thing as "Multivariate Normal". If so should the page say that somewhere? —Preceding unsigned comment added by 202.89.170.248 (talk) 06:53, 10 June 2009 (UTC)
- It does, but it's used slightly differently. Usually you say that a vector x = (x, y) has a multivariate normal distribution, but that the scalars x and y are jointly normally distributed. In the first case you have a single (vector) random variable which follows a MVN distribution, in the second case you look at the same thing and instead see it as two (scalar) random variables which happen to have something in common when considered together, namely being jointly normally distributed. Same thing, slightly different point of view. You can also say that the joint distribution of x and y is a multivariate normal distribution; maybe "jointly normal" is short for "the joint distribution being (multivariate) normal" or something like that (I'm not a linguist). -- Coffee2theorems (talk) 14:21, 6 July 2009 (UTC)
Distribuition of norms of residuals/errors?
Suppose I take n samples from a multivariate normal distribution and look at the distribution of the norm of their error. Similarly, suppose I have a zero-mean multivariate normal distribution and look at the distribution of norms of samples. Obviously for a uni-variate normal distribution, the norms are distributed like the right half of a normal distribution (or is it a folded normal distribution?). But for bi-variate distributions, a sample will almost certainly not have both components near zero, so the distribution of the norm will look more like a Poisson distribution, with no counts at zero norm, then a peak, then a tail. As you add more dimensions, this continues and you seem to get something that looks increasingly normal.
I ask because I am looking at norms of multivariate residuals and trying to figure out how likely each one is. With uni-variate residuals, most are nearly zero, and so normalizing by their RMS seems right. Maybe I just answered my own question... Thoughts? —Ben FrantzDale (talk) 16:19, 29 July 2009 (UTC)
- I think you're looking for the chi distribution. It's the distribution of the norm of a standard MVN variate, or distribution of the square root of a chi-square random variate if you're wondering about the name.
- The chi-square distribution tends to normal due to the central limit theorem, and while it does so its bulk mass moves to the right (mean being n). Since the square root function tends to a flat line on the right (derivative tends to 0), it transforms the shape of the bulk mass less and less as n increases, which would explain why the chi distribution is also close to normal for reasonable values of n.
- FWIW, such questions are more suitable for the reference desk, people are much quicker to respond there. -- Coffee2theorems (talk) 08:15, 17 August 2009 (UTC)
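The chi-distribution answer above can be checked by simulation. In this sketch the dimension k = 5 and the sample size are chosen arbitrarily; the mean of the norms is compared against the known chi-distribution mean √2 · Γ((k+1)/2) / Γ(k/2):

```python
from math import gamma, sqrt

import numpy as np

rng = np.random.default_rng(4)

# Arbitrary dimension and sample size for illustration.
k = 5
n = 200_000

# Norms of standard multivariate normal samples follow the chi
# distribution with k degrees of freedom.
samples = rng.standard_normal((n, k))
norms = np.linalg.norm(samples, axis=1)

# Known mean of the chi distribution with k degrees of freedom.
chi_mean = sqrt(2.0) * gamma((k + 1) / 2) / gamma(k / 2)
```

This also reflects Ben's observation: the empirical density of the norms is zero at the origin, peaks, then tails off, and looks increasingly normal as k grows.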
Affine transformation doubt
I think that the vector c in this section should have dimensions of M x N, instead of N x 1, since the product BX should have dimension M x N. Anyone agree? —Preceding unsigned comment added by 200.14.47.121 (talk) 20:58, 1 February 2010 (UTC)
- 200.14.47.121, (1) Y is a vector so if Y = c + ... then c must be a vector. (2) B is m x n but X is a vector, so BX is a vector. 018 (talk) 22:07, 1 February 2010 (UTC)
But I think that Y is NOT a vector instead it's a matrix, the same goes for X. If you look at the dimensions implied in the variance of X: B x \sigma x B', you can tell that X is a matrix with dimension NxN. Thus, c and Y should be a matrix as well. —Preceding unsigned comment added by 186.80.188.218 (talk) 01:18, 2 February 2010 (UTC)
- According to the article, X ~ N(μ, Σ). This implies (1) the variance-covariance of X is Σ; (2) if X is a matrix then I guess μ is a matrix, but what is Σ? 018 (talk) 01:35, 2 February 2010 (UTC)
- This is an article about the vector normal distribution. So X is an n×1 vector, μ = E[X] is also an n×1 vector, and Σ = E[(X−μ)(X−μ)′] is an n×n variance matrix of vector X. Similarly, both c and Y are m×1 vectors, B is an m×n transformation matrix, and the variance of r.v. Y is equal to BΣB′, which is again an m×m matrix. If you are interested in the case when X is a matrix, see the “matrix normal distribution” page. … stpasha » 05:29, 2 February 2010 (UTC)
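The dimension bookkeeping in the reply above can be verified by simulation. The particular μ, Σ, B, and c below are made up (n = 3, m = 2):

```python
import numpy as np

rng = np.random.default_rng(5)

# Y = c + B X, with X an n-vector MVN: Y is MVN with mean c + B mu
# and covariance B Sigma B^T (here n = 3, m = 2).
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.3],
                  [0.0, 0.3, 1.5]])
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])   # m x n transformation matrix
c = np.array([5.0, -5.0])          # m x 1 vector, as explained above

# Sample X via its Cholesky factor, then transform.
L = np.linalg.cholesky(Sigma)
X = mu[:, None] + L @ rng.standard_normal((3, 200_000))
Y = c[:, None] + B @ X

Y_mean_theory = c + B @ mu         # m-vector
Y_cov_theory = B @ Sigma @ B.T     # m x m matrix
```

The shapes alone settle the question: BX is an m×1 vector for each sample, so c and Y are m×1 vectors, not matrices.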
Transpose vector notation
Is there a reason why within the Definition section and in the top table a transposed vector is denoted with u' instead of u^T? The rest of the article seems to always use the proper notation u^T. --Marra (talk) 05:44, 1 May 2010 (UTC)
Fisher information
The formula stated in the “Fisher information matrix” section is wrong. That is, it might be technically correct for the parametric submodel N(μ(θ), Σ(θ)), but this submodel is only distantly related to the family of distributions that are the subject of this article. The Fisher information for the multivariate normal N(μ, Σ) distribution is the expected Hessian of the log-density with respect to the unknown parameter which is (μ, Σ). // stpasha » 22:27, 3 May 2010 (UTC)
- This is not a submodel, it is a generalization: You can define θ to be a vector of any length you want, and if you want it can simply be a trivial parametrization where each of the elements of μ and Σ are free parameters. Anyway, I agree that this doesn't belong here, and anyway already appears in Fisher information matrix#Multivariate normal distribution, so I'll remove it (and place a link). --Zvika (talk) 13:39, 6 July 2010 (UTC)
Can the Vectors be in bold
The capital sigma is ambiguous (summation, not a covariance matrix, springs to mind). I checked a textbook but it has it spelt out, and wolfram.com calls it V. Can it be changed to lowercase sigma in bold? Can µ, being a vector, be bold too? Or does this rule not apply? --Squidonius (talk) 20:24, 3 June 2010 (UTC)
Higher order moments
There are no references in here; could that be addressed, please? Also, the examples are not exactly straightforward to follow, going from 6th order to 4th order, but the major issue is the lack of references, i.e., a book that shows how to do this. Thanks. —Preceding unsigned comment added by 145.18.152.249 (talk) 15:15, 21 June 2010 (UTC)
Cleanup / New headings?
This article should be better organized. Any suggestions? Some material should perhaps be deleted. Ulner (talk) 22:16, 7 July 2010 (UTC)
N vs. k
both $N$ and $k$ are used for dimension. can we pick one and stick with it? —Preceding unsigned comment added by 69.170.40.90 (talk) 22:07, 27 September 2010 (UTC)
- Sure, go for it. I'd pick k, because N might be the normal distribution. However, in one section there is a matrix, so it needs two dimensions and NxM makes sense there. 018 (talk) 23:03, 27 September 2010 (UTC)
please clarify this
There exists a random ℓ-vector Z, whose components are independent normal random variables, a k-vector μ, and a k×ℓ matrix A, such that X = AZ + μ. Here ℓ is the rank of the covariance matrix Σ = AA′. If the covariance matrix is of full rank, then the linear operator A is simply ______.
Please add the sentence in bold with correct information, thank you.
Why nonnegative definite instead of positive semidefinite?
Aren't the two terms equivalent, but the latter (positive semidefinite) far more common? — Preceding unsigned comment added by Lleeoo (talk • contribs) 00:54, 19 January 2011 (UTC)
They are the same, and I would vote for changing it to positive semidefinite, except that doesn't it need to be positive definite, not just positive semidefinite? If x^T Σ x = 0 for some nonzero x, then some eigenvalue of Σ must be 0, so the matrix isn't invertible. If the matrix isn't invertible, then the pdf formula is undefined. So... shouldn't we change it to positive definite? BlueScreenD (talk) 04:23, 7 February 2011 (UTC)
Bloated definition of multivariate normal
I feel that the given definition of the multivariate normal distribution is unnecessarily bloated. The current definition section has four bullet points plus a paragraph with an afterthought about the covariance matrix. A definition should be a minimal set of axioms that fully describe something. In this case, I think the best minimal set of axioms is the formula for the probability density function. The other bullet points should be moved to the "Properties" section. Thoughts? BlueScreenD (talk) 04:07, 7 February 2011 (UTC)