Probability vector
In mathematics and statistics, a probability vector or stochastic vector is a vector with non-negative entries that add up to one.
Underlying every probability vector is an experiment that can produce an outcome. To connect this experiment to mathematics, one introduces a discrete random variable, which is a function that assigns a numerical value to each possible outcome. For example, if the experiment consists of rolling a single die, the possible values of this random variable are the integers 1,2,…,6. The associated probability vector has six components, each representing the probability of obtaining the corresponding outcome. More generally, a probability vector of length n represents the distribution of probabilities across the n possible numerical outcomes of a random variable.[1]
The vector specifies the probability mass function of that random variable, which is the standard way of characterizing a discrete probability distribution.[2]
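For illustration, the die example can be written out directly. The following is a minimal NumPy sketch (the array and variable names are illustrative, not taken from the cited sources):

```python
import numpy as np

# Probability vector for one roll of a fair six-sided die:
# component i holds P(outcome = i + 1).
p = np.full(6, 1 / 6)

# The defining conditions: non-negative entries that sum to one.
assert np.all(p >= 0)
assert np.isclose(p.sum(), 1.0)

# The vector is the probability mass function, so expectations
# are inner products against it.
outcomes = np.arange(1, 7)
print(outcomes @ p)  # expected value of the roll: 3.5
```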
Examples
Here are some examples of probability vectors (they may be written as either column or row vectors):[3]

$x_0 = \begin{bmatrix} 0.5 \\ 0.25 \\ 0.25 \end{bmatrix},\quad x_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\quad x_2 = \begin{bmatrix} 0.65 & 0.35 \end{bmatrix},\quad x_3 = \begin{bmatrix} 0.3 & 0.5 & 0.07 & 0.1 & 0.03 \end{bmatrix}.$
Properties
- The mean of the components of any probability vector of length $n$ is $1/n$.[4]
- The Euclidean length of a probability vector $p$ with $n$ components is related to the variance $\sigma^2$ of its components by[5]
- $\|p\| = \sqrt{n\sigma^2 + \tfrac{1}{n}}.$
- This expression for length reaches its minimum value of $1/\sqrt{n}$ when all components are equal to $1/n$, with $\sigma^2 = 0$.[3]
- The longest probability vector has the value 1 in a single component and 0 in all others, and has a length of 1.[3]
- The shortest vector corresponds to maximum uncertainty, the longest to maximum certainty.
- The variance of a probability vector satisfies
- $0 \le \sigma^2 \le \dfrac{n-1}{n^2}.$
- The lower bound occurs when all components are equal to $1/n$, and the upper bound when one component equals $1$ and the rest are $0$; a numerical check of these properties follows this list.[6]
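These properties can be verified numerically. The following is a minimal sketch (the random vector and seed are arbitrary, not from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an arbitrary probability vector by normalizing positive draws.
n = 6
p = rng.random(n)
p /= p.sum()

mean = p.mean()
var = p.var()  # population variance of the components

assert np.isclose(mean, 1 / n)                 # mean of components is 1/n
assert np.isclose(np.linalg.norm(p),
                  np.sqrt(n * var + 1 / n))    # length-variance identity
assert 0 <= var <= (n - 1) / n**2              # variance bounds
```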
Significance of the bounds on variance
The bounds on variance show that as the number of possible outcomes $n$ increases, the upper bound $(n-1)/n^2$ forces the variance toward zero. As a result, the uncertainty associated with any single outcome grows, because the components of the probability vector must become more nearly equal. In empirical work, this often motivates binning the outcomes to reduce $n$; although binning discards some of the information contained in the original outcomes, it allows the coarser-grained structure of the distribution to be revealed. The decrease in variance with increasing $n$ reflects the same tendency toward uniformity that underlies entropy in information theory and statistical mechanics.[7]
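As a rough illustration of this effect (a sketch under assumptions not drawn from the cited source: a uniform starting vector and equal-width bins):

```python
import numpy as np

# The upper bound (n - 1) / n^2 on the variance shrinks toward zero as n grows.
for n in (2, 10, 100, 1000):
    print(n, (n - 1) / n**2)

# Binning: merge a length-1000 probability vector into 10 bins by summing
# consecutive groups of 100 components; the result is again a probability vector.
p = np.full(1000, 1 / 1000)
binned = p.reshape(10, 100).sum(axis=1)
print(binned.sum())  # 1.0
```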
Geometry of the probability simplex
The probability simplex is the canonical geometric representation of the general concept of a simplex. A simplex is the simplest geometric object that fully occupies the region of a given dimension defined by its vertices, and it is constructed as the convex hull of a set of affinely independent points. In general, an (n − 1)-simplex in $\mathbb{R}^n$ is the set of all convex combinations of $n$ affinely independent vertices $v_1, \dots, v_n$:

$\left\{ \theta_1 v_1 + \theta_2 v_2 + \cdots + \theta_n v_n \;\middle|\; \theta_i \ge 0,\ \textstyle\sum_{i=1}^n \theta_i = 1 \right\}.$
Each vertex represents one corner of the simplex, and every point inside can be expressed uniquely as a convex combination of the vertices, with the coefficients $\theta_i$ known as barycentric coordinates.
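For concreteness, here is a minimal sketch of a convex combination in barycentric coordinates (the triangle vertices and weights are arbitrary illustrative values):

```python
import numpy as np

# Vertices of a 2-simplex (a triangle) in the plane, one per row.
V = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Barycentric coordinates: non-negative weights summing to one,
# i.e. themselves a probability vector over the vertices.
theta = np.array([0.2, 0.5, 0.3])
assert np.all(theta >= 0) and np.isclose(theta.sum(), 1.0)

point = theta @ V  # theta_1 v_1 + theta_2 v_2 + theta_3 v_3
print(point)       # [0.5 0.3], a point inside the triangle
```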
The most common and symmetric example is obtained by taking the n standard basis vectors $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, \dots, 0)$, …, $e_n = (0, 0, \dots, 1)$ of $\mathbb{R}^n$.
Their convex hull defines the standard or probability simplex:

$\Delta_{n-1} = \left\{ (p_1, \dots, p_n) \in \mathbb{R}^n \;\middle|\; p_i \ge 0,\ \textstyle\sum_{i=1}^n p_i = 1 \right\},$
which is an (n − 1)-dimensional simplex lying on the affine hyperplane $\sum_{i=1}^n p_i = 1$. The components $p_i$ serve directly as barycentric coordinates, giving this simplex an immediate interpretation in probability theory: each vertex corresponds to a certain outcome, and each interior point represents a mixture or distribution over the n outcomes. Every discrete probability distribution on n outcomes is represented by exactly one point in this simplex, and conversely each point of the simplex defines a unique distribution. Moving along barycentric coordinates toward one of the vertices corresponds to increasing certainty about the outcome, while movement toward the center of the simplex corresponds to increasing uncertainty, as the distribution becomes more uniform.
The probability simplex thus serves as the canonical simplex in $\mathbb{R}^n$. Any other (n − 1)-simplex can be obtained from it by an affine transformation, making it the standard reference for geometric and probabilistic analyses.[8][9]
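This correspondence can be sketched directly. In the following (illustrative vertices; the helper from_standard_simplex is a hypothetical name, not from the cited sources), an arbitrary triangle is realized as the affine image of the standard 2-simplex:

```python
import numpy as np

# Vertices of an arbitrary 2-simplex in the plane, one per row.
V = np.array([[1.0, 1.0],
              [4.0, 1.5],
              [2.0, 3.0]])

def from_standard_simplex(p, V):
    """Map a point p of the standard simplex (a probability vector)
    to the simplex with vertex rows V, via p -> sum_i p_i v_i."""
    p = np.asarray(p)
    assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)
    return p @ V

print(from_standard_simplex([1, 0, 0], V))        # a vertex: [1. 1.]
print(from_standard_simplex([1/3, 1/3, 1/3], V))  # the centroid
```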
See also
References
- ^ Bertsekas, D. P. & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. Chapter 2, p. 3. Available as MIT lecture notes (PDF).
- ^ Jacobs, Konrad (1992), Discrete Stochastics, Basler Lehrbücher [Basel Textbooks], vol. 3, Birkhäuser Verlag, Basel, p. 45, doi:10.1007/978-3-0348-8645-1, ISBN 3-7643-2591-7, MR 1139766.
- ^ a b c Lee, Geoffrey (2016). "MATH1014 Linear Algebra Lecture 10 Notes" (PDF). Australian National University. Retrieved 16 October 2025.
- ^ "Probability Vector: Definition, Examples, Properties". StatisticsHowTo.
- ^ "Length of a Probability Vector". CrossValidated. 2021. Retrieved 16 October 2025.
- ^ Bertsekas, D. P. & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. pp. 53–54.
- ^ Source needed (possibly Cover & Thomas).
- ^ Boyd, S. & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press, §2.1.
- ^ Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press, §2.2.