Probability distribution fitting
Probability distribution fitting or simply distribution fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.
There are many probability distributions (see list of probability distributions) of which some can be fitted more closely to the data than others, depending on the characteristics of the phenomenon. In distribution fitting one needs to select a distribution that suits the data well.
Selection
When the data are symmetrically distributed around the mean and values farther from the mean occur less frequently, one may select, for example, the normal distribution, the logistic distribution or the Student's t-distribution. The first two are very similar, while the last, with one degree of freedom, has "heavier tails", meaning that values farther from the mean occur relatively more often (where the kurtosis is defined, this corresponds to a higher kurtosis).
When the larger values tend to be farther away from the mean than the smaller values, the distribution is skewed to the right (i.e. it has positive skewness) and one may select the lognormal distribution, the loglogistic distribution, the Gumbel distribution, the exponential distribution, the Pareto distribution or the Weibull distribution.
When the larger values tend to be nearer to the mean than the smaller values, the distribution is skewed to the left (i.e. it has negative skewness) and one may select the squarenormal distribution (i.e. the normal distribution applied to the square of the data values) or the Gompertz distribution.
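As a rough illustration of this selection step, the sign of the sample skewness can be used to narrow down the candidate families. The sketch below is not a formal selection procedure: the function names and the tolerance of 0.5 are assumptions made for the example, and the returned lists simply repeat the families named in this section.

```python
def sample_skewness(data):
    """Moment-based sample skewness m3 / m2**1.5 (central moments divided by n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def suggest_families(data, tol=0.5):
    """Hypothetical helper: map the skewness sign to candidate families.

    The tolerance `tol` is an arbitrary choice for this sketch.
    """
    g1 = sample_skewness(data)
    if g1 > tol:  # skewed to the right
        return ["lognormal", "loglogistic", "Gumbel",
                "exponential", "Pareto", "Weibull"]
    if g1 < -tol:  # skewed to the left
        return ["squarenormal", "Gompertz"]
    return ["normal", "logistic", "Student's t"]  # roughly symmetric


right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]
print(sample_skewness(right_skewed), suggest_families(right_skewed))
```

The skewness only shortlists candidates. The final choice still depends on how well each fitted distribution follows the data.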
Techniques
The following techniques of distribution fitting exist: [1]
- Parametric method (or method of moments), by which the parameters of the distribution are calculated from the data series.[2]
For example, the parameter μ (the expectation) can be estimated by the mean of the data and the parameter σ² (the variance) can be estimated from the standard deviation of the data. The mean is found as μ = ΣX / n, where X is the data value and n the number of data, while the standard deviation is calculated as σ = √( Σ(X − μ)² / (n − 1) ).
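A minimal sketch of this calculation for a normal distribution is given below. It uses the sample variance with n − 1 in the denominator and takes its square root as σ. The function name and the data values are made up for the illustration.

```python
import math
import statistics


def fit_normal_by_moments(data):
    """Estimate (mu, sigma) of a normal distribution from the sample moments."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / (n - 1)
    return mu, math.sqrt(variance)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
mu, sigma = fit_normal_by_moments(data)

# Cross-check against the standard library, which also divides by n - 1.
print(mu, sigma, statistics.stdev(data))
```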

- Regression method, using a transformation of the cumulative distribution function so that a linear relation is found between the cumulative probability and the values of the data, which may also need to be transformed, depending on the selected probability distribution. In this method the cumulative probability needs to be estimated by the plotting position.
For example, the cumulative Gumbel distribution can be linearized to Y = aX + b, where X is the data variable and Y = -ln(-ln P), with P being the cumulative probability, i.e. the probability that the data value is less than X. Thus, using the plotting position for P, one finds the parameters a and b from a linear regression of Y on X.
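The Gumbel linearization can be sketched as follows. The plotting position P = r / (n + 1), with r the rank of the sorted value, is an assumption of this example, since other plotting positions are also in use. The function name is hypothetical. Writing the Gumbel distribution as P = exp(-exp(-(X − u) / β)) gives a = 1 / β and b = -u / β, so the location u and scale β follow from the regression coefficients.

```python
import math


def gumbel_regression_fit(data):
    """Fit Y = a*X + b with Y = -ln(-ln P) by ordinary least squares."""
    xs = sorted(data)
    n = len(xs)
    # Assumed plotting position: P = r / (n + 1) for rank r = 1..n.
    ys = [-math.log(-math.log(r / (n + 1))) for r in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic check: values placed exactly on the Gumbel quantiles for
# location 10 and scale 2 at the same plotting positions.
n = 9
sample = [10 + 2 * -math.log(-math.log(r / (n + 1))) for r in range(1, n + 1)]
a, b = gumbel_regression_fit(sample)
location, scale = -b / a, 1 / a
print(a, b, location, scale)
```

Because the synthetic sample lies exactly on the line, the regression recovers the location and scale used to build it, up to rounding. Real data scatter around the line, and the quality of the fit can be judged from that scatter.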
- Maximum likelihood method [4]
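As an illustration of the maximum likelihood method, consider the exponential distribution with rate λ, for which the log-likelihood of n observations is n ln λ − λ ΣX and the maximum is reached at λ = n / ΣX. The sketch below uses this closed-form solution. For most other distributions the maximum has to be found numerically. The function names and data values are assumptions of the example.

```python
import math


def exponential_log_likelihood(rate, data):
    """Log-likelihood of an exponential distribution with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


def fit_exponential_mle(data):
    """Closed-form maximum likelihood estimate of the rate: n / sum(x)."""
    return len(data) / sum(data)


data = [0.5, 1.2, 0.3, 2.0, 1.0]  # illustrative values
rate_hat = fit_exponential_mle(data)

# The log-likelihood at the estimate should exceed that at nearby rates.
for rate in (0.9 * rate_hat, rate_hat, 1.1 * rate_hat):
    print(rate, exponential_log_likelihood(rate, data))
```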
Software
Fitting a distribution by hand is tedious, and finding the best-fitting distribution among several candidates even more so. Software can ease the task. The following software may be used:
- CumFreq [5]
- Easy fit [6]
- MathWorks Benelux [7]
- ModelRisk [8]
- Ricci distributions [9]
- Risksolver [10]
- StatSoft distribution fitting [11]
References
- [1] Frequency and Regression Analysis. Chapter 6 in: H.P. Ritzema (ed., 1994), Drainage Principles and Applications, Publ. 16, pp. 175–224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. ISBN 90 70754 33 9. Available as a free download.
- [2] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
- [3] Hosking, J.R.M. (1990). "L-moments: analysis and estimation of distributions using linear combinations of order statistics". Journal of the Royal Statistical Society, Series B. 52: 105–124. JSTOR 2345653.
- [4] Aldrich, John (1997). "R. A. Fisher and the making of maximum likelihood 1912–1922". Statistical Science. 12 (3): 162–176. doi:10.1214/ss/1030037906. MR 1617519.
- [5] CumFreq, cumulative frequency analysis
- [6] Easy fit, Data analysis & simulation
- [7] MathWorks Benelux
- [8] ModelRisk risk modelling software
- [9] Vito Ricci, 2005, Fitting distributions with R
- [10] Automatically fit distributions and parameters to samples
- [11] StatSoft distribution fitting