
Probability distribution fitting

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Asitgoes (talk | contribs) at 20:57, 7 June 2012 (New article). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Probability distribution fitting or simply distribution fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.

There are many probability distributions (see list of probability distributions) of which some can be fitted more closely to the data than others, depending on the characteristics of the phenomenon. In distribution fitting one needs to select a distribution that suits the data well.

Selection

When the data are symmetrically distributed around the mean and the frequency of occurrence diminishes for values farther from the mean, one may select, for example, the normal distribution, the logistic distribution or the Student's t-distribution. The first two are very similar, while the last, with one degree of freedom, has "heavier tails", meaning that values far from the mean occur relatively more often (i.e. the kurtosis is higher).
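
The difference in tail weight can be illustrated numerically. The following is a minimal sketch, not part of any cited method: it compares the probability of a value lying more than k units from the centre for a standard normal distribution, a logistic distribution rescaled to unit variance (the rescaling is an assumption made here for comparability), and a Student's t-distribution with one degree of freedom at its standard scale. The function names are illustrative.

```python
import math

def normal_tail(k):
    """P(|X| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))

def logistic_tail(k):
    """P(|X| > k) for a logistic variable rescaled to unit variance."""
    s = math.sqrt(3) / math.pi  # scale parameter that gives variance 1
    return 2.0 / (1.0 + math.exp(k / s))

def cauchy_tail(k):
    """P(|X| > k) for Student's t with one degree of freedom (standard Cauchy)."""
    return 1.0 - 2.0 * math.atan(k) / math.pi

# Print the two-sided tail probabilities side by side for a few distances k.
for k in (2, 3, 4):
    print(k, normal_tail(k), logistic_tail(k), cauchy_tail(k))
```

For each k in this range, the normal and logistic tails are small and of broadly similar magnitude, while the t-distribution with one degree of freedom keeps a far larger share of its probability in the tails.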

When the larger values tend to be farther away from the mean than the smaller values, the distribution is skewed to the right (i.e. it has positive skewness), and one may select, for example, the lognormal distribution, the log-logistic distribution, the Gumbel distribution, the exponential distribution, the Pareto distribution or the Weibull distribution.

For distributions that are skewed to the left (i.e. with negative skewness), where the larger values tend to be nearer to the mean than the smaller ones, one may select the square-normal distribution (i.e. the normal distribution applied to the square of the data values) or the Gompertz distribution.
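
As a rough aid to this selection, the sample skewness can be computed before choosing candidate distributions. The sketch below assumes the moment-based skewness coefficient; the threshold `tolerance` and the helper `suggest_families` are hypothetical choices for illustration, not an established rule, and the candidate lists simply repeat the distributions named in this section.

```python
def sample_skewness(data):
    """Moment-based sample skewness m3 / m2**1.5 (data must not be constant)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def suggest_families(data, tolerance=0.5):
    """Return candidate distributions by sign of skewness.

    The tolerance of 0.5 is an arbitrary, hypothetical cut-off.
    """
    g1 = sample_skewness(data)
    if g1 > tolerance:
        return ["lognormal", "log-logistic", "Gumbel",
                "exponential", "Pareto", "Weibull"]
    if g1 < -tolerance:
        return ["square-normal", "Gompertz"]
    return ["normal", "logistic", "Student's t"]

# Hypothetical data with one large value: skewed to the right.
data = [1, 2, 2, 3, 3, 3, 4, 10]
print(sample_skewness(data), suggest_families(data))
```

Such a check only narrows the choice; the candidates still have to be fitted and compared against the data.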

Techniques

The following techniques of distribution fitting exist: [1]

Parametric methods calculate the parameters of the distribution directly from the data series. For example, the parameter μ (the expectation) can be estimated by the mean of the data, and the parameter σ² (the variance) by the square of the standard deviation of the data. The mean is found as μ = ΣX / n, where X is a data value and n the number of data, while the standard deviation is calculated as σ = √[Σ(X − μ)² / (n − 1)].
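
The estimates of the mean and the standard deviation can be sketched in a few lines. This is a minimal illustration that assumes the standard deviation is the square root of the sum of squared deviations divided by n − 1; the function name and the data values are hypothetical.

```python
import math

def fit_normal_by_moments(data):
    """Estimate mu as the sample mean and sigma as the sample standard deviation."""
    n = len(data)
    mu = sum(data) / n
    # Sum of squared deviations divided by (n - 1) estimates the variance.
    variance = sum((x - mu) ** 2 for x in data) / (n - 1)
    return mu, math.sqrt(variance)

# Hypothetical measurements, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_normal_by_moments(data)
print(mu, sigma)
```
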
Figure: Cumulative Gumbel distribution fitted to maximum one-day October rainfalls, with confidence band, using CumFreq.
In the regression method, the cumulative distribution function is transformed so that it becomes a linear relation. For example, the cumulative Gumbel distribution can be linearized to Y = aX + b, where X is the data variable and Y = −ln(−ln P), with P being the cumulative probability, i.e. the probability that the data value is less than X. Thus, estimating P by the plotting position, one finds the parameters a and b from a linear regression of Y on X.
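
The Gumbel linearization can be sketched as follows. The text does not say which plotting position is meant, so the sketch assumes P = r / (n + 1), where r is the rank of the value in ascending order; this choice, the function names and the rainfall values are all assumptions made for illustration.

```python
import math

def fit_gumbel_by_regression(data):
    """Fit Y = a*X + b with Y = -ln(-ln P) by ordinary least squares.

    Assumes the plotting position P = r / (n + 1) and non-constant data.
    """
    xs = sorted(data)
    n = len(xs)
    ps = [r / (n + 1) for r in range(1, n + 1)]
    ys = [-math.log(-math.log(p)) for p in ps]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b

def gumbel_cdf(x, a, b):
    """Cumulative probability implied by the fitted line: exp(-exp(-(a*x + b)))."""
    return math.exp(-math.exp(-(a * x + b)))

# Hypothetical annual maximum one-day rainfalls in mm, for illustration only.
rainfall = [32.0, 45.5, 51.0, 58.2, 63.7, 70.1, 78.4, 85.0, 97.3, 120.6]
a, b = fit_gumbel_by_regression(rainfall)
print(a, b, gumbel_cdf(100.0, a, b))
```

In the usual location-scale form of the Gumbel distribution, the fitted line corresponds to a scale of 1/a and a location of −b/a.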

Software

Fitting a distribution by hand is tedious, and comparing several candidate distributions to find the best fit is more so. Software can ease the task. The following software may be used:

  • CumFreq [5]
  • Easy fit [6]
  • MathWorks Benelux [7]
  • ModelRisk [8]
  • Ricci distributions [9]
  • Risksolver [10]
  • StatSoft distribution fitting [11]

See also

Cumulative frequency analysis

References

  1. ^ Frequency and Regression Analysis. Chapter 6 in: H.P. Ritzema (ed., 1994), Drainage Principles and Applications, Publ. 16, pp. 175–224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. ISBN 90 70754 3 39. Free download from the webpage [1] under nr. 12, or directly as PDF: [2]
  2. ^ H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
  3. ^ Hosking, J.R.M. (1990). "L-moments: analysis and estimation of distributions using linear combinations of order statistics". Journal of the Royal Statistical Society, Series B. 52: 105–124. JSTOR 2345653.
  4. ^ Aldrich, John (1997). "R. A. Fisher and the making of maximum likelihood 1912–1922". Statistical Science. 12 (3): 162–176. doi:10.1214/ss/1030037906. MR 1617519.
  5. ^ CumFreq, cumulative frequency analysis [3]
  6. ^ Easy fit, Data analysis & simulation [4]
  7. ^ MathWorks Benelux [5]
  8. ^ ModelRisk risk modelling software [6]
  9. ^ Vito Ricci, 2005, Fitting distributions with R [7]
  10. ^ Automatically fit distributions and parameters to samples [8]
  11. ^ StatSoft distribution fitting [9]