Discretization of continuous features

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Melcombe (talk | contribs) at 18:38, 18 February 2012 (details for incomplete citation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features, or variables into discrete or nominal attributes/features/variables/intervals. This can be useful when creating probability mass functions – formally, in density estimation. It is a form of binning, as in making a histogram.

Typically, data are discretized into K partitions of equal length/width (equal intervals) or into K partitions each containing approximately the same number of observations (equal frequencies).[1]
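These two simple schemes can be sketched in plain Python; the function names and the bin-index convention (indices 0 to K−1) are illustrative, not from any particular library:

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds roughly len(values)/k points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    bins = [0] * n
    for rank, i in enumerate(order):
        # A value's bin is determined by its rank, so bins have equal counts.
        bins[i] = min(rank * k // n, k - 1)
    return bins
```

On skewed data the two schemes differ: equal-width bins may leave some intervals nearly empty, while equal-frequency bins adapt their cut points so each interval holds about the same share of the observations.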

Some mechanisms for discretizing continuous data include entropy-based multi-interval partitioning[2] as well as other supervised and unsupervised techniques.[3]
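The core step of entropy-based supervised discretization, as in the multi-interval method of reference [2], is choosing the cut point that minimizes the weighted class entropy of the two resulting intervals. A minimal sketch of that single binary split (recursion and the MDL stopping criterion are omitted; function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return the cut point minimizing the weighted class entropy of the
    two sides -- the selection step of entropy-based discretization."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        # Place the candidate cut midway between adjacent sorted values.
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        best = min(best, (score, cut))
    return best[1]
```

In the full method, this split is applied recursively to each resulting interval until a minimum-description-length criterion says further splitting is not worthwhile.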

Many machine learning algorithms are known to produce better models when continuous attributes are discretized.[4]

See also

References

  1. ^ Clarke, E. J.; Barton, B. A. (2000). "Entropy and MDL discretization of continuous variables for Bayesian belief networks" (PDF). International Journal of Intelligent Systems. 15: 61. doi:10.1002/(SICI)1098-111X(200001)15:1<61::AID-INT4>3.0.CO;2-O. Retrieved 2008-07-10.
  2. ^ "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning". hdl:2014/35171.
  3. ^ "Supervised and Unsupervised Discretization of Continuous Features". Retrieved 2008-07-10.
  4. ^ Kotsiantis, S.; Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey" (PDF). GESTS International Transactions on Computer Science and Engineering. 32 (1): 47–58.