Discretization of continuous features

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jason.surratt (talk | contribs) at 14:25, 19 February 2009 (Created page with '= Discretization of continuous features = In statistics and machine learning, '''discretization''' refers to the process of converting continuous features or v…'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics and machine learning, discretization is the process of converting continuous features or variables into discrete or nominal features. This is useful, for example, when estimating a probability mass function, which is defined only over discrete values.
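
As an illustration of the probability mass function use case, the following minimal sketch bins a handful of continuous values and reports the fraction falling in each bin. The function name `empirical_pmf`, the sample values, and the bin edges are assumptions made for this example and do not come from the cited sources.

```python
import numpy as np

def empirical_pmf(values, edges):
    """Return the fraction of values that fall in each bin defined by edges.

    Values outside [edges[0], edges[-1]] are ignored by np.histogram.
    """
    counts, _ = np.histogram(values, bins=edges)
    return counts / counts.sum()

# Hypothetical continuous observations and hand-picked bin edges.
values = np.array([0.1, 0.4, 0.5, 1.2, 1.9, 2.5, 2.6, 2.9])
edges = np.array([0.0, 1.0, 2.0, 3.0])

pmf = empirical_pmf(values, edges)
print(pmf)
```

Each entry of `pmf` is the estimated probability of the corresponding bin, so the entries sum to 1 as long as at least one value lies inside the edges.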

Typically, data is discretized either into K partitions of equal width (equal intervals) or into K partitions that each contain approximately the same number of data points (equal frequencies). [1]
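
The two schemes can be compared with a short sketch. It assumes NumPy, and the helper names (`equal_width_edges`, `equal_frequency_edges`, `assign_bins`) and the sample data are hypothetical. The data includes one outlier to show how the schemes differ.

```python
import numpy as np

def equal_width_edges(values, k):
    """k bins of identical width spanning the range of the data."""
    return np.linspace(np.min(values), np.max(values), k + 1)

def equal_frequency_edges(values, k):
    """k bins whose edges are quantiles, so each holds about 1/k of the data."""
    return np.quantile(values, np.linspace(0.0, 1.0, k + 1))

def assign_bins(values, edges):
    """Map each value to a bin index in 0..k-1 using the interior edges only."""
    return np.digitize(values, edges[1:-1])

# Hypothetical data with a single large outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])

width_bins = assign_bins(values, equal_width_edges(values, 4))
freq_bins = assign_bins(values, equal_frequency_edges(values, 4))

print(np.bincount(width_bins, minlength=4))  # points per equal-width bin
print(np.bincount(freq_bins, minlength=4))   # points per equal-frequency bin
```

With equal widths, the outlier stretches the range so that almost all points land in the first bin. Equal frequencies keep the bins balanced at the cost of unequal widths.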

Some mechanisms for discretizing continuous data include:

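  • Fayyad & Irani's MDL method [2] - Recursively chooses the cut points that maximize information gain, using the minimum description length (MDL) principle to decide when to stop splitting.
  • Many other supervised and unsupervised methods [3]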
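
The recursive, entropy-based approach can be sketched as follows. This is a simplified illustration rather than a reference implementation: it uses the MDL acceptance test as it is commonly stated for Fayyad & Irani's method, it evaluates every boundary between distinct values instead of only class-boundary points, and all function names and the sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(pairs):
    """Return (gain, index) of the split with the highest information gain.

    pairs is a list of (value, label) sorted by value; returns None if no
    split between two distinct values exists.
    """
    labels = [label for _, label in pairs]
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a cut between equal values
        left, right = labels[:i], labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best

def mdl_accepts(pairs, gain, i):
    """MDL stopping rule: accept the split only if the gain pays for its cost."""
    labels = [label for _, label in pairs]
    left, right = labels[:i], labels[i:]
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n

def mdl_cut_points(values, labels):
    """Return the sorted list of cut points accepted by the MDL rule."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def recurse(segment):
        found = best_cut(segment)
        if found is None:
            return
        gain, i = found
        if not mdl_accepts(segment, gain, i):
            return
        # Place the cut halfway between the two neighbouring values.
        cuts.append((segment[i - 1][0] + segment[i][0]) / 2)
        recurse(segment[:i])
        recurse(segment[i:])

    recurse(pairs)
    return sorted(cuts)

# Hypothetical data: low values belong to class "a", high values to class "b".
values = list(range(1, 21))
labels = ["a"] * 10 + ["b"] * 10
cuts = mdl_cut_points(values, labels)
print(cuts)
```

Unlike equal-width or equal-frequency binning, this method is supervised: it needs class labels, and the number of bins is determined by the data rather than chosen in advance.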

References

  1. ^ "Entropy and MDL Discretization of Continuous Variables for Bayesian Belief Networks" (PDF). Retrieved 2008-07-10.
  2. ^ "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning". Retrieved 2008-07-10.
  3. ^ "Supervised and Unsupervised Discretization of Continuous Features". Retrieved 2008-07-10.