Discretization of continuous features
In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features, or variables into discrete or nominal attributes/features/variables/intervals. This can be useful when creating probability mass functions – formally, in density estimation. It is a form of binning, as in making a histogram.
Typically, data is discretized into K partitions of equal length/width (equal intervals) or K partitions each containing the same fraction of the data (equal frequencies).[1]
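These two schemes can be sketched with NumPy; the function names here are illustrative, not from any particular library:

```python
import numpy as np

def equal_width_bins(x, k):
    """Assign each value to one of k bins of equal width."""
    edges = np.linspace(np.min(x), np.max(x), k + 1)
    # digitize returns indices in 1..k+1; clip the maximum into the last bin
    return np.clip(np.digitize(x, edges), 1, k) - 1

def equal_frequency_bins(x, k):
    """Choose bin edges at quantiles so each bin holds roughly len(x)/k values."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))
    return np.clip(np.digitize(x, edges), 1, k) - 1

x = list(range(1, 11))
print(equal_width_bins(x, 2))      # values 1..5 fall below the midpoint 5.5
print(equal_frequency_bins(x, 2))  # same split here, since the data is uniform
```

On skewed data the two schemes differ: equal-width bins can be nearly empty, while equal-frequency bins adapt their edges to the data. Library equivalents exist, e.g. `pandas.cut` (equal intervals) and `pandas.qcut` (equal frequencies).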
Some mechanisms for discretizing continuous data include:
- Fayyad & Irani's MDL method[2] – uses information gain to recursively determine the best bins.
- And many more[3]
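The core step of the Fayyad & Irani approach – picking the cut point that maximizes information gain over the class labels – can be sketched as follows. This is a simplified illustration with hypothetical function names; the full method additionally applies an MDL-based stopping criterion before recursing on each side of the cut:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return the cut point with the highest information gain.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal feature values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # gain = parent entropy minus size-weighted child entropies
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

# A cleanly separable example: the best cut lands between 3 and 10
print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (6.5, 1.0)
```

Recursively applying `best_split` to each side, until a stopping rule fires, yields the final set of bin boundaries.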
Many machine learning algorithms are known to produce better models when continuous attributes are discretized.[4]
References
- ^ "Entropy and MDL Discretization of Continuous Variables for Bayesian Belief Networks" (PDF). Retrieved 2008-07-10.
- ^ "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning". hdl:2014/35171.
- ^ "Supervised and Unsupervised Discretization of Continuous Features". Retrieved 2008-07-10.
- ^ Kotsiantis, S.; Kanellopoulos, D. "Discretization Techniques: A recent survey" (PDF). GESTS International Transactions on Computer Science and Engineering. Vol. 32 (1), 2006, pp. 47–58.