Discretization of continuous features

In statistics and machine learning, discretization is the process of converting continuous attributes, features or variables into discrete or nominal ones by partitioning their range into intervals. This can be useful when creating probability mass functions, that is, in density estimation. It is a form of binning, as in making a histogram.

Typically data is discretized into K partitions of equal width (equal intervals) or into K partitions that each contain roughly the same share, 1/K, of the data points (equal frequencies).^[1]
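The two schemes can be illustrated with a minimal Python sketch. The function names (`equal_width_edges`, `equal_frequency_edges`, `assign_bins`) and the sample data are hypothetical and chosen only for illustration; the convention that a value equal to a cut point falls into the upper bin is likewise an assumption.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Return k-1 interior cut points so that each bin holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def assign_bins(values, edges):
    """Map each value to a bin index in 0..len(edges); a value equal to an edge goes to the upper bin."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical sample with one outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_bins = assign_bins(data, equal_width_edges(data, 3))
freq_bins = assign_bins(data, equal_frequency_edges(data, 3))
print(width_bins)  # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(freq_bins)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With this sample the outlier stretches the equal-width bins so that almost all points land in the first one, while the equal-frequency bins each receive three points. When the data contain many tied values, equal-frequency bins can still end up uneven.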

Mechanisms for discretizing continuous data include Fayyad & Irani's MDL method,^[2] which uses information gain to recursively choose the best cut points and the minimum description length (MDL) principle to decide when to stop splitting, and many others.^[3]
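The recursive, entropy-based idea can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, every boundary between distinct values is tried as a candidate cut, and the stopping rule is the MDL criterion as it is usually stated (accept a cut only if its information gain exceeds (log2(N − 1) + Δ) / N), which should be checked against the original paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(segment):
    """segment: (value, label) pairs sorted by value. Return (gain, index) of the best cut, or None."""
    labels = [y for _, y in segment]
    n = len(segment)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if segment[i - 1][0] == segment[i][0]:
            continue  # no cut point exists between two equal values
        left, right = labels[:i], labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def mdl_accepts(labels, left, right, gain):
    """MDL stopping rule as commonly stated: keep the cut only if the gain pays for its encoding cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (log2(n - 1) + delta) / n


def mdl_cut_points(values, labels):
    """Recursively split the sorted data, returning the accepted cut points in ascending order."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def recurse(segment):
        found = best_split(segment)
        if found is None:
            return
        gain, i = found
        seg_labels = [y for _, y in segment]
        if not mdl_accepts(seg_labels, seg_labels[:i], seg_labels[i:], gain):
            return
        # Place the cut halfway between the two neighbouring values.
        cuts.append((segment[i - 1][0] + segment[i][0]) / 2)
        recurse(segment[:i])
        recurse(segment[i:])

    recurse(pairs)
    return sorted(cuts)


# Hypothetical data: values 1..10 belong to class "a", values 11..20 to class "b".
values = list(range(1, 21))
labels = ["a"] * 10 + ["b"] * 10
print(mdl_cut_points(values, labels))  # [10.5]
```

Because the method uses class labels, it is a supervised discretization: a feature whose values carry no information about the class (for example, strictly alternating labels) yields no cut points at all under this rule.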

Many machine learning algorithms are known to produce better models when continuous attributes are discretized.^[4]

References

[1] Clarke, E. J.; Barton, B. A. (2000). "Entropy and MDL discretization of continuous variables for Bayesian belief networks". International Journal of Intelligent Systems. 15: 61. doi:10.1002/(SICI)1098-111X(200001)15:1<61::AID-INT4>3.0.CO;2-O. Retrieved 2008-07-10.
[2] Fayyad, Usama M.; Irani, Keki B. (1993). "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning". Proceedings of the International Joint Conference on Uncertainty in AI, pp. 1022–1027. hdl:2014/35171.
[3] Dougherty, J.; Kohavi, R.; Sahami, M. (1995). "Supervised and Unsupervised Discretization of Continuous Features". In A. Prieditis & S. J. Russell, eds. Work. Morgan Kaufmann, pp. 194–202.
[4] Kotsiantis, S.; Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering. 32 (1): 47–58.

