Feature extraction

General

Feature extraction involves reducing the number of resources required to describe a large set of data. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power; it may also cause a classification algorithm to overfit to training samples and generalize poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy. Many machine learning practitioners believe that properly optimized feature extraction is the key to effective model construction.^[1]
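As one illustration of "constructing combinations of the variables", the sketch below projects data onto its leading principal components using NumPy. The function name `pca_extract`, the synthetic data, and the choice of two components are assumptions made for this example only; principal component analysis is just one of many possible extraction methods.

```python
import numpy as np

def pca_extract(X, n_components):
    """Project the rows of X onto the top principal components.

    Hypothetical helper for illustration; not taken from any library.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each variable
    # SVD of the centered data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    features = Xc @ components.T          # new features: linear combinations of the variables
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return features, components, explained[:n_components]

# Assumed synthetic data: 200 samples of 5 variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

features, components, explained = pca_extract(X, n_components=2)
print(features.shape)   # two extracted features per sample
print(explained.sum())  # share of total variance the two features retain
```

Here five correlated variables are replaced by two extracted features that retain nearly all of the variance, which is the trade-off between compactness and accuracy described above.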

Results can be improved using constructed sets of application-dependent features, typically built by an expert; this process is called feature engineering. Alternatively, general dimensionality reduction techniques such as principal component analysis can be used.

Image processing

One very important area of application is image processing, in which algorithms are used to detect and isolate various desired portions or shapes (features) of a digitized image or video stream. Feature extraction is particularly important in optical character recognition.
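A minimal sketch of detecting one kind of image feature, edges, is given below. The text does not name a particular detector; the Sobel operator, the helper name `sobel_edges`, and the synthetic 8x8 image are all assumptions chosen for illustration.

```python
import numpy as np

def sobel_edges(image):
    """Return the Sobel gradient magnitude of a 2-D grayscale image.

    Hypothetical helper; only the valid region is computed (no padding),
    so the output is 2 pixels smaller than the input in each dimension.
    """
    img = np.asarray(image, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Slide the 3x3 kernels over the image by summing shifted, weighted copies
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

# Assumed test image: dark left half, bright right half
image = np.zeros((8, 8))
image[:, 4:] = 1.0

edges = sobel_edges(image)
edge_columns = np.unique(np.nonzero(edges)[1])
print(edges.shape)
print(edge_columns)  # output columns where the vertical edge is detected
```

The response is nonzero only in the output columns whose 3x3 window straddles the brightness boundary, so the vertical edge is isolated from the uniform regions on either side.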

Implementations

Many data analysis software packages provide for feature extraction and dimension reduction. Common numerical programming environments such as MATLAB, Scilab, NumPy, scikit-learn and the R language provide some of the simpler feature extraction techniques (e.g. principal component analysis) via built-in commands. More specialized algorithms are often available as public scripts or third-party add-ons. There are also software packages that specialize in feature extraction for specific machine learning applications.^[2]
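As an example of such a built-in command, the following sketch calls the `PCA` class from scikit-learn. It assumes scikit-learn and NumPy are installed; the random input data and the choice of three components are placeholders for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumed placeholder data: 100 samples with 10 variables each
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

pca = PCA(n_components=3)         # keep three extracted features
X_reduced = pca.fit_transform(X)  # fit the model and project the data

print(X_reduced.shape)
print(pca.explained_variance_ratio_)  # variance share of each component
```

The fitted `components_` attribute holds the weights of each original variable in the extracted features, which can help when interpreting them.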

References

[1] "Its all about the features". Reality AI Blog. September 2017.
[2] See, for example, https://reality.ai/

Rustum, Rabee, Adebayo Adeloye, and Aurore Simala. "Kohonen self-organising map (KSOM) extracted features for enhancing MLP-ANN prediction models of BOD5." In International Symposium: Quantification and Reduction of Predictive Uncertainty for Sustainable Water Resources Management-24th General Assembly of the International Union of Geodesy and Geophysics (IUGG), pp. 181-187. 2007.

