Random subspace method


The random subspace method[1] (or attribute bagging[2]) is an ensemble classifier that consists of several classifiers, each operating in a randomly chosen subspace of the original feature space, and that outputs the class based on the outputs of these individual classifiers. The random subspace method has been used for decision trees (random decision forests),[3][1] linear classifiers,[4] support vector machines,[5] nearest neighbours[6] and other types of classifiers. The method is also applicable to one-class classifiers.[7][8]
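
The description above can be made concrete with a short sketch. This is a minimal illustration rather than the exact algorithm of the cited papers: the function names (fit_random_subspace, predict_random_subspace) and default parameter values are hypothetical, scikit-learn's DecisionTreeClassifier is assumed as the base learner, and integer class labels are assumed so that majority voting can use np.bincount.

    import numpy as np
    from sklearn.base import clone
    from sklearn.tree import DecisionTreeClassifier

    def fit_random_subspace(X, y, n_estimators=10, n_subfeatures=5, base=None, seed=None):
        # Train an ensemble in which each member sees only a random feature subset.
        rng = np.random.default_rng(seed)
        base = base if base is not None else DecisionTreeClassifier()
        ensemble = []
        for _ in range(n_estimators):
            # Draw one random subspace: a feature subset sampled without replacement.
            features = rng.choice(X.shape[1], size=n_subfeatures, replace=False)
            ensemble.append((features, clone(base).fit(X[:, features], y)))
        return ensemble

    def predict_random_subspace(ensemble, X):
        # Each member predicts from its own subspace; outputs are combined by majority vote.
        votes = np.stack([clf.predict(X[:, feats]) for feats, clf in ensemble])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)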

The algorithm is an attractive choice for classification problems where the number of features is much larger than the number of training objects, such as fMRI data[9] or gene expression data.[10]
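Continuing the sketch above, a hypothetical use on synthetic data with far more features than training objects might look like this; the data and all parameter values are invented purely for illustration.

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 1000))    # 50 training objects, 1000 features
    y = (X[:, 0] > 0).astype(int)      # labels depend on a single feature
    ensemble = fit_random_subspace(X, y, n_estimators=25, n_subfeatures=50, seed=1)
    print(predict_random_subspace(ensemble, X[:5]))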

Because each classifier in the ensemble is trained on only a subset of the available features, the method is closely related to feature selection. In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for three reasons:

  1. simplification of models to make them easier to interpret by researchers and users,[1]
  2. shorter training times, and
  3. enhanced generalization by reducing overfitting[2] (formally, reduction of variance[1]).

The central premise when using a feature selection technique is that the data contains many features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information.[2] Redundancy and irrelevance are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.[3]
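
The distinction between redundancy and irrelevance can be illustrated with a small sketch using scikit-learn's SelectKBest univariate filter; the synthetic data below are invented for illustration. A univariate filter scores each feature on its own, so it tends to keep a redundant copy of a relevant feature, precisely the situation described above.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)          # only features 0 and 1 are relevant
    X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=200)  # feature 2 is a redundant copy of feature 0

    selector = SelectKBest(f_classif, k=3).fit(X, y)
    # Likely prints [0 1 2]: the redundant copy scores high because it is individually relevant.
    print(selector.get_support(indices=True))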

References

  1. ^ a b Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601.
  2. ^ Bryll, R. (2003). "Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets". Pattern Recognition. 36 (6): 1291–1302. doi:10.1016/s0031-3203(02)00121-8.
  3. ^ Ho, Tin Kam (1995). Random Decision Forests (PDF). Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995. pp. 278–282.
  4. ^ Skurichina, Marina (2002). "Bagging, boosting and the random subspace method for linear classifiers". Pattern Analysis and Applications. 5 (2): 121–135. doi:10.1007/s100440200011.
  5. ^ Tao, D. (2006). "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval". IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/tpami.2006.134.
  6. ^ Tremblay, G. (2004). "Optimizing Nearest Neighbour in Random Subspaces using a Multi-Objective Genetic Algorithm" (PDF). 17th International Conference on Pattern Recognition: 208–211.
  7. ^ Nanni, L. (2006). "Experimental comparison of one-class classifiers for online signature verification". Neurocomputing. 69 (7).
  8. ^ Cheplygina, Veronika; et al. (2011). "Pruned random subspace method for one-class classifiers" (PDF). Multiple Classifier Systems 2011. pp. 96–105.
  9. ^ Kuncheva, Ludmila I.; et al. (2010). "Random Subspace Ensembles for fMRI Classification" (PDF). IEEE Transactions on Medical Imaging. 29 (2): 531–542. http://pages.bangor.ac.uk/~mas00a/papers/lkjrcpdlsjtmi10.pdf
  10. ^ Bertoni, Alberto; Folgieri, Raffaella; Valentini, Giorgio (2005). "Bio-molecular cancer prediction with random subspace ensembles of support vector machines". Neurocomputing. 63: 535–539. doi:10.1016/j.neucom.2004.07.007.