Random subspace method

In machine learning, the random subspace method,[1] also called attribute bagging[2] or feature bagging, is an ensemble learning method that attempts to reduce the correlation between estimators in an ensemble by training them on random samples of features instead of the entire feature set.

Motivation

In ensemble learning one tries to combine the models produced by several learners into an ensemble that performs better than the original learners. One way of combining learners is bootstrap aggregating or bagging, which shows each learner a randomly sampled subset of the training points so that the learners will produce different models that can be sensibly averaged.[a] In bagging, one samples training points with replacement from the full training set.

The random subspace method is similar to bagging except that the features ("attributes", "predictors", "independent variables") are randomly sampled, with replacement, for each learner. Informally, this discourages individual learners from over-focusing on features that appear highly predictive or descriptive in the training set but fail to be as predictive for points outside that set. For this reason, random subspaces are an attractive choice for high-dimensional problems where the number of features is much larger than the number of training points, such as learning from fMRI data[3] or gene expression data.[4]

The random subspace method has been used for decision trees; when combined with "ordinary" bagging of decision trees, the resulting models are called random forests.[5] It has also been applied to linear classifiers,[6] support vector machines,[7] nearest neighbours[8][9] and other types of classifiers. The method is also applicable to one-class classifiers.[10][11] The random subspace method has also been applied to the portfolio selection problem,[12][13][14][15] where it has been shown to outperform the conventional resampled portfolio, which is essentially based on bagging.
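
The method is also available in common machine learning libraries. The following is a minimal sketch, assuming scikit-learn is installed; the choice of base estimator and the parameter values are illustrative, not prescribed by the method:

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Random subspace ensemble: each tree sees every training point (bootstrap=False)
    # but only a random draw of the features (bootstrap_features=True, max_features=0.5).
    clf = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=50,
        bootstrap=False,          # keep the full set of training points for each learner
        bootstrap_features=True,  # sample features with replacement
        max_features=0.5,         # fraction of the features given to each learner
    )
    # clf.fit(X_train, y_train) and clf.predict(X_test) would then train and apply the
    # ensemble, where X_train, y_train and X_test stand in for the user's own data.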

To tackle high-dimensional sparse problems, a framework named Random Subspace Ensemble (RaSE)[16] was developed. RaSE combines weak learners trained in random subspaces with a two-layer structure and an iterative process.[17] RaSE has been shown to enjoy appealing theoretical properties and practical performance.[16]

Algorithm

An ensemble of models employing the random subspace method can be constructed using the following algorithm:

  1. Let the number of training points be N and the number of features in the training data be D.
  2. Let L be the number of individual models in the ensemble.
  3. For each individual model l, choose nl (nl < N) to be the number of input points and dl (dl < D) to be the number of features for l. It is common to have only one value of nl and one value of dl for all the individual models.
  4. For each individual model l, create a training set by sampling nl of the N input points and choosing dl of the D features with replacement, and train the model on it.

Now, to apply the ensemble model to an unseen point, combine the outputs of the L individual models by majority voting or by combining the posterior probabilities.
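
A minimal sketch of this algorithm is given below, assuming NumPy for the sampling and scikit-learn decision trees as the individual models; the function names, default values and the use of integer class labels are illustrative assumptions, not part of the original formulation:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def train_random_subspace_ensemble(X, y, L=25, n_l=None, d_l=None, seed=None):
        """Train L models, each on n_l sampled points and d_l sampled features."""
        rng = np.random.default_rng(seed)
        N, D = X.shape
        n_l = n_l or N                # number of input points per model (step 3)
        d_l = d_l or max(1, D // 2)   # number of features per model (step 3)
        ensemble = []
        for _ in range(L):
            rows = rng.choice(N, size=n_l, replace=True)  # sample input points
            cols = rng.choice(D, size=d_l, replace=True)  # sample features with replacement (step 4)
            model = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
            ensemble.append((model, cols))
        return ensemble

    def predict_majority(ensemble, X):
        """Combine the L outputs for unseen points by majority voting."""
        votes = np.stack([model.predict(X[:, cols]) for model, cols in ensemble])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

Combining the posterior probabilities instead would average each model's predict_proba output over the ensemble and pick the class with the highest mean probability.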

Footnotes

  a. If each learner follows the same deterministic algorithm, the models produced are necessarily all the same.

Related Research Articles

<span class="mw-page-title-main">Boosting (machine learning)</span> Method in machine learning

In machine learning, boosting is an ensemble meta-algorithm used primarily to reduce bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant: "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification. In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.

<span class="mw-page-title-main">Machine learning</span> Study of algorithms that improve automatically through experience

Machine learning (ML) is an umbrella term for solving problems for which developing explicit algorithms by hand would be cost-prohibitive; instead, the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithm. Recently, generative artificial neural networks have been able to surpass the results of many previous approaches. Machine learning approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture and medicine, where it is too costly to develop algorithms to perform the needed tasks.

<span class="mw-page-title-main">Decision tree learning</span> Machine learning algorithm

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming.

<span class="mw-page-title-main">Feature selection</span> Procedure in machine learning and statistics

Feature selection is the process of selecting a subset of relevant features for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction.

<span class="mw-page-title-main">Bootstrap aggregating</span> Ensemble method within machine learning

Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.

<span class="mw-page-title-main">Random forest</span> Binary search tree based ensemble machine learning method

Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forests generally outperform decision trees, but their accuracy is lower than that of gradient boosted trees. However, data characteristics can affect their performance.
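
For illustration, such a forest can be built with scikit-learn, where max_features controls the size of the random feature subset considered at each split; the parameter values here are only an example:

    from sklearn.ensemble import RandomForestClassifier

    # Each tree is grown on a bootstrap sample of the training points, and at every
    # split only a random subset of the features (here sqrt(D) of them) is considered,
    # combining bagging with the random subspace idea.
    forest = RandomForestClassifier(n_estimators=200, max_features="sqrt")
    # forest.fit(X_train, y_train) and forest.predict(X_test) would follow,
    # with X_train, y_train and X_test standing in for the user's data.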

In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of several classes. In the multi-label problem the labels are nonexclusive and there is no constraint on how many of the classes the instance can be assigned to.

<span class="mw-page-title-main">Anomaly detection</span> Approach in data analysis

In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behaviour. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.

<span class="mw-page-title-main">Echo state network</span> Type of reservoir computer

An echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer. The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behaviour is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be minimized by solving a linear system.

Reservoir computing is a framework for computation derived from recurrent neural network theory that maps input signals into higher dimensional computational spaces through the dynamics of a fixed, non-linear system called a reservoir. After the input signal is fed into the reservoir, which is treated as a "black box," a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The first key benefit of this framework is that training is performed only at the readout stage, as the reservoir dynamics are fixed. The second is that the computational power of naturally available systems, both classical and quantum mechanical, can be used to reduce the effective computational cost.

In computer vision, the bag-of-words model, sometimes called the bag-of-visual-words model, can be applied to image classification or retrieval by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.

<span class="mw-page-title-main">Ensemble learning</span> Statistics and machine learning technique

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

<span class="mw-page-title-main">Multiclass classification</span> Problem in machine learning and statistical classification

In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes.

There are many types of artificial neural networks (ANN).

<span class="mw-page-title-main">Active learning (machine learning)</span> Machine learning strategy

Active learning is a special case of machine learning in which a learning algorithm can interactively query a user to label new data points with the desired outputs. In statistics literature, it is sometimes also called optimal experimental design. The information source is also called teacher or oracle.

<span class="mw-page-title-main">Extreme learning machine</span> Type of artificial neural network

Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of the hidden nodes need not be tuned. These hidden nodes can be randomly assigned and never updated, or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model.

<span class="mw-page-title-main">Outline of machine learning</span> Overview of and topical guide to machine learning

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

Tin Kam Ho is a computer scientist at IBM Research with contributions to machine learning, data mining, and classification. Ho is noted for introducing random decision forests in 1995, and for her pioneering work in ensemble learning and data complexity analysis. She is an IEEE fellow and IAPR fellow.

Land cover maps are tools that provide vital information about the Earth's land use and cover patterns. They aid policy development, urban planning, and forest and agricultural monitoring.

References

  1. Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601. S2CID 206420153. Archived from the original (PDF) on 2019-05-14.
  2. Bryll, R. (2003). "Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets". Pattern Recognition. 36 (6): 1291–1302. doi:10.1016/s0031-3203(02)00121-8.
  3. Kuncheva, Ludmila; et al. (2010). "Random Subspace Ensembles for fMRI Classification" (PDF). IEEE Transactions on Medical Imaging. 29 (2): 531–542. CiteSeerX 10.1.1.157.1178. doi:10.1109/TMI.2009.2037756. PMID 20129853.
  4. Bertoni, Alberto; Folgieri, Raffaella; Valentini, Giorgio (2005). "Bio-molecular cancer prediction with random subspace ensembles of support vector machines" (PDF). Neurocomputing. 63: 535–539. doi:10.1016/j.neucom.2004.07.007. hdl:2434/9370.
  5. Ho, Tin Kam (1995). Random Decision Forest (PDF). Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995. pp. 278–282.
  6. Skurichina, Marina (2002). "Bagging, boosting and the random subspace method for linear classifiers". Pattern Analysis and Applications. 5 (2): 121–135. doi:10.1007/s100440200011.
  7. Tao, D. (2006). "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (7): 1088–99. doi:10.1109/tpami.2006.134. PMID 16792098.
  8. Ho, Tin Kam (1998). "Nearest neighbors in random subspaces". Advances in Pattern Recognition. pp. 640–648. doi:10.1007/BFb0033288. ISBN 978-3-540-64858-1.
  9. Tremblay, G. (2004). Optimizing Nearest Neighbour in Random Subspaces using a Multi-Objective Genetic Algorithm (PDF). 17th International Conference on Pattern Recognition. pp. 208–211. doi:10.1109/ICPR.2004.1334060. ISBN 978-0-7695-2128-2.
  10. Nanni, L. (2006). "Experimental comparison of one-class classifiers for online signature verification". Neurocomputing. 69 (7): 869–873. doi:10.1016/j.neucom.2005.06.007.
  11. Cheplygina, Veronika; Tax, David M. J. (2011-06-15). "Pruned Random Subspace Method for One-Class Classifiers". In Sansone, Carlo; Kittler, Josef; Roli, Fabio (eds.). Multiple Classifier Systems. Lecture Notes in Computer Science. Vol. 6713. Springer Berlin Heidelberg. pp. 96–105. doi:10.1007/978-3-642-21557-5_12. ISBN 9783642215568.
  12. Varadi, David (2013). "Random Subspace Optimization (RSO)". CSS Analytics.
  13. Gillen, Ben (2016). "Subset Optimization for Asset Allocation". CaltechAUTHORS.
  14. Shen, Weiwei; Wang, Jun (2017). "Portfolio Selection via Subset Resampling". Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017).
  15. Shen, Weiwei; Wang, Bin; Pu, Jian; Wang, Jun (2019). "The Kelly growth optimal portfolio with ensemble learning". Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2019).
  16. Tian, Ye; Feng, Yang (2021). "RaSE: Random Subspace Ensemble Classification". Journal of Machine Learning Research. 22 (45): 1–93. ISSN 1533-7928.
  17. Tian, Ye; Feng, Yang (2021). "R Package "RaSEn": Random Subspace Ensemble Classification and Variable Screening". CRAN.