Vowpal Wabbit

Vowpal Wabbit
	A screenshot of Vowpal Wabbit
Developer(s)	Yahoo! Research & later Microsoft Research
Stable release	9.6.0 / November 8, 2022
Repository	github.com/VowpalWabbit/vowpal_wabbit
Written in	C++
Operating system	Linux, macOS, Microsoft Windows
Platform	Cross-platform
Type	Machine learning
License	BSD License
Website	vowpalwabbit.org

Last updated May 08, 2024

Vowpal Wabbit (VW) is an open-source, fast, online, interactive machine learning system, available as both a library and a program, developed originally at Yahoo! Research and currently at Microsoft Research. It was started and is led by John Langford. Vowpal Wabbit is particularly notable for its support of interactive learning, including contextual bandits, active learning, and forms of guided reinforcement learning. It provides an efficient, scalable, out-of-core implementation with support for a number of machine learning reductions, importance weighting, and a selection of loss functions and optimization algorithms.

Notable features

The VW program supports:

Multiple supervised (and semi-supervised) learning problems:
- Classification (both binary and multi-class)
- Regression
- Active learning (partially labeled data) for both regression and classification
Multiple learning algorithms (model types / representations):
- OLS regression
- Matrix factorization (sparse matrix SVD)
- Single layer neural net (with user specified hidden layer node count)
- Searn (Search and Learn)
- Latent Dirichlet Allocation (LDA)
- Stagewise polynomial approximation
- Recommend top-K out of N
- One-against-all (OAA) and cost-sensitive OAA reduction for multi-class
- Weighted all pairs
- Contextual-bandit (with multiple exploration/exploitation strategies)
Multiple loss functions:
- Squared error
- Quantile
- Hinge
- Logistic
- Poisson
Multiple optimization algorithms
Regularization (L1 norm, L2 norm, & elastic net regularization)
Flexible input - input features may be:
- Binary
- Numerical
- Categorical (via flexible feature-naming and the hash trick)
- Sparse, or missing for some examples
Other features:
- On-the-fly generation of feature interactions (quadratic and cubic)
- On-the-fly generation of N-grams with optional skips (useful for word/language datasets)
- Automatic test-set holdout and early termination on multiple passes
- Bootstrapping
- User-settable online learning progress reports and auditing of the model
- Hyperparameter optimization
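The loss functions listed above can each be written as a per-example formula of a raw prediction p and a label y. The Python sketch below uses textbook definitions and is not VW's code: VW's internal scaling, clipping, and label conventions may differ. The labels -1/+1 for the hinge and logistic losses, the `tau` parameter of the quantile loss, and the log-rate parameterization of the Poisson loss are assumptions of this sketch, and all function names are hypothetical.

```python
import math


def squared_loss(p: float, y: float) -> float:
    """Squared error between prediction p and label y."""
    return (p - y) ** 2


def quantile_loss(p: float, y: float, tau: float = 0.5) -> float:
    """Pinball loss: under-predictions (y > p) weigh tau, over-predictions 1 - tau."""
    r = y - p
    return tau * r if r >= 0 else (tau - 1.0) * r


def hinge_loss(p: float, y: float) -> float:
    """Hinge loss; assumes y is -1 or +1."""
    return max(0.0, 1.0 - y * p)


def logistic_loss(p: float, y: float) -> float:
    """log(1 + exp(-y * p)) computed without overflow; assumes y is -1 or +1."""
    z = -y * p
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def poisson_loss(p: float, y: float) -> float:
    """Poisson negative log-likelihood with p = log(rate), dropping the log(y!) term."""
    return math.exp(p) - y * p


# Lookup table from a loss name to its function.
LOSSES = {
    "squared": squared_loss,
    "quantile": quantile_loss,
    "hinge": hinge_loss,
    "logistic": logistic_loss,
    "poisson": poisson_loss,
}

if __name__ == "__main__":
    for name, loss in LOSSES.items():
        # Same prediction (0.5) and label (1.0) for every loss.
        print(f"{name}: {loss(0.5, 1.0):.4f}")
```

The stable branch in `logistic_loss` matters in online learning, where a single badly mispredicted example would otherwise overflow `math.exp`.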

Scalability

Vowpal Wabbit has been used to learn a tera-feature (10¹²) dataset on 1000 nodes in one hour.[1] Its scalability is aided by several factors:

Out-of-core online learning: no need to load all data into memory
The hashing trick: feature identities are converted to a weight index via a hash (uses 32-bit MurmurHash3)
Exploiting multi-core CPUs: parsing of input and learning are done in separate threads.
Compiled C++ code
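The hashing trick mentioned above can be sketched in Python: a pure-Python MurmurHash3 (x86, 32-bit variant) maps a feature name to a 32-bit value, which is then masked to `num_bits` bits to pick a weight slot. This is a minimal sketch, not VW's hashing code: VW combines namespace and feature hashes in its own way, and the helper `feature_index` and its `seed` argument are hypothetical. The default of 18 bits mirrors VW's default weight-table size (`-b 18`).

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant, returning an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    nblocks = n // 4
    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: mix the remaining 0 to 3 bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def feature_index(name: str, seed: int = 0, num_bits: int = 18) -> int:
    """Hypothetical hashing-trick helper: hash a feature name into a 2**num_bits table."""
    return murmur3_32(name.encode("utf-8"), seed) & ((1 << num_bits) - 1)


if __name__ == "__main__":
    for feature in ["price", "color=red", "user_id=12345"]:
        print(feature, feature_index(feature))
```

Because collisions are simply tolerated, memory stays fixed at 2**num_bits weights no matter how many distinct feature names the data contains, which is what lets the learner avoid a feature dictionary entirely.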

Related Research Articles

In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Early versions of MTL were called "hints".

Feature selection is the process of selecting a subset of relevant features for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction.


In mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is a process that changes the resulting answer to be "simpler". It is often used to obtain results for ill-posed problems or to prevent overfitting.
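To make "simpler" concrete for L1, L2, and elastic-net penalties, the Python sketch below computes the elastic-net penalty and applies its proximal step, which shrinks every weight and sets small ones exactly to zero. The function names are hypothetical, and the proximal update is one standard way to apply these penalties, not necessarily the method any particular library uses.

```python
from typing import List, Sequence


def elastic_net_penalty(w: Sequence[float], l1: float, l2: float) -> float:
    """l1 * ||w||_1 + (l2 / 2) * ||w||_2^2; l2 = 0 gives the lasso, l1 = 0 gives ridge."""
    return l1 * sum(abs(x) for x in w) + 0.5 * l2 * sum(x * x for x in w)


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t, returning exactly 0.0 when |x| <= t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_elastic_net(w: Sequence[float], step: float, l1: float, l2: float) -> List[float]:
    """Proximal step for step * elastic_net_penalty: soft-threshold, then rescale."""
    return [soft_threshold(x, step * l1) / (1.0 + step * l2) for x in w]


if __name__ == "__main__":
    weights = [3.0, -0.5, 0.2, -2.0]
    print(elastic_net_penalty(weights, l1=1.0, l2=1.0))
    print(prox_elastic_net(weights, step=1.0, l1=1.0, l2=1.0))
```

With these settings the proximal step returns `[1.0, 0.0, 0.0, -0.5]`: the L1 part zeroes the two small weights, which is the sparsity effect, while the L2 part halves what remains.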

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction.

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique in areas of machine learning where it is computationally infeasible to train over the entire dataset, which calls for out-of-core algorithms. It is also used in situations where the algorithm must dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., stock price prediction. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches.
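The sequential update described above can be sketched as stochastic gradient descent on the logistic loss, consuming one example at a time from an iterator so the full dataset never has to sit in memory. The example format (a label in {-1, +1} plus a sparse feature dictionary), the learning rate, the progressive-validation bookkeeping, and all function names are assumptions of this sketch.

```python
import math
from typing import Dict, Iterable, Tuple

# Assumed example format: (label in {-1, +1}, sparse features as name -> value).
Example = Tuple[float, Dict[str, float]]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log1p_exp(z: float) -> float:
    """Stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def predict(w: Dict[str, float], x: Dict[str, float]) -> float:
    """Linear score; features never seen before have weight 0."""
    return sum(w.get(name, 0.0) * value for name, value in x.items())


def online_logistic_sgd(
    stream: Iterable[Example], learning_rate: float = 0.1
) -> Tuple[Dict[str, float], float]:
    """One pass of SGD over a stream; returns weights and progressive average loss."""
    w: Dict[str, float] = {}
    total_loss, n = 0.0, 0
    for y, x in stream:
        p = predict(w, x)
        # Progressive validation: score each example before learning from it.
        total_loss += _log1p_exp(-y * p)
        # d/dp log(1 + exp(-y p)) = -y * sigmoid(-y p)
        g = -y * _sigmoid(-y * p)
        for name, value in x.items():
            w[name] = w.get(name, 0.0) - learning_rate * g * value
        n += 1
    return w, total_loss / max(n, 1)


if __name__ == "__main__":
    # A generator stands in for a file read line by line.
    def toy_stream():
        for _ in range(100):
            yield 1.0, {"word=good": 1.0, "bias": 1.0}
            yield -1.0, {"word=bad": 1.0, "bias": 1.0}

    weights, avg_loss = online_logistic_sgd(toy_stream())
    print(weights, avg_loss)
```

Only the current example and the weight table are held in memory, so the same loop works unchanged whether the stream is ten lines or a file far larger than RAM.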

In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes.
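One common way to reduce multiclass classification to binary problems is one-against-all (OAA): train one binary scorer per class and predict the class whose scorer gives the highest score. A minimal Python sketch, assuming dense feature lists, labels numbered from 0, and an online logistic update for each binary scorer; the class `OneAgainstAll` is hypothetical and is not any library's implementation.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneAgainstAll:
    """Hypothetical OAA sketch: one linear binary scorer per class."""

    def __init__(self, num_classes: int, num_features: int, learning_rate: float = 0.5):
        self.weights: List[List[float]] = [[0.0] * num_features for _ in range(num_classes)]
        self.learning_rate = learning_rate

    def _score(self, k: int, x: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(self.weights[k], x))

    def learn(self, x: Sequence[float], label: int) -> None:
        # Each example is a positive for its own class's scorer and a negative for all others.
        for k in range(len(self.weights)):
            y = 1.0 if k == label else -1.0
            g = -y * _sigmoid(-y * self._score(k, x))  # logistic-loss gradient w.r.t. the score
            self.weights[k] = [w - self.learning_rate * g * v for w, v in zip(self.weights[k], x)]

    def predict(self, x: Sequence[float]) -> int:
        # Pick the class whose binary scorer is most confident.
        return max(range(len(self.weights)), key=lambda k: self._score(k, x))


if __name__ == "__main__":
    # One-hot features plus a constant bias feature; labels are 0, 1, 2.
    data = [([1.0, 0.0, 0.0, 1.0], 0), ([0.0, 1.0, 0.0, 1.0], 1), ([0.0, 0.0, 1.0, 1.0], 2)]
    model = OneAgainstAll(num_classes=3, num_features=4)
    for _ in range(20):
        for x, label in data:
            model.learn(x, label)
    print([model.predict(x) for x, _ in data])
```

The cost of OAA grows linearly with the number of classes, since every example updates every binary scorer.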

There are many types of artificial neural networks (ANN).

In machine learning, feature hashing, also known as the hashing trick, is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly, rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction.
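A minimal sketch of feature hashing in Python: each feature name is hashed with BLAKE2b from the standard library (an assumption; any stable hash works, whereas Python's built-in `hash` is salted per process), the hash modulo `dim` gives the index, and one extra hash bit gives a sign so that colliding features tend to cancel rather than accumulate. The function `hash_features` is hypothetical.

```python
import hashlib
from typing import Dict, List


def _hash64(token: str) -> int:
    """Stable 64-bit hash (BLAKE2b); unlike built-in hash(), it is not salted per process."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def hash_features(features: Dict[str, float], dim: int) -> List[float]:
    """Hypothetical signed hashing trick: map named features into a vector of length dim."""
    vec = [0.0] * dim
    for name, value in features.items():
        h = _hash64(name)
        index = h % dim                        # hash value picks the slot directly
        sign = -1.0 if (h >> 63) & 1 else 1.0  # top bit picks the sign
        vec[index] += sign * value
    return vec


if __name__ == "__main__":
    doc = {"word=vowpal": 1.0, "word=wabbit": 1.0, "length": 12.0}
    print(hash_features(doc, dim=8))
```

No lookup table is built or stored: the output length is fixed by `dim`, so choosing a small `dim` is also how feature hashing performs dimensionality reduction.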

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification.

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

In the field of statistical learning theory, matrix regularization generalizes notions of vector regularization to cases where the object to be learned is a matrix. The purpose of regularization is to enforce conditions, for example sparsity or smoothness, that can produce stable predictive functions. For example, in the more common vector framework, Tikhonov regularization solves $\min_x \|Ax - y\|^2 + \lambda \|x\|^2$ to find a vector $x$ that is a stable solution to the regression problem.

Feature engineering, a preprocessing step in supervised machine learning and statistical modeling, transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability.

Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes need not be tuned. These hidden nodes can be randomly assigned and never updated, or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model.

In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition.

Sparse dictionary learning is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than that of the signals being observed. These two properties lead to seemingly redundant atoms that allow multiple representations of the same signal but also improve the sparsity and flexibility of the representation.

Structured sparsity regularization is a class of methods, and an area of research in statistical learning theory, that extend and generalize sparsity regularization learning methods. Both sparsity and structured sparsity regularization methods seek to exploit the assumption that the output variable to be learned can be described by a reduced number of variables in the input space. Sparsity regularization methods focus on selecting the input variables that best describe the output. Structured sparsity regularization methods generalize and extend sparsity regularization methods by allowing for optimal selection over structures like groups or networks of input variables.
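Group lasso is one widely used instance of structured sparsity regularization (named here as an example, not as the only method): its proximal step shrinks each predefined group of weights as a block and zeroes out whole groups whose norm falls below a threshold. A minimal Python sketch, assuming non-overlapping groups given as lists of indices; the function name is hypothetical, and indices that belong to no group are left unchanged.

```python
import math
from typing import List, Sequence


def group_soft_threshold(
    w: Sequence[float], groups: Sequence[Sequence[int]], threshold: float
) -> List[float]:
    """Group-lasso proximal step: shrink each group's Euclidean norm by `threshold`.

    A group whose norm is at most `threshold` is set entirely to zero.
    Indices not listed in any group are returned unchanged.
    """
    out = list(w)
    for group in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in group))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for i in group:
            out[i] = scale * w[i]
    return out


if __name__ == "__main__":
    weights = [3.0, 4.0, 0.1, 0.1, 5.0]
    # Features 0 and 1 form one group, 2 and 3 another; feature 4 is ungrouped.
    print(group_soft_threshold(weights, groups=[[0, 1], [2, 3]], threshold=1.0))
```

Unlike plain L1 soft-thresholding, which decides each weight on its own, this step keeps or drops a group of input variables together, which is the "selection over structures" described above.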

The outline of machine learning provides an overview of and topical guide to the field.

This is a comparison of statistical analysis software that can perform inference with Gaussian processes, often using approximations.

References

[1] Agarwal, Alekh; Chapelle, Olivier; Dudik, Miroslav; Langford, John (2011). "A Reliable Effective Terascale Linear Learning System". arXiv: 1110.4198 [cs.LG].


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
