In Euclidean geometry, linear separability is a property of two sets of points. This is most easily visualized in two dimensions (the Euclidean plane) by thinking of one set of points as being colored blue and the other set of points as being colored red. These two sets are linearly separable if there exists at least one line in the plane with all of the blue points on one side of the line and all the red points on the other side. This idea immediately generalizes to higher-dimensional Euclidean spaces if the line is replaced by a hyperplane.
The problem of determining if a pair of sets is linearly separable and finding a separating hyperplane if they are, arises in several areas. In statistics and machine learning, classifying certain types of data is a problem for which good algorithms exist that are based on this concept.
Let and be two sets of points in an n-dimensional Euclidean space. Then and are linearly separable if there exist n + 1 real numbers , such that every point satisfies and every point satisfies , where is the -th component of .
Equivalently, two sets are linearly separable precisely when their respective convex hulls are disjoint (colloquially, do not overlap). [1]
In simple 2D, it can also be imagined that the set of points under a linear transformation collapses into a line, on which there exists a value, k, greater than which one set of points will fall into, and lesser than which the other set of points fall.
Three non-collinear points in two classes ('+' and '-') are always linearly separable in two dimensions. This is illustrated by the three examples in the following figure (the all '+' case is not shown, but is similar to the all '-' case):
However, not all sets of four points, no three collinear, are linearly separable in two dimensions. The following example would need two straight lines and thus is not linearly separable:
Notice that three points which are collinear and of the form "+ ⋅⋅⋅ — ⋅⋅⋅ +" are also not linearly separable.
Let be the number of ways to linearly separate N points (in general position) in K dimensions, then [2] When K is large, is very close to one when , but very close to zero when . In words, one perceptron unit can almost certainly memorize a random assignment of binary labels on N points when , but almost certainly not when .
A Boolean function in n variables can be thought of as an assignment of 0 or 1 to each vertex of a Boolean hypercube in n dimensions. This gives a natural division of the vertices into two sets. The Boolean function is said to be linearly separable provided these two sets of points are linearly separable. The number of distinct Boolean functions is where n is the number of variables passed into the function. [3]
Such functions are also called linear threshold logic, or perceptrons. The classical theory is summarized in, [4] as Knuth claims. [5]
The value is only known exactly up to case, but the order of magnitude is known quite exactly: it has upper bound and lower bound . [6]
It is co-NP-complete to decide whether a Boolean function given in disjunctive or conjunctive normal form is linearly separable. [6]
Number of variables | Boolean functions | Linearly separable Boolean functions |
---|---|---|
2 | 16 | 14 |
3 | 256 | 104 |
4 | 65536 | 1882 |
5 | 4294967296 | 94572 |
6 | 18446744073709552000 | 15028134 |
7 | 3.402823669 ×10^38 | 8378070864 |
8 | 1.157920892 ×10^77 | 17561539552946 |
9 | 1.340780792 ×10^154 | 144130531453121108 |
Classifying data is a common task in machine learning. Suppose some data points, each belonging to one of two sets, are given and we wish to create a model that will decide which set a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (a list of p numbers), and we want to know whether we can separate such points with a (p − 1)-dimensional hyperplane. This is called a linear classifier. There are many hyperplanes that might classify (separate) the data. One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two sets. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane and the linear classifier it defines is known as a maximum margin classifier .
More formally, given some training data , a set of n points of the form
where the yi is either 1 or −1, indicating the set to which the point belongs. Each is a p-dimensional real vector. We want to find the maximum-margin hyperplane that divides the points having from those having . Any hyperplane can be written as the set of points satisfying
where denotes the dot product and the (not necessarily normalized) normal vector to the hyperplane. The parameter determines the offset of the hyperplane from the origin along the normal vector .
If the training data are linearly separable, we can select two hyperplanes in such a way that they separate the data and there are no points between them, and then try to maximize their distance.
In mathematics, physics, and engineering, a Euclidean vector or simply a vector is a geometric object that has magnitude and direction. Euclidean vectors can be added and scaled to form a vector space. A vector quantity is a vector-valued physical quantity, including units of measurement and possibly a support, formulated as a directed line segment. A vector is frequently depicted graphically as an arrow connecting an initial pointA with a terminal pointB, and denoted by
In Euclidean geometry, an affine transformation or affinity is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles.
In machine learning, support vector machines are supervised max-margin models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories, SVMs are one of the most studied models, being based on statistical learning frameworks of VC theory proposed by Vapnik and Chervonenkis (1974).
In machine learning, a linear classifier makes a classification decision for each object based on a linear combination of its features. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.
In geometry, a hyperplane is a generalization of a two-dimensional plane in three-dimensional space to mathematical spaces of arbitrary dimension. Like a plane in space, a hyperplane is a flat hypersurface, a subspace whose dimension is one less than that of the ambient space. Two lower-dimensional examples of hyperplanes are one-dimensional lines in a plane and zero-dimensional points on a line.
In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
In geometry, a normal is an object that is perpendicular to a given object. For example, the normal line to a plane curve at a given point is the line perpendicular to the tangent line to the curve at the point.
In mathematics, the concept of a projective space originated from the visual effect of perspective, where parallel lines seem to meet at infinity. A projective space may thus be viewed as the extension of a Euclidean space, or, more generally, an affine space with points at infinity, in such a way that there is one point at infinity of each direction of parallel lines.
Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
In projective geometry, duality or plane duality is a formalization of the striking symmetry of the roles played by points and lines in the definitions and theorems of projective planes. There are two approaches to the subject of duality, one through language and the other a more functional approach through special mappings. These are completely equivalent and either treatment has as its starting point the axiomatic version of the geometries under consideration. In the functional approach there is a map between related geometries that is called a duality. Such a map can be constructed in many ways. The concept of plane duality readily extends to space duality and beyond that to duality in any finite-dimensional projective geometry.
In geometry, a straight line, usually abbreviated line, is an infinitely long object with no width, depth, or curvature, an idealization of such physical objects as a straightedge, a taut string, or a ray of light. Lines are spaces of dimension one, which may be embedded in spaces of dimension two, three, or higher. The word line may also refer, in everyday life, to a line segment, which is a part of a line delimited by two points.
In mathematics, the real coordinate space or real coordinate n-space, of dimension n, denoted Rn or , is the set of all ordered n-tuples of real numbers, that is the set of all sequences of n real numbers, also known as coordinate vectors. Special cases are called the real lineR1, the real coordinate planeR2, and the real coordinate three-dimensional spaceR3. With component-wise addition and scalar multiplication, it is a real vector space.
In geometry, a set of points in space are coplanar if there exists a geometric plane that contains them all. For example, three points are always coplanar, and if the points are distinct and non-collinear, the plane they determine is unique. However, a set of four or more distinct points will, in general, not lie in a single plane.
In geometry, a three-dimensional space is a mathematical space in which three values (coordinates) are required to determine the position of a point. Most commonly, it is the three-dimensional Euclidean space, that is, the Euclidean space of dimension three, which models physical space. More general three-dimensional spaces are called 3-manifolds. The term may also refer colloquially to a subset of space, a three-dimensional region, a solid figure.
In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the Representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.
In Euclidean space, the distance from a point to a plane is the distance between a given point and its orthogonal projection on the plane, the perpendicular distance to the nearest point on the plane.
The winnow algorithm is a technique from machine learning for learning a linear classifier from labeled examples. It is very similar to the perceptron algorithm. However, the perceptron algorithm uses an additive weight-update scheme, while Winnow uses a multiplicative scheme that allows it to perform much better when many dimensions are irrelevant. It is a simple algorithm that scales well to high-dimensional data. During training, Winnow is shown a sequence of positive and negative examples. From these it learns a decision hyperplane that can then be used to label novel examples as positive or negative. The algorithm can also be used in the online learning setting, where the learning and the classification phase are not clearly separated.
In Euclidean and projective geometry, five points determine a conic, just as two (distinct) points determine a line. There are additional subtleties for conics that do not exist for lines, and thus the statement and its proof for conics are both more technical than for lines.
In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm that can learn kernel machines, i.e. non-linear classifiers that employ a kernel function to compute the similarity of unseen samples to training samples. The algorithm was invented in 1964, making it the first kernel classification learner.
In mathematics, orthogonality is the generalization of the geometric notion of perpendicularity to the linear algebra of bilinear forms.
{{cite book}}
: CS1 maint: location missing publisher (link)