Concept class

In computational learning theory, a branch of mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts.

Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning. [1] In this setting, take Y to be a set of (classifier output) labels and X a set of examples; a map c : X → Y from examples to classifier labels, where Y = {0, 1} so that c can be identified with a subset of X, is then said to be a concept. A concept class is then a collection of such concepts.
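To make the definition concrete, the sketch below (illustrative only; the domain X and the names classify and C are invented for the example) represents a concept by the subset of the domain it labels 1, and a concept class by a list of such subsets:

```python
from itertools import chain, combinations

# Hypothetical finite domain of examples; any finite set would do.
X = frozenset({0, 1, 2})

def classify(concept, x):
    """The concept's total Boolean function c : X -> {0, 1}: a concept is
    represented here by the subset of X that it labels 1."""
    return 1 if x in concept else 0

# One possible concept class: every subset of X (the power set).
C = [frozenset(s) for s in chain.from_iterable(
        combinations(sorted(X), r) for r in range(len(X) + 1))]
```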

Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions of s (samples and extensions are defined under Background below). [2] Not every subclass is reachable; see the example below. [2]

Background

A sample s is a partial function from X to {0, 1}. [2] Identifying a concept with its characteristic function mapping X to {0, 1}, a concept is a special case of a sample. [2]

Two samples are consistent if they agree on the intersection of their domains. [2] A sample s′ extends another sample s if the two are consistent and the domain of s is contained in the domain of s′. [2]
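These definitions translate directly into code. The following sketch (continuing the illustrative representation above; the helper names consistent, extends, as_sample, and reached are assumptions, not from the source) models a sample as a Python dict, i.e. a partial function from X to {0, 1}:

```python
def consistent(s1, s2):
    """Two samples are consistent if they agree wherever both are defined."""
    return all(s1[x] == s2[x] for x in s1.keys() & s2.keys())

def extends(s_prime, s):
    """s_prime extends s: the two are consistent and dom(s) is contained
    in dom(s_prime)."""
    return s.keys() <= s_prime.keys() and consistent(s_prime, s)

def as_sample(concept, X):
    """A concept, identified with its characteristic function, is a total
    sample on X."""
    return {x: 1 if x in concept else 0 for x in X}

def reached(C, s, X):
    """The concepts in C that are extensions of the sample s."""
    return [c for c in C if extends(as_sample(c, X), s)]
```

With reached in hand, a subclass of C is reachable exactly when it equals reached(C, s, X) for some sample s.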

Examples

Suppose that X = {0, 1} and that C consists of all four subsets of X. Then:

- the sample s = {(0, 1)}, defined only at the point 0 where it takes the value 1, is extended by exactly the concepts {0} and {0, 1}, so the subclass {{0}, {0, 1}} is reachable;
- the empty sample is extended by every concept, so C itself is reachable;
- the subclass {∅, {0, 1}} is not reachable: a sample extended by ∅ must be 0 on its whole domain, a sample extended by {0, 1} must be 1 on its whole domain, so only the empty sample is extended by both, and it reaches all of C rather than just these two concepts.
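Reusing reached and the set representation sketched under Background (again purely illustrative), the claims above can be checked mechanically:

```python
X = frozenset({0, 1})
C = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

print(reached(C, {0: 1}, X))  # [frozenset({0}), frozenset({0, 1})]
print(reached(C, {}, X))      # all four concepts: the empty sample reaches C
# {∅, {0, 1}} never appears: any sample extended by the empty concept is
# identically 0, and any sample extended by {0, 1} is identically 1, so only
# the empty sample is extended by both, and it reaches all of C.
```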

Applications

Let C be a concept class. For a positive integer d, a concept c in C is called 1/d-good if, for every x in X, at least 1/d of the concepts in C agree with c on the classification of x; the same definition applies with C replaced by any subclass. [2] The fingerprint dimension FD(C) of the entire concept class C is the least positive integer d such that every reachable subclass of C contains a concept that is 1/d-good for it. [2] This quantity can be used to bound the minimum number of equivalence queries (queries that propose a hypothesis concept and are answered either by confirmation or by a counterexample) needed to learn the class, according to the inequality FD(C) − 1 ≤ #EQ(C) ≤ ⌈FD(C) ln |C|⌉. [2]
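As an illustration of the definition (not of Angluin's algorithmic results), the brute-force sketch below reuses reached from the Background sketch and enumerates every sample over a tiny domain; is_good and fingerprint_dimension are hypothetical names invented for the example:

```python
from itertools import product

def is_good(c, D, X, d):
    """c is 1/d-good for the class D if, for every x in X, at least a
    1/d fraction of the concepts in D agree with c on x."""
    return all(
        d * sum((x in c2) == (x in c) for c2 in D) >= len(D)
        for x in X)

def fingerprint_dimension(C, X):
    """Least d such that every nonempty reachable subclass of C contains
    a concept that is 1/d-good for it (exhaustive; tiny classes only)."""
    xs = sorted(X)
    # Enumerate every sample: each point is unmapped (None), 0, or 1.
    subclasses = []
    for labels in product((None, 0, 1), repeat=len(xs)):
        s = {x: v for x, v in zip(xs, labels) if v is not None}
        D = reached(C, s, X)
        if D:
            subclasses.append(D)
    d = 1
    while not all(any(is_good(c, D, X, d) for c in D) for D in subclasses):
        d += 1  # terminates: c agrees with itself everywhere, so d = |D| suffices
    return d
```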


References

  1. Chase, H., & Freitag, J. (2018). Model Theory and Machine Learning. arXiv preprint arXiv:1801.06566.
  2. 1 2 3 4 5 6 7 8 9 10 11 12 Angluin, D. (2004). "Queries revisited" (PDF). Theoretical Computer Science. 313 (2): 188–191. doi:10.1016/j.tcs.2003.11.004.