Sigmoid function

Last updated

The logistic curve Logistic-curve.svg
The logistic curve
Plot of the error function Error Function.svg
Plot of the error function

A sigmoid function is a function whose graph follows the logistic function. It is defined by the formula:

Contents

In many fields, especially in the context of artificial neural networks, the term "sigmoid function" is correctly recognized as a synonym for the logistic function. While other S-shaped curves, such as the Gompertz curve or the ogee curve, may resemble sigmoid functions, they are distinct mathematical functions with different properties and applications.

Sigmoid functions, particularly the logistic function, have a domain of all real numbers and typically produce output values in the range from 0 to 1, although some variations, like the hyperbolic tangent, produce output values between −1 and 1. These functions are commonly used as activation functions in artificial neurons and as cumulative distribution functions in statistics. The logistic sigmoid is also invertible, with its inverse being the logit function.

Definition

A sigmoid function is a bounded, differentiable, real function that is defined for all real input values and has a non-negative derivative at each point [1] [2] and exactly one inflection point.

Properties

In general, a sigmoid function is monotonic, and has a first derivative which is bell shaped. Conversely, the integral of any continuous, non-negative, bell-shaped function (with one local maximum and no local minimum, unless degenerate) will be sigmoidal. Thus the cumulative distribution functions for many common probability distributions are sigmoidal. One such example is the error function, which is related to the cumulative distribution function of a normal distribution; another is the arctan function, which is related to the cumulative distribution function of a Cauchy distribution.

A sigmoid function is constrained by a pair of horizontal asymptotes as .

A sigmoid function is convex for values less than a particular point, and it is concave for values greater than that point: in many of the examples here, that point is 0.

Examples

Some sigmoid functions compared. In the drawing all functions are normalized in such a way that their slope at the origin is 1. Gjl-t(x).svg
Some sigmoid functions compared. In the drawing all functions are normalized in such a way that their slope at the origin is 1.

using the hyperbolic tangent mentioned above. Here, is a free parameter encoding the slope at , which must be greater than or equal to because any smaller value will result in a function with multiple inflection points, which is therefore not a true sigmoid. This function is unusual because it actually attains the limiting values of -1 and 1 within a finite range, meaning that its value is constant at -1 for all and at 1 for all . Nonetheless, it is smooth (infinitely differentiable, ) everywhere, including at .

Applications

Inverted logistic S-curve to model the relation between wheat yield and soil salinity Gohana inverted S-curve.png
Inverted logistic S-curve to model the relation between wheat yield and soil salinity

Many natural processes, such as those of complex system learning curves, exhibit a progression from small beginnings that accelerates and approaches a climax over time. When a specific mathematical model is lacking, a sigmoid function is often used. [6]

The van Genuchten–Gupta model is based on an inverted S-curve and applied to the response of crop yield to soil salinity.

Examples of the application of the logistic S-curve to the response of crop yield (wheat) to both the soil salinity and depth to water table in the soil are shown in modeling crop response in agriculture.

In artificial neural networks, sometimes non-smooth functions are used instead for efficiency; these are known as hard sigmoids.

In audio signal processing, sigmoid functions are used as waveshaper transfer functions to emulate the sound of analog circuitry clipping. [7]

In biochemistry and pharmacology, the Hill and Hill–Langmuir equations are sigmoid functions.

In computer graphics and real-time rendering, some of the sigmoid functions are used to blend colors or geometry between two values, smoothly and without visible seams or discontinuities.

Titration curves between strong acids and strong bases have a sigmoid shape due to the logarithmic nature of the pH scale.

The logistic function can be calculated efficiently by utilizing type III Unums. [8]

See also

Related Research Articles

<span class="mw-page-title-main">Exponential distribution</span> Probability distribution

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

<span class="mw-page-title-main">Hyperbolic functions</span> Collective name of 6 mathematical functions

In mathematics, hyperbolic functions are analogues of the ordinary trigonometric functions, but defined using the hyperbola rather than the circle. Just as the points (cos t, sin t) form a circle with a unit radius, the points (cosh t, sinh t) form the right half of the unit hyperbola. Also, similarly to how the derivatives of sin(t) and cos(t) are cos(t) and –sin(t) respectively, the derivatives of sinh(t) and cosh(t) are cosh(t) and +sinh(t) respectively.

<span class="mw-page-title-main">Logistic function</span> S-shaped curve

A logistic function or logistic curve is a common S-shaped curve with the equation

<span class="mw-page-title-main">Heaviside step function</span> Indicator function of positive numbers

The Heaviside step function, or the unit step function, usually denoted by H or θ, is a step function named after Oliver Heaviside, the value of which is zero for negative arguments and one for positive arguments. Different conventions concerning the value H(0) are in use. It is an example of the general class of step functions, all of which can be represented as linear combinations of translations of this one.

Integration is the basic operation in integral calculus. While differentiation has straightforward rules by which the derivative of a complicated function can be found by differentiating its simpler component functions, integration does not, so tables of known integrals are often useful. This page lists some of the most common antiderivatives.

In Riemannian geometry, the sectional curvature is one of the ways to describe the curvature of Riemannian manifolds. The sectional curvature Kp) depends on a two-dimensional linear subspace σp of the tangent space at a point p of the manifold. It can be defined geometrically as the Gaussian curvature of the surface which has the plane σp as a tangent plane at p, obtained from geodesics which start at p in the directions of σp. The sectional curvature is a real-valued function on the 2-Grassmannian bundle over the manifold.

<span class="mw-page-title-main">Gudermannian function</span> Mathematical function relating circular and hyperbolic functions

In mathematics, the Gudermannian function relates a hyperbolic angle measure to a circular angle measure called the gudermannian of and denoted . The Gudermannian function reveals a close relationship between the circular functions and hyperbolic functions. It was introduced in the 1760s by Johann Heinrich Lambert, and later named for Christoph Gudermann who also described the relationship between circular and hyperbolic functions in 1830. The gudermannian is sometimes called the hyperbolic amplitude as a limiting case of the Jacobi elliptic amplitude when parameter

<span class="mw-page-title-main">Sign function</span> Mathematical function returning -1, 0 or 1

In mathematics, the sign function or signum function is a function that has the value −1, +1 or 0 according to whether the sign of a given real number is positive or negative, or the given number is itself zero. In mathematical notation the sign function is often represented as or .

<span class="mw-page-title-main">Logistic distribution</span> Continuous probability distribution

In probability theory and statistics, the logistic distribution is a continuous probability distribution. Its cumulative distribution function is the logistic function, which appears in logistic regression and feedforward neural networks. It resembles the normal distribution in shape but has heavier tails. The logistic distribution is a special case of the Tukey lambda distribution.

<span class="mw-page-title-main">Laplace distribution</span> Probability distribution

In probability theory and statistics, the Laplace distribution is a continuous probability distribution named after Pierre-Simon Laplace. It is also sometimes called the double exponential distribution, because it can be thought of as two exponential distributions spliced together along the abscissa, although the term is also sometimes used to refer to the Gumbel distribution. The difference between two independent identically distributed exponential random variables is governed by a Laplace distribution, as is a Brownian motion evaluated at an exponentially distributed random time. Increments of Laplace motion or a variance gamma process evaluated over the time scale also have a Laplace distribution.

<span class="mw-page-title-main">Hyperbolic secant distribution</span> Continuous probability distribution

In probability theory and statistics, the hyperbolic secant distribution is a continuous probability distribution whose probability density function and characteristic function are proportional to the hyperbolic secant function. The hyperbolic secant function is equivalent to the reciprocal hyperbolic cosine, and thus this distribution is also called the inverse-cosh distribution.

Expected shortfall (ES) is a risk measure—a concept used in the field of financial risk measurement to evaluate the market risk or credit risk of a portfolio. The "expected shortfall at q% level" is the expected return on the portfolio in the worst of cases. ES is an alternative to value at risk that is more sensitive to the shape of the tail of the loss distribution.

In financial mathematics, tail value at risk (TVaR), also known as tail conditional expectation (TCE) or conditional tail expectation (CTE), is a risk measure associated with the more general value at risk. It quantifies the expected value of the loss given that an event outside a given probability level has occurred.

<span class="mw-page-title-main">Tukey lambda distribution</span> Symmetric probability distribution

Formalized by John Tukey, the Tukey lambda distribution is a continuous, symmetric probability distribution defined in terms of its quantile function. It is typically used to identify an appropriate distribution and not used in statistical models directly.

<span class="mw-page-title-main">Modular lambda function</span> Symmetric holomorphic function

In mathematics, the modular lambda function λ(τ) is a highly symmetric Holomorphic function on the complex upper half-plane. It is invariant under the fractional linear action of the congruence group Γ(2), and generates the function field of the corresponding quotient, i.e., it is a Hauptmodul for the modular curve X(2). Over any point τ, its value can be described as a cross ratio of the branch points of a ramified double cover of the projective line by the elliptic curve , where the map is defined as the quotient by the [−1] involution.

<span class="mw-page-title-main">Wrapped exponential distribution</span> Probability distribution

In probability theory and directional statistics, a wrapped exponential distribution is a wrapped probability distribution that results from the "wrapping" of the exponential distribution around the unit circle.

<span class="mw-page-title-main">Rectifier (neural networks)</span> Type of activation function

In the context of artificial neural networks, the rectifier or ReLU activation function is an activation function defined as the non-negative part of its argument, i.e., the ramp function:

<span class="mw-page-title-main">Loss functions for classification</span> Concept in machine learning

In machine learning and mathematical optimization, loss functions for classification are computationally feasible loss functions representing the price paid for inaccuracy of predictions in classification problems. Given as the space of all possible inputs, and as the set of labels, a typical goal of classification algorithms is to find a function which best predicts a label for a given input . However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same to generate different . As a result, the goal of the learning problem is to minimize expected loss, defined as

<span class="mw-page-title-main">Bell-shaped function</span> Mathematical function having a characteristic "bell"-shaped curve

A bell-shaped function or simply 'bell curve' is a mathematical function having a characteristic "bell"-shaped curve. These functions are typically continuous or smooth, asymptotically approach zero for large negative/positive x, and have a single, unimodal maximum at small x. Hence, the integral of a bell-shaped function is typically a sigmoid function. Bell shaped functions are also commonly symmetric.

<span class="mw-page-title-main">Swish function</span> Mathematical activation function in data analysis

The swish function is a family of mathematical function defined as follows:

References

  1. Han, Jun; Morag, Claudio (1995). "The influence of the sigmoid function parameters on the speed of backpropagation learning". In Mira, José; Sandoval, Francisco (eds.). From Natural to Artificial Neural Computation. Lecture Notes in Computer Science. Vol. 930. pp.  195–201. doi:10.1007/3-540-59497-3_175. ISBN   978-3-540-59497-0.
  2. Ling, Yibei; He, Bin (December 1993). "Entropic analysis of biological growth models". IEEE Transactions on Biomedical Engineering . 40 (12): 1193–2000. doi:10.1109/10.250574. PMID   8125495.
  3. Dunning, Andrew J.; Kensler, Jennifer; Coudeville, Laurent; Bailleux, Fabrice (2015-12-28). "Some extensions in continuous methods for immunological correlates of protection". BMC Medical Research Methodology . 15 (107): 107. doi: 10.1186/s12874-015-0096-9 . PMC   4692073 . PMID   26707389.
  4. "grex --- Growth-curve Explorer". GitHub . 2022-07-09. Archived from the original on 2022-08-25. Retrieved 2022-08-25.
  5. EpsilonDelta (2022-08-16). "Smooth Transition Function in One Dimension | Smooth Transition Function Series Part 1". 13:29/14:04 via www.youtube.com.
  6. Gibbs, Mark N.; Mackay, D. (November 2000). "Variational Gaussian process classifiers". IEEE Transactions on Neural Networks . 11 (6): 1458–1464. doi:10.1109/72.883477. PMID   18249869. S2CID   14456885.
  7. Smith, Julius O. (2010). Physical Audio Signal Processing (2010 ed.). W3K Publishing. ISBN   978-0-9745607-2-4. Archived from the original on 2022-07-14. Retrieved 2020-03-28.
  8. Gustafson, John L.; Yonemoto, Isaac (2017-06-12). "Beating Floating Point at its Own Game: Posit Arithmetic" (PDF). Archived (PDF) from the original on 2022-07-14. Retrieved 2019-12-28.

Further reading