ADALINE

Last updated November 15, 2024 • 3 min readFrom Wikipedia, The Free Encyclopedia

ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented it.^[2]^[3]^[1]^[4]^[5] It was developed by professor Bernard Widrow and his doctoral student Marcian Hoff at Stanford University in 1960. It is based on the perceptron and consists of weights, a bias, and a summation function. The weights and biases were implemented by rheostats (as seen in the "knobby ADALINE"), and later, memistors.

The difference between Adaline and the standard (Rosenblatt) perceptron is in how they learn. Adaline unit weights are adjusted to match a teacher signal, before applying the Heaviside function (see figure), but the standard perceptron unit weights are adjusted to match the correct output, after applying the Heaviside function.

A multilayer network of ADALINE units is known as a MADALINE.

Definition

Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:

$x$ , the input vector
$w$ , the weight vector
$n$ , the number of inputs
$\theta$ , some constant
$y$ , the output of the model,

the output is:

y=\sum _{j=1}^{n}x_{j}w_{j}+\theta

If we further assume that $x_{0}=1$ and $w_{0}=\theta$ , then the output further reduces to:

y=\sum _{j=0}^{n}x_{j}w_{j}

Learning rule

The learning rule used by ADALINE is the LMS ("least mean squares") algorithm, a special case of gradient descent.

Given the following:

$\eta$ , the learning rate
$y$ , the model output
$o$ , the target (desired) output
$E=(o-y)^{2}$ , the square of the error,

the LMS algorithm updates the weights as follows:

w\leftarrow w+\eta (o-y)x

This update rule minimizes $E$ , the square of the error,^[6] and is in fact the stochastic gradient descent update for linear regression.^[7]

MADALINE

MADALINE (Many ADALINE^[8]) is a three-layer (input, hidden, output), fully connected, feedforward neural network architecture for classification that uses ADALINE units in its hidden and output layers. I.e., its activation function is the sign function.^[9] The three-layer network uses memistors. As the sign function is non-differentiable, backpropagation cannot be used to train MADALINE networks. Hence, three different training algorithms have been suggested, called Rule I, Rule II and Rule III.

Despite many attempts, they never succeeded in training more than a single layer of weights in a MADALINE model. This was until Widrow saw the backpropagation algorithm in a 1985 conference in Snowbird, Utah.^[10]

MADALINE Rule 1 (MRI) - The first of these dates back to 1962.^[11] It consists of two layers: the first is made of ADALINE units (let the output of the $i$ th ADALINE unit be $o_{i}$ ); the second layer has two units. One is a majority-voting unit that takes in all $o_{i}$ , and if there are more positives than negatives, outputs +1, and vice versa. Another is a "job assigner": suppose the desired output is -1, and different from the majority-voted output, then the job assigner calculates the minimal number of ADALINE units that must change their outputs from positive to negative, and picks those ADALINE units that are closest to being negative, and makes them update their weights according to the ADALINE learning rule. It was thought of as a form of "minimal disturbance principle".^[12]

The largest MADALINE machine built had 1000 weights, each implemented by a memistor. It was built in 1963 and used MRI for learning.^[12]^[13]

Some MADALINE machines were demonstrated to perform tasks including inverted pendulum balancing, weather forecasting, and speech recognition.^[3]

MADALINE Rule 2 (MRII) - The second training algorithm, described in 1988, improved on Rule I.^[8] The Rule II training algorithm is based on a principle called "minimal disturbance". It proceeds by looping over training examples, and for each example, it:

finds the hidden layer unit (ADALINE classifier) with the lowest confidence in its prediction,
tentatively flips the sign of the unit,
accepts or rejects the change based on whether the network's error is reduced,
stops when the error is zero.

MADALINE Rule 3 - The third "Rule" applied to a modified network with sigmoid activations instead of sign; it was later found to be equivalent to backpropagation.^[12]

Additionally, when flipping single units' signs does not drive the error to zero for a particular example, the training algorithm starts flipping pairs of units' signs, then triples of units, etc.^[8]

Related Research Articles

In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.

An adaptive filter is a system with a linear filter that has a transfer function controlled by variable parameters and a means to adjust those parameters according to an optimization algorithm. Because of the complexity of the optimization algorithms, almost all adaptive filters are digital filters. Adaptive filters are required for some applications because some parameters of the desired processing operation are not known in advance or are changing. The closed loop adaptive filter uses feedback in the form of an error signal to refine its transfer function.

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

An artificial neuron is a mathematical function conceived as a model of biological neurons in a neural network. Artificial neurons are the elementary units of artificial neural networks. The artificial neuron is a function that receives one or more inputs, applies weights to these inputs, and sums them to produce an output.

In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It can be derived as the backpropagation algorithm for a single-layer neural network with mean-square error loss function.

In machine learning, backpropagation is a gradient estimation method commonly used for training neural networks to compute the network parameter updates.

Recurrent neural networks (RNNs) are a class of artificial neural network commonly used for sequential data processing. Unlike feedforward neural networks, which process data in a single pass, RNNs process data across multiple time steps, making them well-adapted for modelling and processing text, speech, and time series.

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes and to the output nodes, without any cycles or loops. Modern feedforward networks are trained using backpropagation, and are colloquially referred to as "vanilla" neural networks.

In deep learning, a multilayer perceptron (MLP) is a name for a modern feedforward neural network consisting of fully connected neurons with nonlinear activation functions, organized in layers, notable for being able to distinguish data that is not linearly separable.

Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments.

Bernard Widrow is a U.S. professor of electrical engineering at Stanford University. He is the co-inventor of the Widrow–Hoff least mean squares filter (LMS) adaptive algorithm with his then doctoral student Ted Hoff. The LMS algorithm led to the ADALINE and MADALINE artificial neural networks and to the backpropagation technique. He made other fundamental contributions to the development of signal processing in the fields of geophysics, adaptive antennas, and adaptive filtering. A summary of his work is.

The cerebellar model arithmetic computer (CMAC) is a type of neural network based on a model of the mammalian cerebellum. It is also known as the cerebellar model articulation controller. It is a type of associative memory.

Neural cryptography is a branch of cryptography dedicated to analyzing the application of stochastic algorithms, especially artificial neural network algorithms, for use in encryption and cryptanalysis.

Time delay neural network (TDNN) is a multilayer artificial neural network architecture whose purpose is to 1) classify patterns with shift-invariance, and 2) model context at each layer of the network.

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks, such as Elman networks. The algorithm was independently derived by numerous researchers.

There are many types of artificial neural networks (ANN).

A probabilistic neural network (PNN) is a feedforward neural network, which is widely used in classification and pattern recognition problems. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and a non-parametric function. Then, using PDF of each class, the class probability of a new input data is estimated and Bayes’ rule is then employed to allocate the class with highest posterior probability to new input data. By this method, the probability of mis-classification is minimized. This type of artificial neural network (ANN) was derived from the Bayesian network and a statistical algorithm called Kernel Fisher discriminant analysis. It was introduced by D.F. Specht in 1966. In a PNN, the operations are organized into a multilayered feedforward network with four layers:

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weight and bias levels of a network when it is simulated in a specific data environment. A learning rule may accept existing conditions of the network, and will compare the expected result and actual result of the network to give new and improved values for the weights and biases. Depending on the complexity of the model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

<span class="mw-page-title-main">Residual neural network</span> Type of artificial neural network

A residual neural network is a deep learning architecture in which the layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge of that year.

An artificial neural network (ANN) combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play. ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways.

References

1 2 1960: An adaptive "ADALINE" neuron using chemical "memistors"
↑ Anderson, James A.; Rosenfeld, Edward (2000). Talking Nets: An Oral History of Neural Networks. MIT Press. ISBN 9780262511117.
1 2 Youtube: widrowlms: Science in Action
↑ Youtube: widrowlms: The LMS algorithm and ADALINE. Part I - The LMS algorithm
↑ Youtube: widrowlms: The LMS algorithm and ADALINE. Part II - ADALINE and memistor ADALINE
↑ "Adaline (Adaptive Linear)" (PDF). CS 4793: Introduction to Artificial Neural Networks. Department of Computer Science, University of Texas at San Antonio.
↑ Avi Pfeffer. "CS181 Lecture 5 — Perceptrons" (PDF). Harvard University.^{[ permanent dead link ]}
1 2 3 Rodney Winter; Bernard Widrow (1988). MADALINE RULE II: A training algorithm for neural networks (PDF). IEEE International Conference on Neural Networks. pp. 401–408. doi:10.1109/ICNN.1988.23872.
↑ Youtube: widrowlms: Science in Action (Madaline is mentioned at the start and at 8:46)
↑ Anderson, James A.; Rosenfeld, Edward, eds. (2000). Talking Nets: An Oral History of Neural Networks. The MIT Press. doi:10.7551/mitpress/6626.003.0004. ISBN 978-0-262-26715-1.
↑ Widrow, Bernard (1962). "Generalization and information storage in networks of adaline neurons" (PDF). Self-organizing Systems: 435–461.
1 2 3 Widrow, Bernard; Lehr, Michael A. (1990). "30 years of adaptive neural networks: perceptron, madaline, and backpropagation". Proceedings of the IEEE. 78 (9): 1415–1442. doi:10.1109/5.58323. S2CID 195704643.
↑ B. Widrow, “Adaline and Madaline-1963, plenary speech,” Proc. 1st lEEE lntl. Conf. on Neural Networks, Vol. 1, pp. 145-158, San Diego, CA, June 23, 1987

External links

widrowlms (2012-07-29). The LMS algorithm and ADALINE. Part II - ADALINE and memistor ADALINE . Retrieved 2024-08-17– via YouTube. Widrow demonstrating both a working knobby ADALINE machine and a memistor ADALINE machine.
"Delta Learning Rule: ADALINE". Artificial Neural Networks. Universidad Politécnica de Madrid. Archived from the original on 2002-06-15.
"Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training". Implementation of the ADALINE algorithm with memristors in analog computing.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[widrow1960-1] 1 2 1960: An adaptive "ADALINE" neuron using chemical "memistors"

[2] Anderson, James A.; Rosenfeld, Edward (2000). Talking Nets: An Oral History of Neural Networks. MIT Press. ISBN 9780262511117.

[:0-3] 1 2 Youtube: widrowlms: Science in Action

[4] Youtube: widrowlms: The LMS algorithm and ADALINE. Part I - The LMS algorithm

[5] Youtube: widrowlms: The LMS algorithm and ADALINE. Part II - ADALINE and memistor ADALINE

[6] "Adaline (Adaptive Linear)" (PDF). CS 4793: Introduction to Artificial Neural Networks. Department of Computer Science, University of Texas at San Antonio.

[7] Avi Pfeffer. "CS181 Lecture 5 — Perceptrons" (PDF). Harvard University.^{[ permanent dead link ]}

[winter-8] 1 2 3 Rodney Winter; Bernard Widrow (1988). MADALINE RULE II: A training algorithm for neural networks (PDF). IEEE International Conference on Neural Networks. pp. 401–408. doi:10.1109/ICNN.1988.23872.

[9] Youtube: widrowlms: Science in Action (Madaline is mentioned at the start and at 8:46)

[10] Anderson, James A.; Rosenfeld, Edward, eds. (2000). Talking Nets: An Oral History of Neural Networks. The MIT Press. doi:10.7551/mitpress/6626.003.0004. ISBN 978-0-262-26715-1.

[mrone-11] Widrow, Bernard (1962). "Generalization and information storage in networks of adaline neurons" (PDF). Self-organizing Systems: 435–461.

[thirty-12] 1 2 3 Widrow, Bernard; Lehr, Michael A. (1990). "30 years of adaptive neural networks: perceptron, madaline, and backpropagation". Proceedings of the IEEE. 78 (9): 1415–1442. doi:10.1109/5.58323. S2CID 195704643.

[13] B. Widrow, “Adaline and Madaline-1963, plenary speech,” Proc. 1st lEEE lntl. Conf. on Neural Networks, Vol. 1, pp. 145-158, San Diego, CA, June 23, 1987

[2]

[3]

[1]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]