Probabilistic neural network

A probabilistic neural network (PNN) [1] is a feedforward neural network that is widely used in classification and pattern recognition problems. In the PNN algorithm, the parent probability density function (PDF) of each class is approximated by a Parzen window, a non-parametric estimator. Then, using the PDF of each class, the class probability of a new input is estimated, and Bayes’ rule is employed to assign the new input to the class with the highest posterior probability. By this method, the probability of misclassification is minimized. [2] This type of artificial neural network (ANN) was derived from the Bayesian network [3] and a statistical algorithm called kernel Fisher discriminant analysis. [4] It was introduced by D.F. Specht in 1966. [5] [6] In a PNN, the operations are organized into a multilayered feedforward network with four layers: input, pattern, summation and output.

Layers

PNN is often used in classification problems. [7] When an input is presented, the first layer computes the distance from the input vector to each of the training input vectors. This produces a vector whose elements indicate how close the input is to each training input. The second layer sums the contributions for each class of inputs and produces its net output as a vector of probabilities. Finally, a compete transfer function on the output of the second layer picks the maximum of these probabilities and produces a 1 (positive identification) for that class and a 0 (negative identification) for the non-targeted classes.
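The compete step described above can be sketched in a few lines of Python. The function name `compete` follows the wording of the text, but the implementation and the tie-breaking rule (first maximum wins) are assumptions of this sketch, not taken from any particular toolbox.

```python
def compete(scores):
    """Return a one-hot list: 1 at the largest score, 0 everywhere else.

    Ties go to the first maximum (an assumption of this sketch).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # max() with a key returns the first index that attains the maximum.
    winner = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == winner else 0 for i in range(len(scores))]


# Three class probabilities; the second class wins.
one_hot = compete([0.2, 0.7, 0.1])
```

Here `one_hot` marks the second class as the positive identification and the other two as negative.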

Input layer

Each neuron in the input layer represents a predictor variable. For a categorical variable with N categories, N-1 neurons are used. The input layer standardizes the range of the values by subtracting the median and dividing by the interquartile range. The input neurons then feed the values to each of the neurons in the hidden (pattern) layer.
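A minimal sketch of the two preprocessing steps described above, using only the Python standard library. The function names `robust_scale` and `dummy_encode` are illustrative. The text does not say how the quartiles are computed or which category serves as the reference level, so the "inclusive" quantile method and the choice of the first listed category as the all-zero reference are assumptions.

```python
import statistics


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    # Quartile convention is an assumption; other conventions give other IQRs.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / iqr for v in values]


def dummy_encode(value, categories):
    """Encode one of N categories with N-1 indicator values.

    The first category is the reference level and maps to all zeros
    (an assumption of this sketch).
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[1:]]


scaled = robust_scale([1, 2, 3, 4, 5])
encoded = dummy_encode("green", ["red", "green", "blue"])
```

With three categories, each value is fed to the network through two input neurons.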

Pattern layer

This layer contains one neuron for each case in the training data set. It stores the values of the predictor variables for the case along with the target value. A hidden neuron computes the Euclidean distance of the test case from the neuron's center point and then applies the radial basis function kernel using the sigma values.
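The computation performed by a single hidden neuron might look like the following sketch. It assumes a Gaussian form for the radial basis function kernel, exp(-d²/(2σ²)), and a single shared sigma. The text mentions "sigma values", so an implementation may well use one sigma per variable or per class. The name `pattern_activation` is illustrative.

```python
import math


def pattern_activation(x, center, sigma):
    """Response of one pattern neuron to the input x.

    center is the stored training case; sigma is the kernel width.
    A Gaussian kernel with one shared sigma is assumed here.
    """
    if len(x) != len(center):
        raise ValueError("x and center must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the test case and the center.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Identical points give the maximum response of 1.0;
# the response decays towards 0 as the distance grows.
same = pattern_activation([0.0, 0.0], [0.0, 0.0], sigma=1.0)
far = pattern_activation([3.0, 4.0], [0.0, 0.0], sigma=1.0)
```

A small sigma makes each neuron respond only to very close inputs, while a large sigma smooths the estimate across many training cases.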

Summation layer

For PNN there is one summation neuron for each category of the target variable. The actual target category of each training case is stored with each hidden neuron; the weighted value coming out of a hidden neuron is fed only to the summation neuron that corresponds to the hidden neuron’s category. The summation neurons add the values for the class they represent.
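The per-class accumulation described above can be sketched as follows. The function name `summation_layer` and the use of a dictionary keyed by category are choices made for this illustration only.

```python
from collections import defaultdict


def summation_layer(activations, targets):
    """Add up hidden-neuron outputs separately for each target category.

    activations[i] is the output of hidden neuron i, and targets[i] is
    the category stored with that neuron.
    """
    if len(activations) != len(targets):
        raise ValueError("activations and targets must have the same length")
    votes = defaultdict(float)
    for activation, target in zip(activations, targets):
        # Each hidden neuron contributes only to its own category.
        votes[target] += activation
    return dict(votes)


votes = summation_layer([0.5, 0.25, 0.125], ["a", "b", "a"])
```

Because the values are summed rather than averaged, a category with more training cases can accumulate a larger vote. Whether and how to normalize by class size is a design choice that this sketch leaves open.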

Output layer

The output layer compares the weighted votes for each target category accumulated in the summation layer and uses the largest vote to predict the target category.
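Putting the layers together, a complete prediction could be sketched as below. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes inputs that are already standardized, a Gaussian kernel with one shared sigma, and unnormalized per-class sums. The function name `pnn_predict` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def pnn_predict(x, train_inputs, train_targets, sigma=1.0):
    """Classify x with a minimal PNN; return (predicted_category, votes)."""
    if not train_inputs or len(train_inputs) != len(train_targets):
        raise ValueError("need equally many training inputs and targets")
    votes = defaultdict(float)
    for center, target in zip(train_inputs, train_targets):
        # Pattern layer: Gaussian kernel of the squared Euclidean distance.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
        # Summation layer: accumulate the activation for the case's category.
        votes[target] += math.exp(-sq_dist / (2.0 * sigma ** 2))
    # Output layer: the category with the largest accumulated vote wins.
    return max(votes, key=votes.get), dict(votes)


# Toy data: two cases near the origin ("low"), two near (1, 1) ("high").
train_inputs = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_targets = ["low", "low", "high", "high"]
label, votes = pnn_predict([0.2, 0.1], train_inputs, train_targets, sigma=0.5)
```

The query point lies much closer to the two "low" cases, so their combined vote exceeds that of the "high" cases and `label` is "low".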

Advantages

There are several advantages and disadvantages to using a PNN instead of a multilayer perceptron. [8]

Disadvantages

Applications based on PNN

References

  1. Mohebali, Behshad; Tahmassebi, Amirhessam; Meyer-Baese, Anke; Gandomi, Amir H. (2020). Probabilistic neural networks: a brief overview of theory, implementation, and application. Elsevier. pp. 347–367. doi:10.1016/B978-0-12-816514-0.00014-X. S2CID 208119250.
  2. Zeinali, Yasha; Story, Brett A. (2017). "Competitive probabilistic neural network". Integrated Computer-Aided Engineering. 24 (2): 105–118. doi:10.3233/ICA-170540.
  3. "Probabilistic Neural Networks". Archived from the original on 2010-12-18. Retrieved 2012-03-22.
  4. "Archived copy" (PDF). Archived from the original (PDF) on 2012-01-31. Retrieved 2012-03-22.
  5. Specht, D. F. (1967-06-01). "Generation of Polynomial Discriminant Functions for Pattern Recognition". IEEE Transactions on Electronic Computers. EC-16 (3): 308–319. doi:10.1109/PGEC.1967.264667. ISSN 0367-7508.
  6. Specht, D. F. (1990). "Probabilistic neural networks". Neural Networks. 3: 109–118. doi:10.1016/0893-6080(90)90049-Q.
  7. "Probabilistic Neural Networks :: Radial Basis Networks (Neural Network Toolbox™)". www.mathworks.in. Archived from the original on 4 August 2012. Retrieved 6 June 2022.
  8. "Probabilistic and General Regression Neural Networks". Archived from the original on 2012-03-02. Retrieved 2012-03-22.
  9. Tran, D. H.; Ng, A. W. M.; Perera, B. J. C.; Burn, S.; Davis, P. (September 2006). "Application of probabilistic neural networks in modelling structural deterioration of stormwater pipes" (PDF). Urban Water Journal. 3 (3): 175–184. doi:10.1080/15730620600961684. S2CID 15220500. Archived from the original (PDF) on 8 August 2017. Retrieved 27 February 2023.
  10. Li, Q. B.; Li, X.; Zhang, G. J.; Xu, Y. Z.; Wu, J. G.; Sun, X. J. (2009). "[Application of probabilistic neural networks method to gastric endoscope samples diagnosis based on FTIR spectroscopy]". Guang Pu Xue Yu Guang Pu Fen Xi. 29 (6): 1553–7. PMID 19810529.
  11. Berno, E.; Brambilla, L.; Canaparo, R.; Casale, F.; Costa, M.; Della Pepa, C.; Eandi, M.; Pasero, E. (2003). "Application of probabilistic neural networks to population pharmacokineties". Proceedings of the International Joint Conference on Neural Networks, 2003. pp. 2637–2642. doi:10.1109/IJCNN.2003.1223983. ISBN 0-7803-7898-9. S2CID 60477107.
  12. Huang, Chenn-Jung; Liao, Wei-Chen (2004). "Application of Probabilistic Neural Networks to the Class Prediction of Leukemia and Embryonal Tumor of Central Nervous System". Neural Processing Letters. 19 (3): 211–226. doi:10.1023/B:NEPL.0000035613.51734.48. S2CID 5651402.
  13. Araghi, Leila Fallah; d Khaloozade, Hami; Arvan, Mohammad Reza (19 March 2009). "Ship Identification Using Probabilistic Neural Networks (PNN)" (PDF). Proceedings of the International MultiConference of Engineers and Computer Scientists. 2. Hong Kong, China. Retrieved 27 February 2023.
  14. "Archived copy" (PDF). Archived from the original (PDF) on 2010-06-14. Retrieved 2012-03-22.
  15. Zhang, Y. (2009). "Remote-sensing Image Classification Based on an Improved Probabilistic Neural Network". Sensors. 9 (9): 7516–7539. Bibcode:2009Senso...9.7516Z. doi:10.3390/s90907516. PMC 3290485. PMID 22400006.