Siamese neural network

A Siamese neural network (sometimes called a twin neural network) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors. [1] [2] [3] [4] Often one of the output vectors is precomputed, thus forming a baseline against which the other output vector is compared. This is similar to comparing fingerprints but can be described more technically as a distance function for locality-sensitive hashing.[ citation needed ]
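For illustration, a minimal sketch of such a twin network in PyTorch follows; the layer sizes, the input dimensionality, and the use of Euclidean distance between the two outputs are assumptions chosen for the example, not details from the cited papers.

```python
import torch
import torch.nn as nn

class TwinNetwork(nn.Module):
    """Minimal sketch: a single encoder applied to both inputs (shared weights)."""
    def __init__(self, in_dim=128, embed_dim=32):
        super().__init__()
        # One encoder; calling it on both inputs is what makes the network
        # "Siamese": the two branches share every weight.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x1, x2):
        z1 = self.encoder(x1)              # embedding of the first input
        z2 = self.encoder(x2)              # embedding of the second input (same weights)
        return torch.norm(z1 - z2, dim=1)  # comparable outputs compared by distance

net = TwinNetwork()
a, b = torch.randn(4, 128), torch.randn(4, 128)
print(net(a, b))  # one distance per pair in the batch
```

Because the weights are shared, one of the two embeddings can be precomputed and stored, which is how the baseline-versus-query comparison described above is usually implemented.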

It is possible to build an architecture that is functionally similar to a twin network but implements a slightly different function. This is typically used for comparing similar instances in different type sets.[ citation needed ]

Typical uses of the similarity measures computed by a twin network include recognizing handwritten checks, automatically detecting faces in camera images, and matching queries with indexed documents. Perhaps the best-known application of twin networks is face recognition, where known images of people are precomputed and compared against an image from a turnstile or similar source. At first it is not obvious, but there are two slightly different problems. One is recognizing a person among a large number of other persons: the face recognition problem. DeepFace is an example of such a system. [4] In its most extreme form this means recognizing a single person at a train station or airport. The other is face verification, that is, verifying whether a photo in a pass matches the person claiming to be its holder. The twin network may be the same, but the implementation can be quite different.

Learning

Learning in twin networks can be done with triplet loss or contrastive loss. For learning by triplet loss a baseline vector (anchor image) is compared against a positive vector (truthy image) and a negative vector (falsy image). The negative vector will force learning in the network, while the positive vector will act like a regularizer. For learning by contrastive loss there must be a weight decay to regularize the weights, or some similar operation like a normalization.
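As a minimal sketch, both losses can be written as follows in PyTorch; the margin values and the reduction by mean are assumptions made for the example, not prescriptions from the cited sources.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss with squared Euclidean distance: pull the positive (truthy)
    embedding toward the anchor and push the negative (falsy) embedding away."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

def contrastive_loss(z1, z2, same, margin=1.0):
    """Contrastive loss: `same` is 1 for similar pairs and 0 for dissimilar ones."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()
```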

A distance metric for a loss function may have the following properties: [5]

- Non-negativity: δ(x, y) ≥ 0
- Identity of indiscernibles: δ(x, y) = 0 if and only if x = y
- Symmetry: δ(x, y) = δ(y, x)
- Triangle inequality: δ(x, y) ≤ δ(x, z) + δ(z, y)

In particular, the triplet loss algorithm is often defined with squared Euclidean distance at its core, which, unlike ordinary Euclidean distance, does not satisfy the triangle inequality.
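Written out under that convention (a sketch; the symbols A, P, N for the anchor, positive and negative inputs and the margin α are notation assumed here, not taken from the cited sources), the triplet loss is

\[
\mathcal{L}(A, P, N) = \max\!\Big( \big\|\operatorname{f}(A) - \operatorname{f}(P)\big\|^{2} - \big\|\operatorname{f}(A) - \operatorname{f}(N)\big\|^{2} + \alpha,\ 0 \Big)
\]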

Predefined metrics, Euclidean distance metric

The common learning goal is to minimize a distance metric for similar objects and maximize it for distinct ones. This gives a loss function like

\[
\delta\big(x^{(i)}, x^{(j)}\big) =
\begin{cases}
\min \ \big\| \operatorname{f}\big(x^{(i)}\big) - \operatorname{f}\big(x^{(j)}\big) \big\|\,, & i = j \\
\max \ \big\| \operatorname{f}\big(x^{(i)}\big) - \operatorname{f}\big(x^{(j)}\big) \big\|\,, & i \neq j
\end{cases}
\]

where i, j are indexes into a set of vectors and f(·) is the function implemented by the twin network.

The most common distance metric used is Euclidean distance, in which case the loss function can be rewritten in matrix form as

\[
\delta\big(x^{(i)}, x^{(j)}\big) \approx
\begin{cases}
\min \ \big(\operatorname{f}(x^{(i)}) - \operatorname{f}(x^{(j)})\big)^{\mathsf T} \big(\operatorname{f}(x^{(i)}) - \operatorname{f}(x^{(j)})\big)\,, & i = j \\
\max \ \big(\operatorname{f}(x^{(i)}) - \operatorname{f}(x^{(j)})\big)^{\mathsf T} \big(\operatorname{f}(x^{(i)}) - \operatorname{f}(x^{(j)})\big)\,, & i \neq j
\end{cases}
\]
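A quick numerical check of this matrix form, as a sketch in NumPy with arbitrary example embeddings:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # stands in for f(x^(i))
b = np.array([0.5, 1.0, -1.0])   # stands in for f(x^(j))

d = a - b
print(d @ d)                     # matrix form (f(x^(i)) - f(x^(j)))^T (f(x^(i)) - f(x^(j)))
print(np.linalg.norm(d) ** 2)    # squared Euclidean distance -- same value
```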

Learned metrics, nonlinear distance metric

A more general case is where the output vector from the twin network is passed through additional network layers implementing non-linear distance metrics.

\[
\delta\big(x^{(i)}, x^{(j)}\big) \approx
\begin{cases}
\min \ \operatorname{g}\!\big(\operatorname{f}(x^{(i)}),\, \operatorname{f}(x^{(j)})\big)\,, & i = j \\
\max \ \operatorname{g}\!\big(\operatorname{f}(x^{(i)}),\, \operatorname{f}(x^{(j)})\big)\,, & i \neq j
\end{cases}
\]

where i, j are indexes into a set of vectors, f(·) is the function implemented by the twin network, and g(·) is the function implemented by the network joining the outputs from the twin network.

In matrix form, the above is often approximated as a Mahalanobis distance for a linear space as [6]

\[
\delta\big(x^{(i)}, x^{(j)}\big) \approx \big(\operatorname{f}(x^{(i)}) - \operatorname{f}(x^{(j)})\big)^{\mathsf T} M \big(\operatorname{f}(x^{(i)}) - \operatorname{f}(x^{(j)})\big)
\]
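As a sketch, this Mahalanobis-style form can be evaluated as follows in NumPy; the matrix M here is an arbitrary positive semi-definite example standing in for a learned metric.

```python
import numpy as np

rng = np.random.default_rng(0)
fi = rng.normal(size=4)   # stands in for f(x^(i))
fj = rng.normal(size=4)   # stands in for f(x^(j))

L = rng.normal(size=(4, 4))
M = L @ L.T               # arbitrary positive semi-definite matrix (a learned metric in practice)

d = fi - fj
print(d @ M @ d)          # (f(x^(i)) - f(x^(j)))^T M (f(x^(i)) - f(x^(j)))
```

With M equal to the identity matrix this reduces to the squared Euclidean distance above.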

This can be further subdivided into at least unsupervised learning and supervised learning.

Learned metrics, half-twin networks

This form also allows the twin network to be more of a half-twin, with the two branches implementing slightly different functions.

\[
\delta\big(x^{(i)}, x^{(j)}\big) \approx
\begin{cases}
\min \ \operatorname{g}\!\big(\operatorname{f}(x^{(i)}),\, \operatorname{h}(x^{(j)})\big)\,, & i = j \\
\max \ \operatorname{g}\!\big(\operatorname{f}(x^{(i)}),\, \operatorname{h}(x^{(j)})\big)\,, & i \neq j
\end{cases}
\]

where i, j are indexes into a set of vectors, f(·) and h(·) are the functions implemented by the two half-twin branches, and g(·) is the function implemented by the network joining the outputs from the twin network.
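A minimal sketch of such a half-twin arrangement in PyTorch follows; the two encoders, the joining network, and all input and layer sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class HalfTwin(nn.Module):
    """Two different encoders (f and h) whose outputs are joined by a small network g."""
    def __init__(self):
        super().__init__()
        self.f = nn.Linear(128, 32)   # encoder for the first input type
        self.h = nn.Linear(300, 32)   # encoder for the second input type
        self.g = nn.Sequential(       # joining network producing a scalar similarity score
            nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x1, x2):
        z = torch.cat([self.f(x1), self.h(x2)], dim=1)
        return self.g(z)

model = HalfTwin()
print(model(torch.randn(2, 128), torch.randn(2, 300)).shape)  # torch.Size([2, 1])
```

Because the two branches no longer share weights, this arrangement suits the case mentioned earlier of comparing instances drawn from different type sets.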

Twin networks for object tracking

Twin networks have been used in object tracking because of their unique two tandem inputs and similarity measurement. In object tracking, one input of the twin network is a user-preselected exemplar image, and the other input is a larger search image; the twin network's job is to locate the exemplar inside the search image. By measuring the similarity between the exemplar and each part of the search image, the twin network can produce a map of similarity scores. Furthermore, using a fully convolutional network, the process of computing each sector's similarity score can be replaced with a single cross-correlation layer. [7]
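A sketch of that cross-correlation step, in the spirit of the fully convolutional tracker of [7]; the feature tensors and their sizes are placeholders for what a shared convolutional backbone would produce.

```python
import torch
import torch.nn.functional as F

# Placeholder feature maps; in a real tracker a shared backbone produces both.
exemplar_feat = torch.randn(1, 256, 6, 6)     # features of the user-preselected exemplar
search_feat   = torch.randn(1, 256, 22, 22)   # features of the larger search image

# conv2d slides the exemplar features over the search features; in PyTorch this
# operation is a cross-correlation, so each output cell is a similarity score.
score_map = F.conv2d(search_feat, exemplar_feat)
print(score_map.shape)  # torch.Size([1, 1, 17, 17]) -- one score per location
```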

First introduced in 2016, twin fully convolutional networks have been used in many high-performance real-time object tracking neural networks, such as CFNet, [8] StructSiam, [9] SiamFC-tri, [10] DSiam, [11] SA-Siam, [12] SiamRPN, [13] DaSiamRPN, [14] Cascaded SiamRPN, [15] SiamMask, [16] SiamRPN++, [17] and Deeper and Wider SiamRPN. [18]

References

  1. Chicco, Davide (2020), "Siamese neural networks: an overview", Artificial Neural Networks, Methods in Molecular Biology, vol. 2190 (3rd ed.), New York City, New York, USA: Springer Protocols, Humana Press, pp. 73–94, doi:10.1007/978-1-0716-0826-5_3, ISBN   978-1-0716-0826-5, PMID   32804361, S2CID   221144012
  2. Bromley, Jane; Guyon, Isabelle; LeCun, Yann; Säckinger, Eduard; Shah, Roopak (1994). "Signature verification using a "Siamese" time delay neural network" (PDF). Advances in Neural Information Processing Systems. 6: 737–744.
  3. Chopra, S.; Hadsell, R.; LeCun, Y. (June 2005). "Learning a Similarity Metric Discriminatively, with Application to Face Verification". 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. pp. 539–546 vol. 1. doi:10.1109/CVPR.2005.202. ISBN   0-7695-2372-2. S2CID   5555257.
  4. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. (June 2014). "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 1701–1708. doi:10.1109/CVPR.2014.220. ISBN 978-1-4799-5118-5. S2CID 2814088.
  5. Chatterjee, Moitreya; Luo, Yunan. "Similarity Learning with (or without) Convolutional Neural Network" (PDF). Retrieved 2018-12-07.
  6. Mahalanobis, Prasanta Chandra (1936). "On the generalized distance in statistics" (PDF). Proceedings of the National Institute of Sciences of India. 2 (1): 49–55.
  7. "Fully-Convolutional Siamese Networks for Object Tracking". arXiv:1606.09549 [cs.CV].
  8. "End-to-end representation learning for Correlation Filter based tracking".
  9. "Structured Siamese Network for Real-Time Visual Tracking" (PDF).
  10. "Triplet Loss in Siamese Network for Object Tracking" (PDF).
  11. "Learning Dynamic Siamese Network for Visual Object Tracking" (PDF).
  12. "A Twofold Siamese Network for Real-Time Object Tracking" (PDF).
  13. "High Performance Visual Tracking with Siamese Region Proposal Network" (PDF).
  14. Zhu, Zheng; Wang, Qiang; Li, Bo; Wu, Wei; Yan, Junjie; Hu, Weiming (2018). "Distractor-aware Siamese Networks for Visual Object Tracking". arXiv: 1808.06048 [cs.CV].
  15. Fan, Heng; Ling, Haibin (2018). "Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking". arXiv: 1812.06148 [cs.CV].
  16. Wang, Qiang; Zhang, Li; Bertinetto, Luca; Hu, Weiming; Torr, Philip H. S. (2018). "Fast Online Object Tracking and Segmentation: A Unifying Approach". arXiv: 1812.05050 [cs.CV].
  17. Li, Bo; Wu, Wei; Wang, Qiang; Zhang, Fangyi; Xing, Junliang; Yan, Junjie (2018). "SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks". arXiv: 1812.11703 [cs.CV].
  18. Zhang, Zhipeng; Peng, Houwen (2019). "Deeper and Wider Siamese Networks for Real-Time Visual Tracking". arXiv: 1901.01660 [cs.CV].