LeNet

LeNet-5 architecture (overview).

LeNet is a series of convolutional neural network architectures proposed by LeCun et al. [1] The earliest version, LeNet-1, was trained in 1989. When "LeNet" is referred to without a number, it usually means LeNet-5 (1998), the most well-known version.


Convolutional neural networks are feed-forward neural networks whose artificial neurons respond to overlapping regions of the input field, and they perform well in large-scale image processing. LeNet-5 was one of the earliest convolutional neural networks and was historically important in the development of deep learning. [2]

Development history

Yann LeCun in 2018

In 1988, LeCun joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, New Jersey, United States, headed by Lawrence D. Jackel.

In 1988, the group published a neural network design that recognized handwritten zip codes. However, its convolutional kernels were hand-designed. [3]

In 1989, Yann LeCun et al. at Bell Labs first applied the backpropagation algorithm to practical applications, believing that a network's ability to generalize could be greatly enhanced by providing constraints from the task's domain. They combined convolutional neural networks trained by backpropagation with the task of reading handwritten numbers and successfully applied this to identifying handwritten zip code numbers provided by the US Postal Service. This was the prototype of what later came to be called LeNet-1. [4] In the same year, LeCun described a small handwritten digit recognition problem in another paper and showed that, even though the problem is linearly separable, single-layer networks exhibited poor generalization capabilities, whereas a multi-layered, constrained network using shift-invariant feature detectors performed very well. He argued that these results showed that minimizing the number of free parameters in a neural network enhances its generalization ability. [5]

In 1990, their paper again described the application of backpropagation networks to handwritten digit recognition. The data received only minimal preprocessing, while the model was carefully designed for the task and highly constrained. The input consisted of images, each containing a single digit, and tests on zip code digit data provided by the US Postal Service showed that the model had an error rate of only 1% and a rejection rate of about 9%. [6]

Their research continued for the next four years, and in 1994 the MNIST database was developed, for which LeNet-1 was too small; hence a new LeNet-4 was trained on it. [7]

A year later the AT&T Bell Labs group introduced LeNet-5 and published a paper reviewing various methods of handwritten character recognition, comparing them on a standard handwritten digit recognition benchmark. The results showed that the latest network outperformed the other models. [8]

By 1998 Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner were able to provide examples of practical applications of neural networks, such as two systems for recognizing handwritten characters online and models that could read millions of checks per day. [1]

The research achieved great success and aroused scholars' interest in the study of neural networks. While the architectures of today's best performing neural networks are not the same as LeNet's, the network was the starting point for a large number of neural network architectures and brought inspiration to the field.

Timeline
1989: Yann LeCun et al. proposed the original form of LeNet (LeNet-1). [4]
1989: Yann LeCun demonstrated that minimizing the number of free parameters in a neural network can enhance its generalization ability. [5]
1990: Backpropagation was applied to LeNet-1 for handwritten digit recognition. [6]
1994: The MNIST database and LeNet-4 were developed. [7]
1995: LeNet-5 was developed; various methods of handwritten character recognition were reviewed and compared on a standard handwritten digit recognition benchmark, with convolutional neural networks outperforming all other models. [8]
1998: Practical applications. [1]

Architecture

Comparison of the LeNet and AlexNet convolution, pooling, and dense layers
(The AlexNet input size should be 227×227×3 rather than 224×224×3 so that the arithmetic comes out right. The original paper gave different numbers, but Andrej Karpathy, the former head of computer vision at Tesla, said it should be 227×227×3, noting that Krizhevsky did not explain why he wrote 224×224×3. The first convolution, 11×11 with stride 4, then produces a 55×55×96 output (instead of 54×54×96): [(input width 227 - kernel width 11) / stride 4] + 1 = [(227 - 11) / 4] + 1 = 55. Since the output is as tall as it is wide, its size is 55×55.)
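The caption's correction follows the standard output-size formula for an unpadded ("valid") convolution: output = floor((input - kernel) / stride) + 1. The Python snippet below is an illustrative sketch of that arithmetic, not code from either paper.

    def conv_output_size(input_size: int, kernel_size: int, stride: int) -> int:
        """Output width/height of an unpadded ('valid') convolution."""
        return (input_size - kernel_size) // stride + 1

    # AlexNet's first convolution: an 11x11 kernel with stride 4 on a 227x227
    # input gives a 55x55 output, matching the correction in the caption.
    print(conv_output_size(227, 11, 4))  # 55
    # With the 224x224 size reported in the original AlexNet paper, the
    # division does not come out evenly: (224 - 11) / 4 + 1 = 54.25.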

LeNet exhibits several common motifs of modern convolutional neural networks, such as convolutional layers, pooling layers, and fully connected layers. [4]

LeNet-1

Before LeNet-1, the 1988 architecture [3] was a hybrid approach. The first stage scaled, deskewed, and skeletonized the input image. The second stage was a convolutional layer with 18 hand-designed kernels. The third stage was a fully connected network with one hidden layer.

The LeNet-1 architecture has 3 hidden layers (H1-H3) and an output layer. [4] It has 1256 units, 64660 connections, and 9760 independent parameters.

The dataset was 9298 grayscale images, digitized from handwritten zip codes that appeared on U.S. mail passing through the Buffalo, New York post office. [9] The training set had 7291 data points, and the test set had 2007. Both the training and test sets contained ambiguous, unclassifiable, and misclassified data. Training took 3 days on a Sun workstation.

Compared to the previous 1988 architecture, there was no skeletonization, and the convolutional kernels were learned automatically by backpropagation.

A later version of LeNet-1 has four hidden layers (H1-H4) and an output layer. It takes a 28x28 pixel image as input, though the active region is 16x16 to avoid boundary effects. [10]

The network has 4635 units, 98442 connections, and 2578 trainable parameters. It was obtained by starting from a previous CNN [11] with 4 times as many trainable parameters, which was then pruned with Optimal Brain Damage. [12] One forward pass requires about 140,000 multiply-add operations. [7]

LeNet-4

LeNet-4 was a larger version of LeNet-1 designed to fit the larger MNIST database. It had more feature maps in its convolutional layers, and had an additional layer of hidden units, fully connected to both the last convolutional layer and to the output units. It has 2 convolutions, 2 average poolings, and 2 fully connected layers. It has about 17000 trainable parameters. [7]

One forward pass requires about 260,000 multiply-add operations. [7]

LeNet-5

LeNet-5 architecture block diagram
LeNet-5 architecture (detailed).

LeNet-5 is similar to LeNet-4, but with more fully connected layers. Its architecture is shown in the figures above. It has 2 convolutions, 2 average poolings, and 3 fully connected layers.
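As an illustration of this layer structure, the following is a minimal re-implementation sketch in PyTorch, not the original 1998 code. The feature-map and unit counts (6, 16, 120, 84) follow the 1998 paper [1]; the original network used scaled tanh activations, trainable subsampling layers, and a radial basis function output layer, which are simplified here to plain tanh, average pooling, and a linear classifier.

    import torch
    from torch import nn

    class LeNet5(nn.Module):
        """Sketch of LeNet-5: 2 convolutions, 2 average poolings, 3 fully connected layers."""

        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),   # C1: 1x32x32 -> 6x28x28
                nn.Tanh(),
                nn.AvgPool2d(kernel_size=2),      # S2: 6x28x28 -> 6x14x14
                nn.Conv2d(6, 16, kernel_size=5),  # C3: 6x14x14 -> 16x10x10
                nn.Tanh(),
                nn.AvgPool2d(kernel_size=2),      # S4: 16x10x10 -> 16x5x5
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 5 * 5, 120),       # C5, acting as the first fully connected layer
                nn.Tanh(),
                nn.Linear(120, 84),               # F6
                nn.Tanh(),
                nn.Linear(84, num_classes),       # output layer
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))

    # MNIST digits (28x28) are typically zero-padded to 32x32 for this architecture.
    logits = LeNet5()(torch.zeros(1, 1, 32, 32))
    print(logits.shape)  # torch.Size([1, 10])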

LeNet-5 was trained for about 20 epochs over MNIST. Training took 2 to 3 days of CPU time on a Silicon Graphics Origin 2000 server, using a single 200 MHz R10000 processor. [1]

Application

Recognizing simple digit images is the classic application of LeNet, as the network was created for that task.

Yann LeCun et al. created LeNet-1 in 1989. The paper "Backpropagation Applied to Handwritten Zip Code Recognition" [4] demonstrates how such constraints can be integrated into a backpropagation network through its architecture, and the network was successfully applied to recognizing handwritten zip code digits provided by the U.S. Postal Service. [4]

After the development of LeNet-1, as a demonstration of a real-time application, they loaded the neural network onto an AT&T DSP-32C digital signal processor [13] with a peak performance of 12.5 million multiply-add operations per second. It could normalize and classify 10 digits per second, or classify 30 already-normalized digits per second. Shortly afterwards, the research group started working with a development group and a product group at NCR (acquired by AT&T in 1991). This resulted in ATMs that could read the numerical amounts on checks using a LeNet running on the DSP-32C. Later, NCR deployed a similar system in large check-reading machines in bank back offices. [14]
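As a rough, illustrative sanity check (an estimate, not a figure reported in the sources): dividing the DSP-32C's peak rate of 12.5 million multiply-adds per second by the roughly 140,000 multiply-adds per forward pass quoted above for LeNet-1 gives an upper bound of about 89 classifications per second, so the reported throughput of 30 normalized digits per second sits well below the arithmetic peak.

    # Back-of-the-envelope estimate only; uses the figures quoted in this article.
    peak_macs_per_second = 12.5e6    # DSP-32C peak multiply-add rate
    macs_per_forward_pass = 140_000  # approximate cost of one LeNet-1 forward pass
    print(peak_macs_per_second / macs_per_forward_pass)  # ~89.3 forward passes per second at peak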

Development analysis

LeNet-5 marked the emergence of the CNN and defined its basic components. [1] However, it was not popular at the time because of the lack of hardware, especially GPUs, and because other algorithms, such as SVMs, could achieve similar or even better results.

Since the success of AlexNet in 2012, CNNs have become the best choice for computer vision applications, and many different types of CNN have been created, such as the R-CNN series. Today's CNN models are quite different from LeNet, but they were all developed on its basis.

A three-layer tree architecture imitating LeNet-5 and consisting of only one convolutional layer has achieved a similar success rate on the CIFAR-10 dataset. [15]

Increasing the number of filters for the LeNet architecture results in a power law decay of the error rate. These results indicate that a shallow network can achieve the same performance as deep learning architectures. [16]


References

  1. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (1998). "Gradient-based learning applied to document recognition" (PDF). Proceedings of the IEEE. 86 (11): 2278–2324. doi:10.1109/5.726791. S2CID 14542261.
  2. Zhang, Aston; Lipton, Zachary; Li, Mu; Smola, Alexander J. (2024). "7.6. Convolutional Neural Networks (LeNet)". Dive into Deep Learning. Cambridge: Cambridge University Press. ISBN 978-1-009-38943-3.
  3. Denker, John; Gardner, W.; Graf, Hans; Henderson, Donnie; Howard, R.; Hubbard, W.; Jackel, L. D.; Baird, Henry; Guyon, Isabelle (1988). "Neural Network Recognizer for Hand-Written Zip Code Digits". Advances in Neural Information Processing Systems. 1. Morgan-Kaufmann.
  4. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (December 1989). "Backpropagation Applied to Handwritten Zip Code Recognition". Neural Computation. 1 (4): 541–551. doi:10.1162/neco.1989.1.4.541. ISSN 0899-7667. S2CID 41312633.
  5. Lecun, Yann (June 1989). "Generalization and network design strategies" (PDF). Technical Report CRG-TR-89-4. Department of Computer Science, University of Toronto.
  6. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (June 1990). "Handwritten digit recognition with a back-propagation network" (PDF). Advances in Neural Information Processing Systems. 2: 396–404.
  7. Bottou, L.; Cortes, C.; Denker, J. S.; Drucker, H.; Guyon, I.; Jackel, L. D.; LeCun, Y.; Muller, U. A.; Sackinger, E.; Simard, P.; Vapnik, V. (1994). "Comparison of classifier methods: A case study in handwritten digit recognition". Proceedings of the 12th IAPR International Conference on Pattern Recognition. Vol. 2. IEEE Comput. Soc. Press. pp. 77–82. doi:10.1109/ICPR.1994.576879. ISBN 978-0-8186-6270-6.
  8. LeCun, Yann; Jackel, L.; Bottou, L.; Cortes, Corinna; Denker, J.; Drucker, H.; Guyon, Isabelle M.; Muller, Urs; Sackinger, E.; Simard, Patrice Y.; Vapnik, V. (1995). "Learning algorithms for classification: A comparison on handwritten digit recognition". S2CID 13411815.
  9. Wang, Ching-Huei; Srihari, Sargur N. (1988). "A framework for object recognition in a visually complex environment and its application to locating address blocks on mail pieces". International Journal of Computer Vision. 2 (2): 125–151. doi:10.1007/BF00133697. ISSN 0920-5691.
  10. Le Cun, Y.; Matan, O.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D.; Baird, H. S. (1990). "Handwritten zip code recognition with multilayer networks". Proceedings of the 10th International Conference on Pattern Recognition. Vol. ii. IEEE Comput. Soc. Press. pp. 35–40. doi:10.1109/ICPR.1990.119325. ISBN 978-0-8186-2062-1.
  11. Le Cun, Y.; Jackel, L. D.; Boser, B.; Denker, J. S.; Graf, H. P.; Guyon, I.; Henderson, D.; Howard, R. E.; Hubbard, W. (1990). "Handwritten Digit Recognition: Applications of Neural Net Chips and Automatic Learning". In Soulié, Françoise Fogelman; Hérault, Jeanny (eds.). Neurocomputing. Berlin, Heidelberg: Springer. pp. 303–318. doi:10.1007/978-3-642-76153-9_35. ISBN 978-3-642-76153-9.
  12. LeCun, Yann; Denker, John; Solla, Sara (1989). "Optimal Brain Damage". Advances in Neural Information Processing Systems. 2. Morgan-Kaufmann.
  13. Fuccio, M. L.; Gadenz, R. N.; Garen, C. J.; Huser, J. M.; Ng, B.; Pekarich, S. P.; Ulery, K. D. (December 1988). "The DSP32C: AT&T's second generation floating point digital signal processor". IEEE Micro. 8 (6): 30–48. doi:10.1109/40.16779. ISSN 0272-1732.
  14. Yann LeCun (2014-06-02). Convolutional Network Demo from 1989. Retrieved 2024-10-31 via YouTube.
  15. Meir, Yuval; Ben-Noam, Itamar; Tzach, Yarden; Hodassman, Shiri; Kanter, Ido (2023-01-30). "Learning on tree architectures outperforms a convolutional feedforward network". Scientific Reports. 13 (1): 962. Bibcode:2023NatSR..13..962M. doi:10.1038/s41598-023-27986-6. ISSN 2045-2322. PMC 9886946. PMID 36717568.
  16. Meir, Yuval; Tevet, Ofek; Tzach, Yarden; Hodassman, Shiri; Gross, Ronit D.; Kanter, Ido (2023-04-20). "Efficient shallow learning as an alternative to deep learning". Scientific Reports. 13 (1): 5423. arXiv:2211.11106. Bibcode:2023NatSR..13.5423M. doi:10.1038/s41598-023-32559-8. ISSN 2045-2322. PMC 10119101. PMID 37080998.