MNIST database

Sample images from the MNIST test dataset (figure).

The MNIST database (Modified National Institute of Standards and Technology database [1] ) is a large database of handwritten digits that is commonly used for training various image processing systems. [2] [3] The database is also widely used for training and testing in the field of machine learning. [4] [5] It was created by "re-mixing" the samples from NIST's original datasets. [6] The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. [7] Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels. [7]


The MNIST database contains 60,000 training images and 10,000 testing images. [8] Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset. [9] The original creators of the database keep a list of some of the methods tested on it. [7] In their original paper, they use a support-vector machine to get an error rate of 0.8%. [10]
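The split and image format can be checked directly. The following is a minimal sketch, assuming the copy of MNIST bundled with TensorFlow/Keras is available; it simply loads the dataset and prints the array shapes.

```python
# Minimal sketch: load MNIST and confirm the 60,000/10,000 split and 28x28 format.
# Assumes TensorFlow (which bundles a copy of MNIST) is installed.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28): 60,000 training images of 28x28 pixels
print(x_test.shape)   # (10000, 28, 28): 10,000 test images
print(x_train.dtype)  # uint8 grayscale values in the range 0-255
```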

Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the (final) successor to MNIST. [11] [12] Whereas MNIST included images only of handwritten digits, EMNIST includes all the images from NIST Special Database 19, a large database of handwritten uppercase and lowercase letters as well as digits. [13] [14] The images in EMNIST were converted into the same 28x28 pixel format, by the same process, as the MNIST images. Accordingly, tools which work with the older, smaller MNIST dataset will likely work unmodified with EMNIST.
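As an illustration of this drop-in compatibility, the hedged sketch below loads an EMNIST split through torchvision (assuming torchvision and its EMNIST wrapper are available); because the images share MNIST's 28x28 grayscale format, an MNIST preprocessing pipeline can be reused unchanged.

```python
# Hedged sketch: loading EMNIST with torchvision and treating it like MNIST.
# Assumes torchvision is installed; the "digits" split mirrors MNIST's 10 classes.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # same preprocessing one would use for MNIST

emnist = datasets.EMNIST(root="data", split="digits", train=True,
                         download=True, transform=to_tensor)

image, label = emnist[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) -- same 28x28 format as MNIST
```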

History

The set of images in the MNIST database was created in 1994 [15] as a combination of two of NIST's databases: Special Database 1 and Special Database 3. Special Database 1 and Special Database 3 consist of digits written by high school students and employees of the United States Census Bureau, respectively. [7]

The original dataset comprised 128x128 binary images, which were processed into 28x28 grayscale images. Both the training set and the testing set originally contained 60,000 samples, but 50,000 of the testing samples were discarded; see [16] for a detailed history and a reconstruction of the discarded testing set.
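The effect of that processing can be illustrated with a small sketch (this is not the original NIST pipeline, only an analogous operation): downsampling a binary image with an anti-aliasing filter produces intermediate grayscale values.

```python
# Illustrative sketch only -- not the original NIST processing pipeline.
# Downsampling a binary image with an anti-aliasing filter (here, Lanczos)
# introduces intermediate grayscale levels, as described above.
import numpy as np
from PIL import Image

binary = (np.random.rand(128, 128) > 0.5).astype(np.uint8) * 255  # stand-in 128x128 binary image
small = Image.fromarray(binary).resize((28, 28), Image.LANCZOS)    # anti-aliased 28x28 result

values = np.unique(np.asarray(small))
print(values[:10])  # intermediate values between 0 and 255 now appear as well
```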

Performance

Some researchers have achieved "near-human performance" on the MNIST database, using a committee of neural networks; in the same paper, the authors achieve performance double that of humans on other recognition tasks. [17] The highest error rate listed [7] on the original website of the database is 12 percent, which is achieved using a simple linear classifier with no preprocessing. [10]

In 2004, a best-case error rate of 0.42 percent was achieved on the database by researchers using a new classifier called the LIRA, which is a neural classifier with three neuron layers based on Rosenblatt's perceptron principles. [18]

Some researchers have tested artificial intelligence systems using versions of the database subjected to random distortions. The systems in these cases are usually neural networks, and the distortions used tend to be either affine distortions or elastic distortions. [7] Sometimes these systems can be very successful; one such system achieved an error rate on the database of 0.39 percent. [19]
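A common way to implement such elastic distortions (in the spirit of the augmentation described above; the parameter values here are illustrative and not taken from any cited paper) is to smooth a random displacement field with a Gaussian filter and resample the image along it:

```python
# Hedged sketch of elastic distortion for a 28x28 digit image.
# alpha scales the displacement, sigma controls its smoothness; both are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=8.0, sigma=3.0, rng=None):
    rng = rng or np.random.default_rng()
    # Random displacement fields, smoothed so that nearby pixels move together.
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing="ij")
    coords = np.array([rows + dy, cols + dx])
    # Bilinear resampling of the image along the displaced coordinates.
    return map_coordinates(image, coords, order=1, mode="reflect")
```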

In 2011, an error rate of 0.27 percent, improving on the previous best result, was reported by researchers using a similar system of neural networks. [20] In 2013, an approach based on regularization of neural networks using DropConnect was claimed to achieve a 0.21 percent error rate. [21] In 2016, the best performance by a single convolutional neural network was a 0.25 percent error rate, [22] and as of August 2018 this remained the best result for a single convolutional neural network trained on the MNIST training data without data augmentation. [22] [23] The Parallel Computing Center (Khmelnytskyi, Ukraine) also obtained an ensemble of only 5 convolutional neural networks which performs on MNIST at a 0.21 percent error rate. [24] [25] Some images in the testing dataset are barely readable and may prevent reaching test error rates of 0%. [26] In 2018, researchers from the Department of Systems and Information Engineering at the University of Virginia announced a 0.18% error rate achieved by simultaneously stacking three kinds of neural networks (fully connected, recurrent, and convolutional neural networks). [27]

Classifiers

This is a table of some of the machine learning methods used on the dataset and their error rates, by type of classifier:

Type | Classifier | Distortion | Preprocessing | Error rate (%)
Linear classifier | Pairwise linear classifier | None | Deskewing | 7.6 [10]
K-Nearest Neighbors | K-NN with rigid transformations | None | None | 0.96 [28]
K-Nearest Neighbors | K-NN with non-linear deformation (P2DHMDM) | None | Shiftable edges | 0.52 [29]
Boosted stumps | Product of stumps on Haar features | None | Haar features | 0.87 [30]
Non-linear classifier | 40 PCA + quadratic classifier | None | None | 3.3 [10]
Random forest | Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) [31] | None | Simple statistical pixel importance | 2.8 [32]
Support-vector machine (SVM) | Virtual SVM, deg-9 polynomial, 2-pixel jittered | None | Deskewing | 0.56 [33]
Neural network | 2-layer 784-800-10 | None | None | 1.6 [34]
Neural network | 2-layer 784-800-10 | Elastic distortions | None | 0.7 [34]
Deep neural network (DNN) | 6-layer 784-2500-2000-1500-1000-500-10 | Elastic distortions | None | 0.35 [35]
Convolutional neural network (CNN) | 6-layer 784-40-80-500-1000-2000-10 | None | Expansion of the training data | 0.31 [36]
Convolutional neural network (CNN) | 6-layer 784-50-100-500-1000-10-10 | None | Expansion of the training data | 0.27 [37]
Convolutional neural network (CNN) | 13-layer 64-128(5x)-256(3x)-512-2048-256-256-10 | None | None | 0.25 [22]
Convolutional neural network (CNN) | Committee of 35 CNNs, 1-20-P-40-P-150-10 | Elastic distortions | Width normalizations | 0.23 [17]
Convolutional neural network (CNN) | Committee of 5 CNNs, 6-layer 784-50-100-500-1000-10-10 | None | Expansion of the training data | 0.21 [24] [25]
Random Multimodel Deep Learning (RMDL) | 10 NN - 10 RNN - 10 CNN | None | None | 0.18 [27]
Convolutional neural network (CNN) | Committee of 20 CNNs with Squeeze-and-Excitation Networks [38] | None | Data augmentation | 0.17 [39]
Convolutional neural network (CNN) | Ensemble of 3 CNNs with varying kernel sizes | None | Data augmentation consisting of rotation and translation | 0.09 [40]
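As a concrete example of one table entry, the "2-layer 784-800-10" network is a fully connected model with a single hidden layer of 800 units. Below is a minimal sketch assuming TensorFlow/Keras; the activation functions and optimizer settings are illustrative choices, not taken from the cited work.

```python
# Hedged sketch of the "2-layer 784-800-10" fully connected network from the table.
# Activation and training settings are illustrative.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(784,)),              # flattened 28x28 input image
    layers.Dense(800, activation="relu"),    # hidden layer with 800 units
    layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```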

See also

Neural network (machine learning)
Jürgen Schmidhuber
Optical neural network
Transfer learning
Types of artificial neural networks
Deep learning
Convolutional neural network
Quantum machine learning
Timeline of machine learning
ImageNet
Data augmentation
AlexNet
Residual neural network
CIFAR-10
Neural architecture search
History of artificial neural networks
LeNet
Isabelle Guyon
Fashion MNIST

References

  1. "THE MNIST DATABASE of handwritten digits". Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York Christopher J.C. Burges, Microsoft Research, Redmond.
  2. "Support vector machines speed pattern recognition - Vision Systems Design". Vision Systems Design. September 2004. Retrieved 17 August 2013.
  3. Gangaputra, Sachin. "Handwritten digit database" . Retrieved 17 August 2013.
  4. Qiao, Yu (2007). "THE MNIST DATABASE of handwritten digits" . Retrieved 18 August 2013.
  5. Platt, John C. (1999). "Using analytic QP and sparseness to speed training of support vector machines" (PDF). Advances in Neural Information Processing Systems: 557–563. Archived from the original (PDF) on 4 March 2016. Retrieved 18 August 2013.
  6. Grother, Patrick J. "NIST Special Database 19 - Handprinted Forms and Characters Database" (PDF). National Institute of Standards and Technology .
  7. LeCun, Yann; Cortes, Corinna; Burges, Christopher J.C. "The MNIST Handwritten Digit Database". Yann LeCun's website, yann.lecun.com. Retrieved 30 April 2020.
  8. Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.
  9. Zhang, Bin; Srihari, Sargur N. (2004). "Fast k-Nearest Neighbor Classification Using Cluster-Based Trees" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 26 (4): 525–528. doi:10.1109/TPAMI.2004.1265868. PMID   15382657. S2CID   6883417 . Retrieved 20 April 2020.
  10. LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-Based Learning Applied to Document Recognition" (PDF). Proceedings of the IEEE. 86 (11): 2278–2324. doi:10.1109/5.726791. S2CID 14542261. Retrieved 18 August 2013.
  11. NIST (4 April 2017). "The EMNIST Dataset". NIST . Retrieved 11 April 2022.
  12. NIST (27 August 2010). "NIST Special Database 19". NIST . Retrieved 11 April 2022.
  13. Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. (2017). "EMNIST: an extension of MNIST to handwritten letters". arXiv: 1702.05373 [cs.CV].
  14. Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. (2017). "EMNIST: an extension of MNIST to handwritten letters". arXiv: 1702.05373v1 [cs.CV].
  15. L. Bottou et al., "Comparison of classifier methods: a case study in handwritten digit recognition," Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5), Jerusalem, Israel, 1994, pp. 77-82 vol.2, doi: 10.1109/ICPR.1994.576879.
  16. Yadav, Chhavi; Bottou, Leon (2019). "Cold Case: The Lost MNIST Digits". Advances in Neural Information Processing Systems. 32. Curran Associates, Inc. arXiv: 1905.10498 .
  17. Cireşan, Dan; Ueli Meier; Jürgen Schmidhuber (2012). "Multi-column deep neural networks for image classification" (PDF). 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 3642–3649. arXiv: 1202.2745. CiteSeerX 10.1.1.300.3283. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1228-8. S2CID 2161592.
  18. Kussul, Ernst; Tatiana Baidyk (2004). "Improved method of handwritten digit recognition tested on MNIST database" (PDF). Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008. Archived from the original (PDF) on 21 September 2013. Retrieved 20 September 2013.
  19. Ranzato, Marc'Aurelio; Christopher Poultney; Sumit Chopra; Yann LeCun (2006). "Efficient Learning of Sparse Representations with an Energy-Based Model" (PDF). Advances in Neural Information Processing Systems. 19: 1137–1144. Retrieved 20 September 2013.
  20. Ciresan, Dan Claudiu; Ueli Meier; Luca Maria Gambardella; Jürgen Schmidhuber (2011). "Convolutional neural network committees for handwritten character classification" (PDF). 2011 International Conference on Document Analysis and Recognition (ICDAR). pp. 1135–1139. CiteSeerX   10.1.1.465.2138 . doi:10.1109/ICDAR.2011.229. ISBN   978-1-4577-1350-7. S2CID   10122297. Archived from the original (PDF) on 22 February 2016. Retrieved 20 September 2013.
  21. Wan, Li; Matthew Zeiler; Sixin Zhang; Yann LeCun; Rob Fergus (2013). Regularization of Neural Network using DropConnect. International Conference on Machine Learning(ICML).
  22. SimpleNet (2016). "Lets Keep it simple, Using simple architectures to outperform deeper and more complex architectures". arXiv: 1608.06037. Retrieved 3 December 2020.
  23. SimpNet (2018). "Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet". Github. arXiv: 1802.06205 . Retrieved 3 December 2020.
  24. Romanuke, Vadim. "Parallel Computing Center (Khmelnytskyi, Ukraine) represents an ensemble of 5 convolutional neural networks which performs on MNIST at 0.21 percent error rate". Retrieved 24 November 2016.
  25. Romanuke, Vadim (2016). "Training data expansion and boosting of convolutional neural networks for reducing the MNIST dataset error rate". Research Bulletin of NTUU "Kyiv Polytechnic Institute". 6 (6): 29–34. doi:10.20535/1810-0546.2016.6.84115.
  26. MNIST classifier, GitHub. "Classify MNIST digits using Convolutional Neural Networks". GitHub . Retrieved 3 August 2018.
  27. Kowsari, Kamran; Heidarysafa, Mojtaba; Brown, Donald E.; Meimandi, Kiana Jafari; Barnes, Laura E. (2018-05-03). "RMDL: Random Multimodel Deep Learning for Classification". Proceedings of the 2nd International Conference on Information System and Data Mining. pp. 19–28. arXiv: 1805.01890. doi:10.1145/3206098.3206111. ISBN 9781450363549. S2CID 19208611.
  28. Lindblad, Joakim; Nataša Sladoje (January 2014). "Linear time distances between fuzzy sets with applications to pattern matching and classification". IEEE Transactions on Image Processing. 23 (1): 126–136. Bibcode:2014ITIP...23..126L. doi:10.1109/TIP.2013.2286904. PMID   24158476. S2CID   1908950.
  29. Keysers, Daniel; Thomas Deselaers; Christian Gollan; Hermann Ney (August 2007). "Deformation models for image recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 29 (8): 1422–1435. CiteSeerX   10.1.1.106.3963 . doi:10.1109/TPAMI.2007.1153. PMID   17568145. S2CID   2528485.
  30. Kégl, Balázs; Róbert Busa-Fekete (2009). "Boosting products of base classifiers" (PDF). Proceedings of the 26th Annual International Conference on Machine Learning. pp. 497–504. doi:10.1145/1553374.1553439. ISBN   9781605585161. S2CID   8460779 . Retrieved 27 August 2013.
  31. "RandomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)". 21 January 2020.
  32. "Mehrad Mahmoudian / MNIST with RandomForest".
  33. Decoste, Dennis; Schölkopf, Bernhard (2002). "Training Invariant Support Vector Machines". Machine Learning. 46 (1–3): 161–190. doi: 10.1023/A:1012454411458 . ISSN   0885-6125. OCLC   703649027.
  34. Patrice Y. Simard; Dave Steinkraus; John C. Platt (2003). "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis". Proceedings of the Seventh International Conference on Document Analysis and Recognition. Vol. 1. Institute of Electrical and Electronics Engineers. p. 958. doi:10.1109/ICDAR.2003.1227801. ISBN 978-0-7695-1960-9. S2CID 4659176.
  35. Ciresan, Claudiu Dan; Ueli Meier; Luca Maria Gambardella; Juergen Schmidhuber (December 2010). "Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition". Neural Computation. 22 (12): 3207–20. arXiv: 1003.0358 . doi:10.1162/NECO_a_00052. PMID   20858131. S2CID   1918673.
  36. Romanuke, Vadim. "The single convolutional neural network best performance in 18 epochs on the expanded training data at Parallel Computing Center, Khmelnytskyi, Ukraine" . Retrieved 16 November 2016.
  37. Romanuke, Vadim. "Parallel Computing Center (Khmelnytskyi, Ukraine) gives a single convolutional neural network performing on MNIST at 0.27 percent error rate" . Retrieved 24 November 2016.
  38. Hu, Jie; Shen, Li; Albanie, Samuel; Sun, Gang; Wu, Enhua (2019). "Squeeze-and-Excitation Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (8): 2011–2023. arXiv: 1709.01507 . doi:10.1109/TPAMI.2019.2913372. PMID   31034408. S2CID   140309863.
  39. "GitHub - Matuzas77/MNIST-0.17: MNIST classifier with average 0.17% error". GitHub . 25 February 2020.
  40. An, Sanghyeon; Lee, Minjun; Park, Sanglee; Yang, Heerin; So, Jungmin (2020-10-04). "An Ensemble of Simple Convolutional Neural Network Models for MNIST Digit Recognition". arXiv: 2008.10400 [cs.CV].
