The MNIST database (Modified National Institute of Standards and Technology database) [1] is a large database of handwritten digits that is commonly used for training various image processing systems. [2] [3] The database is also widely used for training and testing in the field of machine learning. [4] [5] It was created by "re-mixing" the samples from NIST's original datasets. [6] The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. [7] Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels. [7]
The MNIST database contains 60,000 training images and 10,000 testing images. [8] Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset. [9] The original creators of the database keep a list of some of the methods tested on it. [7] In their original paper, they use a support-vector machine to get an error rate of 0.8%. [10]
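The dataset is distributed as four files in a simple binary "IDX" format: a big-endian integer header followed by raw unsigned bytes. As an illustration, here is a minimal Python reader, assuming the standard uncompressed file names from the original distribution:

```python
import struct
import numpy as np

def read_idx_images(path):
    """Read an IDX3 image file (e.g. train-images-idx3-ubyte) into a uint8 array."""
    with open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX3 image file"
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

def read_idx_labels(path):
    """Read an IDX1 label file (e.g. train-labels-idx1-ubyte) into a uint8 vector."""
    with open(path, "rb") as f:
        magic, n = struct.unpack(">II", f.read(8))
        assert magic == 2049, "not an IDX1 label file"
        return np.frombuffer(f.read(), dtype=np.uint8)

train_images = read_idx_images("train-images-idx3-ubyte")  # shape (60000, 28, 28)
train_labels = read_idx_labels("train-labels-idx1-ubyte")  # shape (60000,)
```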
The original MNIST dataset contains at least 4 wrong labels. [11]
The set of images in the MNIST database was created in 1994. Previously, NIST had released two datasets on two CD-ROMs: Special Database 1 (NIST Test Data I, or SD-1) and Special Database 3 (SD-3).
SD-1 was the test set; it contained 58,646 digit images written by 500 different high school students, and each image is accompanied by the identity of its writer. SD-3 was the training set, containing digits written by 2,000 employees of the United States Census Bureau; its images were much cleaner and easier to recognize than those in SD-1. [7] It was found that machine learning systems trained and validated on SD-3 suffered significant drops in performance when evaluated on the SD-1 test set. [12]
The original NIST images from which MNIST was built were 128x128 binary images. Each was size-normalized to fit in a 20x20 pixel box while preserving its aspect ratio, and anti-aliased to grayscale. Each digit was then placed in a 28x28 image, translated so that the center of mass of its pixels lies at the center of the image. The details of this downsampling procedure have since been reconstructed. [13]
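As a simplified illustration of the centering step, the following Python sketch (assuming NumPy and SciPy) shifts a 20x20 digit on a 28x28 canvas so that its pixel center of mass lands at the canvas center; the reconstructed pipeline [13] differs in its exact resampling and rounding details:

```python
import numpy as np
from scipy import ndimage

def center_digit(digit20):
    """Place a 20x20 grayscale digit on a 28x28 canvas, then translate it so
    the center of mass of its pixels sits at the canvas center, as in MNIST."""
    canvas = np.zeros((28, 28), dtype=np.float32)
    canvas[4:24, 4:24] = digit20                       # geometric centering first
    cy, cx = ndimage.center_of_mass(canvas)
    dy, dx = int(round(13.5 - cy)), int(round(13.5 - cx))
    return ndimage.shift(canvas, (dy, dx), order=0)    # whole-pixel translation
```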
The training set and the testing set both originally had 60,000 samples, but 50,000 of the testing samples were discarded. These were later reconstructed to produce QMNIST, which has 60,000 images in the training set and 60,000 in the testing set. [14] [13]
Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the (final) successor to MNIST. [15] [16] MNIST included images only of handwritten digits. EMNIST includes all the images from NIST Special Database 19 (SD 19), a large database of 814,255 handwritten uppercase and lowercase letters and digits. [17] [18] The images in EMNIST were converted into the same 28x28 pixel format, by the same process, as the MNIST images. Accordingly, tools which work with the older, smaller MNIST dataset will likely work unmodified with EMNIST.
Fashion MNIST was created in 2017 as a more challenging replacement for MNIST. The dataset consists of 70,000 28x28 grayscale images of fashion products from 10 categories. [19]
Some researchers have achieved "near-human performance" on the MNIST database, using a committee of neural networks; in the same paper, the authors achieve performance double that of humans on other recognition tasks. [20] The highest error rate listed [7] on the original website of the database is 12 percent, which is achieved using a simple linear classifier with no preprocessing. [10]
In 2004, a best-case error rate of 0.42 percent was achieved on the database by researchers using a new classifier called the LIRA, which is a neural classifier with three neuron layers based on Rosenblatt's perceptron principles. [21]
Some researchers have tested artificial intelligence systems on versions of the database subjected to random distortions. The systems in these cases are usually neural networks, and the distortions used tend to be either affine or elastic. [7] Such systems can be very successful; one achieved an error rate of 0.39 percent on the database. [22]
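For illustration, the elastic-distortion recipe of Simard et al. smooths a random displacement field with a Gaussian filter and warps the image along it. A minimal Python sketch follows; the scale alpha and smoothing sigma below are illustrative values, not any published settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=36.0, sigma=6.0, rng=None):
    """Warp a 2D image along a Gaussian-smoothed random displacement field."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(image, [y + dy, x + dx], order=1)
```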
In 2011, an error rate of 0.27 percent, improving on the previous best result, was reported by researchers using a similar system of neural networks. [23] In 2013, an approach based on regularization of neural networks using DropConnect was claimed to achieve a 0.21 percent error rate. [24] In 2016, the best performance by a single convolutional neural network was a 0.25 percent error rate, [25] and as of August 2018 this remained the best result for a single convolutional neural network trained on MNIST with no data augmentation. [25] [26] The Parallel Computing Center (Khmelnytskyi, Ukraine) also obtained an ensemble of only 5 convolutional neural networks that performs on MNIST at a 0.21 percent error rate; [27] [28] on the 10,000-image test set, 0.21 percent corresponds to just 21 misclassified digits.
This is a table of some of the machine learning methods used on the dataset and their error rates, by type of classifier:
Type | Classifier | Distortion | Preprocessing | Error rate (%) |
---|---|---|---|---|
Linear classifier | Pairwise linear classifier | None | Deskewing | 7.6 [10] |
K-Nearest Neighbors | K-NN with rigid transformations | None | None | 0.96 [29] |
K-Nearest Neighbors | K-NN with non-linear deformation (P2DHMDM) | None | Shiftable edges | 0.52 [30] |
Boosted Stumps | Product of stumps on Haar features | None | Haar features | 0.87 [31] |
Non-linear classifier | 40 PCA + quadratic classifier | None | None | 3.3 [10] |
Random Forest | Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) [32] | None | Simple statistical pixel importance | 2.8 [33] |
Support-vector machine (SVM) | Virtual SVM, deg-9 poly, 2-pixel jittered | None | Deskewing | 0.56 [34] |
Neural network | 2-layer 784-800-10 | None | None | 1.6 [35] |
Neural network | 2-layer 784-800-10 | Elastic distortions | None | 0.7 [35] |
Deep neural network (DNN) | 6-layer 784-2500-2000-1500-1000-500-10 | Elastic distortions | None | 0.35 [36] |
Convolutional neural network (CNN) | 6-layer 784-40-80-500-1000-2000-10 | None | Expansion of the training data | 0.31 [37] |
Convolutional neural network | 6-layer 784-50-100-500-1000-10-10 | None | Expansion of the training data | 0.27 [38] |
Convolutional neural network (CNN) | 13-layer 64-128(5x)-256(3x)-512-2048-256-256-10 | None | None | 0.25 [25] |
Convolutional neural network | Committee of 35 CNNs, 1-20-P-40-P-150-10 | Elastic distortions | Width normalizations | 0.23 [20] |
Convolutional neural network | Committee of 5 CNNs, 6-layer 784-50-100-500-1000-10-10 | None | Expansion of the training data | 0.21 [27] [28] |
Convolutional neural network | Committee of 20 CNNs with Squeeze-and-Excitation Networks [39] | None | Data augmentation | 0.17 [40] |
Convolutional neural network | Ensemble of 3 CNNs with varying kernel sizes | None | Data augmentation consisting of rotation and translation | 0.09 [41] |
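None of the table's entries is reproduced exactly below, but a minimal PyTorch sketch conveys the general shape of the small convolutional networks it lists; the layer sizes here are illustrative, not those of any cited result:

```python
import torch
from torch import nn

# A small MNIST-style CNN; sizes are illustrative, not from any table entry.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
    nn.Linear(256, 10),            # one logit per digit class
)

x = torch.zeros(1, 1, 28, 28)      # a single 28x28 grayscale image
assert model(x).shape == (1, 10)
```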
In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.
Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.
Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
An optical neural network is a physical implementation of an artificial neural network with optical components. Early optical neural networks used a photorefractive volume hologram to interconnect arrays of input neurons to arrays of outputs, with synaptic weights in proportion to the multiplexed hologram's strength. Volume holograms were further multiplexed using spectral hole burning to add one dimension of wavelength to space, achieving four-dimensional interconnects of two-dimensional arrays of neural inputs and outputs. This research led to extensive work on alternative methods using the strength of the optical interconnect for implementing neuronal communications.
Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.
Yann André LeCun is a French-American computer scientist working primarily in the fields of machine learning, computer vision, mobile robotics and computational neuroscience. He is the Silver Professor of the Courant Institute of Mathematical Sciences at New York University and Vice President, Chief AI Scientist at Meta.
Deep learning is a subset of machine learning that focuses on utilizing neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and is centered around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns features by itself via filter optimization. This type of deep learning network has been applied to process and make predictions from many different types of data, including text, images and audio. Convolution-based networks are the de facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced, in some cases, by newer deep learning architectures such as the transformer. Vanishing and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, each neuron in a fully connected layer would need 10,000 weights to process an image sized 100x100 pixels, whereas a 5x5 convolution kernel needs only 25 weights, regardless of image size. Higher-layer features are extracted from wider context windows, compared to lower-layer features.
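That parameter arithmetic can be checked directly; a short sketch using PyTorch layer objects (a convenience assumption, not part of the definition of a CNN):

```python
from torch import nn

# One fully connected neuron over a 100x100 image: 10,000 weights (plus a bias).
dense = nn.Linear(100 * 100, 1)
# One 5x5 convolution kernel slid across the same image: 25 weights (plus a bias).
conv = nn.Conv2d(1, 1, kernel_size=5)

print(dense.weight.numel())  # 10000
print(conv.weight.numel())   # 25
```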
This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events in machine learning are included.
The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge, where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.
Data augmentation is a statistical technique which allows maximum likelihood estimation from incomplete data. Data augmentation has important applications in Bayesian analysis, and the technique is widely used in machine learning to reduce overfitting when training machine learning models, achieved by training models on several slightly-modified copies of existing data.
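As a concrete illustration of the machine learning sense, the rotation-and-translation style of augmentation cited in the table above can be expressed with torchvision; the ranges below are illustrative values, not those of any cited result:

```python
from torchvision import transforms

# Randomly rotate by up to ±10 degrees and shift by up to 10% of image size.
augment = transforms.RandomAffine(degrees=10, translate=(0.1, 0.1))
# augmented = augment(image)  # applied afresh each epoch to training images
```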
AlexNet is a convolutional neural network (CNN) architecture designed in 2012 by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, Krizhevsky's Ph.D. advisor at the University of Toronto. It had 60 million parameters and 650,000 neurons.
The CIFAR-10 dataset is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is one of the most widely used datasets for machine learning research. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.
Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy and performance estimation strategy used.
In artificial neural networks, a convolutional layer is a type of network layer that applies a convolution operation to the input. Convolutional layers are some of the primary building blocks of convolutional neural networks (CNNs), a class of neural network most commonly applied to images, video, audio, and other data that have the property of uniform translational symmetry.
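A minimal NumPy sketch of the sliding-window operation such a layer performs (single channel, stride 1, "valid" padding; note that deep-learning "convolution" is typically cross-correlation, as here):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Compute one weighted sum of a kernel-sized window per position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```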
Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by biological neural circuitry. While some of the computational implementations of ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling this period an "AI winter".
LeNet is a series of convolutional neural network architectures proposed by LeCun et al. The earliest version, LeNet-1, was trained in 1989. In general, when "LeNet" is referred to without a number, it means LeNet-5 (1998), the most well-known version.
The Fashion MNIST dataset is a large freely available database of fashion images that is commonly used for training and testing various machine learning systems. Fashion-MNIST was intended to serve as a replacement for the original MNIST database for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.
Clément Farabet is a computer scientist and AI expert known for his contributions to the field of deep learning. He served as a research scientist at New York University. He serves as the Vice President of Research at Google DeepMind and previously served as the VP of AI Infrastructure at NVIDIA.