The CIFAR-10 dataset (Canadian Institute For Advanced Research) is a collection of images commonly used to train machine learning and computer vision algorithms, and is one of the most widely used datasets in machine learning research. [1] [2] CIFAR-10 contains 60,000 32x32 color images in 10 classes. [3] The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, with 6,000 images per class. [4]
Computer algorithms for recognizing objects in photos often learn by example, and CIFAR-10 is a set of labeled images that can be used to teach a computer to recognize objects. Because the images are low-resolution (32x32), the dataset allows researchers to quickly try different algorithms to see what works.
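As a concrete illustration, CIFAR-10 ships with a fixed split of 50,000 training and 10,000 test images, and most deep learning libraries bundle it. A minimal sketch of loading it with torchvision (the "./data" storage path is an arbitrary choice):

```python
# Minimal sketch of loading CIFAR-10 with torchvision (one of several
# libraries that bundle the dataset; the "./data" path is arbitrary).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # PIL image -> float tensor in [0, 1]

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)

print(len(train_set), len(test_set))  # 50000 10000 (the standard split)
image, label = train_set[0]
print(image.shape)                    # torch.Size([3, 32, 32]): 3 color channels, 32x32 pixels
print(train_set.classes[label])       # one of the 10 class names
```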
CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset from 2008; it was published in 2009. When the dataset was created, students were paid to label all of the images. [5]
Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.
The table below lists some of the research papers that claim to have achieved state-of-the-art results on the CIFAR-10 dataset. Not all papers are standardized on the same pre-processing techniques, such as image flipping or image shifting (sketched in code after the table). For that reason, one paper's state-of-the-art claim can have a higher error rate than an older state-of-the-art claim and still be valid.
Paper title | Error rate (%) | Publication date |
---|---|---|
Convolutional Deep Belief Networks on CIFAR-10 [6] | 21.1 | August 2010 |
Maxout Networks [7] | 9.38 | February 13, 2013 |
Wide Residual Networks [8] | 4.0 | May 23, 2016 |
Neural Architecture Search with Reinforcement Learning [9] | 3.65 | November 4, 2016 |
Fractional Max-Pooling [10] | 3.47 | December 18, 2014 |
Densely Connected Convolutional Networks [11] | 3.46 | August 24, 2016 |
Shake-Shake regularization [12] | 2.86 | May 21, 2017 |
Coupled Ensembles of Neural Networks [13] | 2.68 | September 18, 2017 |
ShakeDrop regularization [14] | 2.67 | February 7, 2018 |
Improved Regularization of Convolutional Neural Networks with Cutout [15] | 2.56 | August 15, 2017 |
Regularized Evolution for Image Classifier Architecture Search [16] | 2.13 | February 6, 2018 |
Rethinking Recurrent Neural Networks and other Improvements for Image Classification [17] | 1.64 | July 31, 2020 |
AutoAugment: Learning Augmentation Policies from Data [18] | 1.48 | May 24, 2018 |
A Survey on Neural Architecture Search [19] | 1.33 | May 4, 2019 |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [20] | 1.00 | November 16, 2018 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [21] | 0.5 | 2021 |
Reduction of Class Activation Uncertainty with Background Information [22] | 0.3 | May 5, 2023 |
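The image flipping and shifting mentioned above the table are typically implemented as a random horizontal flip and as padding followed by a random crop. A sketch of such a training transform in torchvision (the 4-pixel padding here is a common but not universal choice, and exact parameters vary between papers):

```python
# Sketch of two common CIFAR-10 pre-processing steps: random horizontal
# flipping and random shifting (padding plus random crop).
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),     # flip left-right with probability 0.5
    transforms.RandomCrop(32, padding=4),  # pad by 4 pixels, crop back to 32x32
    transforms.ToTensor(),
])
```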
CIFAR-10 is also used as a performance benchmark for teams competing to run neural networks faster and cheaper. DAWNBench publishes benchmark data on its website.
In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0), depending on input. This is similar to the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using only a small number of nodes, and such activation functions are called nonlinearities.
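To make the distinction concrete, here is a minimal NumPy sketch of a step ("ON"/"OFF") activation and two common nonlinearities, together with the reason nonlinearity matters: a stack of purely linear layers collapses into a single linear map.

```python
import numpy as np

def step(x):      # the "ON" (1) / "OFF" (0) behavior described above
    return (x > 0).astype(float)

def sigmoid(x):   # a smooth nonlinearity
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):      # the most common nonlinearity in modern networks
    return np.maximum(0.0, x)

# Two linear layers with no activation collapse to one linear map ...
W1, W2 = np.random.randn(4, 3), np.random.randn(2, 4)
x = np.random.randn(3)
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)  # no extra expressive power

# ... while a nonlinearity between them breaks the collapse:
y = W2 @ relu(W1 @ x)
```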
Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods can be supervised, semi-supervised, or unsupervised.
WikiArt is an online, user-editable visual art encyclopedia, active since 2010.
The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
In deep learning, a convolutional neural network (CNN) is a class of artificial neural network most commonly applied to analyze visual imagery. CNNs use a mathematical operation called convolution in place of general matrix multiplication in at least one of their layers. They are specifically designed to process pixel data and are used in image recognition and processing, with applications in image and video recognition, image classification and segmentation, recommender systems, medical image analysis, and natural language processing, among other areas.
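A minimal sketch of why convolution is used in place of a general (fully connected) matrix multiplication on pixel data; the layer sizes below are illustrative, not taken from any particular model:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)  # a dense layer of matching shape

x = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized RGB image
print(conv(x).shape)           # torch.Size([1, 16, 32, 32])

# The convolution reuses one small 3x3 kernel across the whole image, so it
# needs far fewer parameters than the equivalent dense layer:
print(sum(p.numel() for p in conv.parameters()))  # 448
print(sum(p.numel() for p in fc.parameters()))    # 50,348,032
```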
Adversarial machine learning is the study of attacks on machine learning algorithms, and of the defenses against such attacks. A survey from May 2020 found that practitioners report a dire need for better protection of machine learning systems in industrial applications.
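One classic attack, not described in the text above, is the fast gradient sign method (FGSM), which perturbs an input in the direction that most increases the model's loss. A minimal sketch, assuming a differentiable PyTorch classifier model, an input image with pixel values in [0, 1], and its correct label:

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.03):
    """Return an adversarially perturbed copy of `image` (FGSM sketch)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the sign of the gradient to increase the loss, then clamp
    # back to the valid pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```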
DeepDream is a computer vision program created by Google engineer Alexander Mordvintsev that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dream-like appearance reminiscent of a psychedelic experience in the deliberately overprocessed images.
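The pattern-enhancing step can be sketched as gradient ascent on the input image: instead of updating the network's weights to reduce a loss, the image itself is updated to increase the activations of a chosen layer. A minimal sketch, assuming a callable features that maps an image tensor to that layer's activations (a hypothetical stand-in, not Google's actual implementation):

```python
import torch

def dream_step(features, image, lr=0.01):
    # Gradient *ascent*: push the image toward stronger layer activations,
    # amplifying whatever patterns the layer already responds to.
    image = image.clone().detach().requires_grad_(True)
    score = features(image).norm()   # how strongly does the layer fire?
    score.backward()
    with torch.no_grad():
        image += lr * image.grad / (image.grad.abs().mean() + 1e-8)
    return image.detach()
```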
The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.
A Residual Neural Network is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. A Residual Network is a network with skip connections that perform identity mappings, merged with the layer outputs by addition. It behaves like a Highway Network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and achieve better accuracy as they grow deeper. The identity skip connections, often referred to as "residual connections", are also used in the 1997 LSTM networks, Transformer models, the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system.
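A minimal sketch of a residual block in PyTorch, showing the identity skip connection merged with the layer output by addition (batch normalization and other details of the original design are omitted):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The stacked layers learn a residual F(x); the input x skips past
        # them unchanged and is added back, giving F(x) + x.
        residual = self.conv2(self.relu(self.conv1(x)))
        return self.relu(residual + x)

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```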
Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy, and performance estimation strategy used.
SqueezeNet is the name of a deep neural network for computer vision that was released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters that can more easily fit into computer memory and can more easily be transmitted over a computer network.
Neural style transfer (NST) refers to a class of software algorithms that manipulate digital images, or videos, in order to adopt the appearance or visual style of another image. NST algorithms are characterized by their use of deep neural networks for the sake of image transformation. Common uses for NST are the creation of artificial artwork from photographs, for example by transferring the appearance of famous paintings to user-supplied photographs. Several notable mobile apps use NST techniques for this purpose, including DeepArt and Prisma. This method has been used by artists and designers around the globe to develop new artwork based on existing styles.
The swish function is a mathematical function defined as swish(x) = x · sigmoid(βx) = x / (1 + e^(−βx)), where β is either a constant or a trainable parameter.
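A one-line implementation in plain NumPy (β = 1 gives the common SiLU special case):

```python
import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))  # equivalently x * sigmoid(beta * x)
```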
In the context of artificial neural networks, pruning is the practice of removing parameters from an existing network. The goal of this process is to maintain the accuracy of the network while increasing its efficiency, reducing the computational resources required to run it.
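The text above does not name a specific method; one common criterion is magnitude pruning, which zeroes out the weights with the smallest absolute values. A minimal NumPy sketch:

```python
import numpy as np

def magnitude_prune(weights, fraction=0.5):
    # Zero out the given fraction of weights with the smallest magnitudes.
    threshold = np.quantile(np.abs(weights), fraction)
    return weights * (np.abs(weights) >= threshold)

w = np.random.randn(4, 4)
print(magnitude_prune(w, fraction=0.75))  # roughly 75% of entries zeroed
```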
A graph neural network (GNN) is a class of artificial neural networks for processing data that can be represented as graphs.
A Vision Transformer (ViT) is a transformer that is targeted at vision processing tasks such as image recognition.
The Fashion MNIST dataset is a large, freely available database of fashion images that is commonly used for training and testing machine learning systems. Fashion-MNIST was intended as a drop-in replacement for the original MNIST database for benchmarking machine learning algorithms, as it shares the same image size, data format, and training/testing split structure.
Generative artificial intelligence or generative AI is a type of artificial intelligence (AI) system capable of generating text, images, or other media in response to prompts. Generative AI models learn the patterns and structure of their input training data by applying neural network machine learning techniques, and then generate new data that has similar characteristics.