CIFAR-10

Last updated July 18, 2023

The CIFAR-10 dataset (Canadian Institute For Advanced Research) is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is one of the most widely used datasets for machine learning research.^[1]^[2] The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes.^[3] The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.^[4]
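The figures above fix the size of the raw data. As a rough sketch, the arithmetic can be worked through as follows. The constant names are illustrative, and storing each of three color channels as one byte is an assumption (the usual layout for 8-bit RGB images), not something stated above.

```python
# Back-of-the-envelope size of CIFAR-10, using the figures from the text.
# Assumption: 3 color channels (RGB), one byte per channel.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6_000
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

# Class names as listed in the text above.
CLASS_NAMES = [
    "airplane", "car", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

total_images = NUM_CLASSES * IMAGES_PER_CLASS   # 60,000 images
bytes_per_image = HEIGHT * WIDTH * CHANNELS     # 3,072 bytes
total_bytes = total_images * bytes_per_image    # 184,320,000 bytes

print(f"{total_images:,} images, {bytes_per_image:,} bytes each")
print(f"about {total_bytes / 1e6:.1f} MB of raw pixel data")
```

At well under 200 MB uncompressed, the whole dataset fits comfortably in memory on ordinary hardware.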

Computer algorithms for recognizing objects in photos often learn by example. CIFAR-10 is a set of images that can be used to teach a computer how to recognize objects. Because the images in CIFAR-10 are low-resolution (32x32), the dataset lets researchers quickly try different algorithms to see what works.

CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset, which dates from 2008; CIFAR-10 itself was published in 2009. When the dataset was created, students were paid to label all of the images.^[5]

Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.

Research papers claiming state-of-the-art results on CIFAR-10

This is a table of some of the research papers that claim to have achieved state-of-the-art results on the CIFAR-10 dataset. Not all papers are standardized on the same pre-processing techniques, like image flipping or image shifting. For that reason, it is possible that one paper's claim of state-of-the-art could have a higher error rate than an older state-of-the-art claim but still be valid.
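To make the two pre-processing techniques named above concrete, here is a minimal sketch of horizontal flipping and shifting on a toy image held as nested Python lists. The helper names `flip_horizontal` and `shift_right` are hypothetical, and the code does not reproduce the pipeline of any particular paper.

```python
# Two common augmentations on a tiny image stored as rows of pixel values.
# Illustrative helpers only; not the pre-processing of any specific paper.

def flip_horizontal(image):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    """Shift the image right by `pixels` columns, padding the left with `fill`."""
    if pixels <= 0:
        return [list(row) for row in image]
    width = len(image[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + list(row[: width - pixels]) for row in image]

# A 3x4 single-channel toy image (CIFAR-10 images are 32x32 with 3 channels).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

flipped = flip_horizontal(image)
shifted = shift_right(image, 1)
print(flipped[0])  # [4, 3, 2, 1]
print(shifted[0])  # [0, 1, 2, 3]
```

Training on such transformed copies effectively enlarges the training set, which is one reason results obtained with and without them are not directly comparable.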

Paper title	Error rate (%)	Publication date
Convolutional Deep Belief Networks on CIFAR-10^[6]	21.1	August 2010
Maxout Networks^[7]	9.38	February 13, 2013
Wide Residual Networks^[8]	4.0	May 23, 2016
Neural Architecture Search with Reinforcement Learning^[9]	3.65	November 4, 2016
Fractional Max-Pooling^[10]	3.47	December 18, 2014
Densely Connected Convolutional Networks^[11]	3.46	August 24, 2016
Shake-Shake regularization^[12]	2.86	May 21, 2017
Coupled Ensembles of Neural Networks^[13]	2.68	September 18, 2017
ShakeDrop regularization^[14]	2.67	February 7, 2018
Improved Regularization of Convolutional Neural Networks with Cutout^[15]	2.56	August 15, 2017
Regularized Evolution for Image Classifier Architecture Search^[16]	2.13	February 6, 2018
Rethinking Recurrent Neural Networks and other Improvements for Image Classification^[17]	1.64	July 31, 2020
AutoAugment: Learning Augmentation Policies from Data^[18]	1.48	May 24, 2018
A Survey on Neural Architecture Search^[19]	1.33	May 4, 2019
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism^[20]	1.00	November 16, 2018
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale^[21]	0.5	2021
Reduction of Class Activation Uncertainty with Background Information^[22]	0.3	May 5, 2023
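As a hedged illustration of what these error rates mean in practice, the sketch below converts a few of them into accuracy and an approximate count of misclassified images. It assumes every rate is measured on the standard 10,000-image CIFAR-10 test split, a figure not given in the text above, and the function name `summarize` is hypothetical. Individual papers may evaluate differently.

```python
# Convert reported error rates into accuracy and an approximate error count.
# Assumption: rates are measured on the standard 10,000-image CIFAR-10 test
# split, a figure that is not stated in the text above.
TEST_SET_SIZE = 10_000

def summarize(error_rate_percent):
    """Return (accuracy in percent, approximate number of misclassified images)."""
    accuracy = 100.0 - error_rate_percent
    misclassified = round(TEST_SET_SIZE * error_rate_percent / 100.0)
    return accuracy, misclassified

# A few rows copied from the table above.
rows = [
    ("Maxout Networks", 9.38),
    ("Wide Residual Networks", 4.0),
    ("Reduction of Class Activation Uncertainty with Background Information", 0.3),
]

for title, error_rate in rows:
    accuracy, misclassified = summarize(error_rate)
    print(f"{title}: {accuracy:.2f}% accuracy, about {misclassified} test images wrong")
```

Under that assumption, the gap between the lowest error rates in the table amounts to only a few dozen test images.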

Benchmarks

CIFAR-10 is also used as a performance benchmark for teams competing to run neural networks faster and cheaper. DAWNBench publishes benchmark data on its website.

References

[1] "AI Progress Measurement". Electronic Frontier Foundation. 2017-06-12. Retrieved 2017-12-11.
[2] "Popular Datasets Over Time | Kaggle". www.kaggle.com. Retrieved 2017-12-11.
[3] Hope, Tom; Resheff, Yehezkel S.; Lieder, Itay (2017-08-09). Learning TensorFlow: A Guide to Building Deep Learning Systems. O'Reilly Media, Inc. pp. 64–. ISBN 9781491978481. Retrieved 2018-01-22.
[4] Angelov, Plamen; Gegov, Alexander; Jayne, Chrisina; Shen, Qiang (2016-09-06). Advances in Computational Intelligence Systems: Contributions Presented at the 16th UK Workshop on Computational Intelligence, September 7–9, 2016, Lancaster, UK. Springer International Publishing. pp. 441–. ISBN 9783319465623. Retrieved 2018-01-22.
[5] Krizhevsky, Alex (2009). "Learning Multiple Layers of Features from Tiny Images" (PDF).
[6] "Convolutional Deep Belief Networks on CIFAR-10" (PDF).
[7] Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua (2013-02-13). "Maxout Networks". arXiv:1302.4389 [stat.ML].
[8] Zagoruyko, Sergey; Komodakis, Nikos (2016-05-23). "Wide Residual Networks". arXiv:1605.07146 [cs.CV].
[9] Zoph, Barret; Le, Quoc V. (2016-11-04). "Neural Architecture Search with Reinforcement Learning". arXiv:1611.01578 [cs.LG].
[10] Graham, Benjamin (2014-12-18). "Fractional Max-Pooling". arXiv:1412.6071 [cs.CV].
[11] Huang, Gao; Liu, Zhuang; Weinberger, Kilian Q.; van der Maaten, Laurens (2016-08-24). "Densely Connected Convolutional Networks". arXiv:1608.06993 [cs.CV].
[12] Gastaldi, Xavier (2017-05-21). "Shake-Shake regularization". arXiv:1705.07485 [cs.LG].
[13] Dutt, Anuvabh (2017-09-18). "Coupled Ensembles of Neural Networks". arXiv:1709.06053 [cs.CV].
[14] Yamada, Yoshihiro; Iwamura, Masakazu; Kise, Koichi (2018-02-07). "Shakedrop Regularization for Deep Residual Learning". IEEE Access. 7: 186126–186136. arXiv:1802.02375. doi:10.1109/ACCESS.2019.2960566. S2CID 54445621.
[15] DeVries, Terrance; Taylor, Graham W. (2017-08-15). "Improved Regularization of Convolutional Neural Networks with Cutout". arXiv:1708.04552 [cs.CV].
[16] Real, Esteban; Aggarwal, Alok; Huang, Yanping; Le, Quoc V. (2018-02-05). "Regularized Evolution for Image Classifier Architecture Search". arXiv:1802.01548 [cs.NE].
[17] Nguyen, Huu P.; Ribeiro, Bernardete (2020-07-31). "Rethinking Recurrent Neural Networks and other Improvements for Image Classification". arXiv:2007.15161 [cs.CV].
[18] Cubuk, Ekin D.; Zoph, Barret; Mane, Dandelion; Vasudevan, Vijay; Le, Quoc V. (2018-05-24). "AutoAugment: Learning Augmentation Policies from Data". arXiv:1805.09501 [cs.CV].
[19] Wistuba, Martin; Rawat, Ambrish; Pedapati, Tejaswini (2019-05-04). "A Survey on Neural Architecture Search". arXiv:1905.01392 [cs.LG].
[20] Huang, Yanping; Cheng, Youlong; Chen, Dehao; Lee, HyoukJoong; Ngiam, Jiquan; Le, Quoc V.; Chen, Zhifeng (2018-11-16). "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism". arXiv:1811.06965 [cs.CV].
[21] Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob; Houlsby, Neil (2021). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". International Conference on Learning Representations. arXiv:2010.11929.
[22] Kabir, Hussain (2023-05-05). "Reduction of Class Activation Uncertainty with Background Information". arXiv:2305.03238 [cs.CV].

External links

CIFAR-10 page - The home of the dataset
Canadian Institute For Advanced Research

Similar datasets

CIFAR-100: Similar to CIFAR-10 but with 100 classes and 600 images per class.
ImageNet (ILSVRC): 1 million color images of 1000 classes. ImageNet images are higher resolution, averaging 469x387 pixels.
Street View House Numbers (SVHN): Approximately 600,000 images of 10 classes (digits 0-9). Also 32x32 color images.
80 million tiny images dataset: CIFAR-10 is a labeled subset of this dataset.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
