Fashion MNIST


The Fashion MNIST dataset is a large, freely available database of fashion images that is commonly used for training and testing various machine learning systems. [1] [2] Fashion-MNIST was intended to serve as a drop-in replacement for the original MNIST database for benchmarking machine learning algorithms, as it shares the same image size, data format, and training/test split structure. [3]
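The shared data format is the gzipped IDX binary layout used by the original MNIST files; the details below (big-endian magic numbers, with 2051 marking image files and 2049 marking label files, followed by one 32-bit size per dimension) come from the MNIST file specification rather than from this article. A minimal header-parsing sketch:

```python
import struct

def parse_idx_header(data: bytes):
    """Parse the header of an IDX file (the format shared by MNIST and Fashion-MNIST).

    The first 4 bytes are a big-endian magic number (2051 for image files,
    2049 for label files); its low byte encodes the number of dimensions.
    Each dimension size follows as a big-endian 32-bit unsigned integer.
    """
    magic = struct.unpack(">I", data[:4])[0]
    ndim = magic & 0xFF  # low byte: number of dimensions
    dims = struct.unpack(f">{ndim}I", data[4:4 + 4 * ndim])
    return magic, dims

# Example: a synthetic header for a 60,000 x 28 x 28 image file.
header = struct.pack(">IIII", 2051, 60000, 28, 28)
print(parse_idx_header(header))  # (2051, (60000, 28, 28))
```

The payload (pixel or label bytes) follows the header directly; the parser above only recovers the shapes needed to read it.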


The dataset contains 70,000 28x28 grayscale images of fashion products from 10 categories, drawn from Zalando article images, with 7,000 images per category. [1] The training set consists of 60,000 images and the test set consists of 10,000 images. The dataset is commonly included in standard machine learning libraries. [4]
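The ten category names below are the standard Fashion-MNIST labels, indexed by their integer class IDs; the loader is a sketch using the bundled TensorFlow copy of the dataset referenced in [4], and assumes TensorFlow is installed and network access is available on first call:

```python
# The ten Fashion-MNIST categories, indexed by their integer label (0-9).
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def load_fashion_mnist():
    """Download and return ((x_train, y_train), (x_test, y_test)).

    Shapes: x_train (60000, 28, 28) and x_test (10000, 28, 28), dtype uint8,
    with labels as integers indexing into CLASS_NAMES.
    """
    import tensorflow as tf  # deferred so the class list is usable without TF
    return tf.keras.datasets.fashion_mnist.load_data()
```

Calling `load_fashion_mnist()` yields arrays matching the splits described above: 60,000 training and 10,000 test images of 28x28 pixels.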

History

The set of images in the Fashion MNIST database was created in 2017 to pose a more challenging classification task than the simple MNIST digits data, on which classifiers had reached accuracies upwards of 99.7%. [1]

The GitHub repository has collected over 4,000 stars and is referenced in more than 400 repositories, 1,000 commits and 7,000 code snippets. [5]

Numerous machine learning algorithms [6] have used the dataset as a benchmark, [7] [8] [9] [10] with the top algorithm [11] achieving 96.91% accuracy in 2020 according to the benchmark rankings website Papers with Code. [12] The dataset was also used as a benchmark in a 2018 Science paper that used all-optical hardware to classify images at the speed of light. [13] Google, the University of Cambridge, IBM Research, Université de Montréal, and Peking University are the institutions with the most publications using the dataset as of 2021.[citation needed]

See also

MNIST database
CIFAR-10


References

  1. Xiao, Han; Rasul, Kashif; Vollgraf, Roland (2017-09-15). "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms". arXiv: 1708.07747 [cs.LG].
  2. Shenwai, Tanushree (2021-09-07). "A New Google AI Research Study Discovers Anomalous Data Using Self Supervised Learning". MarkTechPost. Retrieved 2021-10-07.
  3. "Fashion-MNIST: Year In Review · Han Xiao Tech Blog - Neural Search & AI Engineering". hanxiao.io. Retrieved 2022-01-30.
  4. "Basic classification: Classify images of clothing | TensorFlow Core". TensorFlow. Retrieved 2021-10-07.
  5. "Build software better, together". GitHub. Retrieved 2022-01-30.
  6. "Papers using Fashion-MNIST (till 09.18)". Google Docs. Retrieved 2022-01-30.
  7. Meshkini, Khatereh; Platos, Jan; Ghassemain, Hassan (2020). "An Analysis of Convolutional Neural Network for Fashion Images Classification (Fashion-MNIST)". In Kovalev, Sergey; Tarassov, Valery; Snasel, Vaclav; Sukhanov, Andrey (eds.). Proceedings of the Fourth International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'19). Advances in Intelligent Systems and Computing. Vol. 1156. Cham: Springer International Publishing. pp. 85–95. doi:10.1007/978-3-030-50097-9_10. ISBN   978-3-030-50097-9. S2CID   226778948.
  8. Kayed, Mohammed; Anter, Ahmed; Mohamed, Hadeer (February 2020). "Classification of Garments from Fashion MNIST Dataset Using CNN LeNet-5 Architecture". 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE). pp. 238–243. doi:10.1109/ITCE48509.2020.9047776. ISBN   978-1-7281-4801-4. S2CID   214691687.
  9. Bhatnagar, Shobhit; Ghosal, Deepanway; Kolekar, Maheshkumar H. (December 2017). "Classification of fashion article images using convolutional neural networks". 2017 Fourth International Conference on Image Information Processing (ICIIP). pp. 1–6. doi:10.1109/ICIIP.2017.8313740. ISBN   978-1-5090-6733-6. S2CID   3888338.
  10. Kadam, Shivam S.; Adamuthe, Amol C.; Patil, Ashwini B. (2020). "CNN Model for Image Classification on MNIST and Fashion-MNIST Dataset" (PDF). Journal of Scientific Research. 64 (2): 374–384. doi:10.37398/JSR.2020.640251. S2CID   226435631.
  11. Tanveer, Muhammad Suhaib; Khan, Muhammad Umar Karim; Kyung, Chong-Min (2020-06-16). "Fine-Tuning DARTS for Image Classification". arXiv: 2006.09042 [cs.CV].
  12. "Papers with Code - Fashion-MNIST Benchmark (Image Classification)". paperswithcode.com. Retrieved 2022-01-30.
  13. Lin, Xing; Rivenson, Yair; Yardimci, Nezih T.; Veli, Muhammed; Luo, Yi; Jarrahi, Mona; Ozcan, Aydogan (2018-09-07). "All-optical machine learning using diffractive deep neural networks". Science. 361 (6406): 1004–1008. arXiv: 1804.08711 . Bibcode:2018Sci...361.1004L. doi:10.1126/science.aat8084. ISSN   0036-8075. PMID   30049787. S2CID   13753997.