Fashion MNIST

The Fashion MNIST dataset is a large freely available database of fashion images that is commonly used for training and testing various machine learning systems. [1] [2] Fashion-MNIST was intended to serve as a replacement for the original MNIST database for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. [3]

Contents

The dataset contains 70,000 28×28 grayscale images of fashion products in 10 categories, drawn from Zalando article images, with 7,000 images per category. [1] The training set consists of 60,000 images and the test set consists of 10,000 images. The dataset is commonly included in standard machine learning libraries. [4]
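Because Fashion-MNIST keeps the data format of the original MNIST files, the same reader can decode either dataset. The following is a minimal sketch rather than a reference implementation. It assumes the IDX layout used by MNIST (a big-endian header with magic number 2051 for image files and 2049 for label files, followed by unsigned bytes) and the class names in label order as given in the project's README. The function names are hypothetical, and the input is a small synthetic file standing in for a downloaded, decompressed one.

```python
import struct

# Class names in label order (0-9); assumed from the Fashion-MNIST README.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned-byte data, 1 dimension


def parse_idx_images(raw: bytes) -> list:
    """Decode uncompressed IDX image bytes into a count x rows x cols nested list."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number: {magic}")
    pixels = raw[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("image payload length does not match header")
    size = rows * cols
    return [
        [list(pixels[i * size + r * cols: i * size + (r + 1) * cols])
         for r in range(rows)]
        for i in range(count)
    ]


def parse_idx_labels(raw: bytes) -> list:
    """Decode uncompressed IDX label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number: {magic}")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label payload length does not match header")
    return labels


# Synthetic stand-in for real files: one all-black and one all-white 28x28 image.
fake_images = struct.pack(">IIII", IMAGE_MAGIC, 2, 28, 28) + bytes([0] * 784 + [255] * 784)
fake_labels = struct.pack(">II", LABEL_MAGIC, 2) + bytes([9, 0])

images = parse_idx_images(fake_images)
labels = parse_idx_labels(fake_labels)
names = [CLASS_NAMES[label] for label in labels]
print(len(images), len(images[0]), len(images[0][0]), names)
```

The distributed files are gzip-compressed, so real data would first be read through Python's `gzip` module before being passed to these functions.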

History

The set of images in the Fashion MNIST database was created in 2017 to pose a more challenging classification task than the simple MNIST digits data, on which classification accuracy had reached upwards of 99.7%. [1]

The GitHub repository has collected over 4,000 stars and is referenced in more than 400 repositories, 1,000 commits and 7,000 code snippets. [5]

Numerous machine learning algorithms [6] have used the dataset as a benchmark, [7] [8] [9] [10] with the top algorithm [11] achieving 96.91% accuracy in 2020 according to the Papers with Code benchmark rankings. [12] The dataset was also used as a benchmark in a 2018 Science paper that used all-optical hardware to classify images at the speed of light. [13] Google, University of Cambridge, IBM Research, Université de Montréal, and Peking University were the institutions with the most publications using the dataset as of 2021. [citation needed]

References

  1. Xiao, Han; Rasul, Kashif; Vollgraf, Roland (2017-09-15). "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms". arXiv:1708.07747 [cs.LG].
  2. Shenwai, Tanushree (2021-09-07). "A New Google AI Research Study Discovers Anomalous Data Using Self Supervised Learning". MarkTechPost. Retrieved 2021-10-07.
  3. "Fashion-MNIST: Year In Review · Han Xiao Tech Blog - Neural Search & AI Engineering". hanxiao.io. Retrieved 2022-01-30.
  4. "Basic classification: Classify images of clothing | TensorFlow Core". TensorFlow. Retrieved 2021-10-07.
  5. "Build software better, together". GitHub. Retrieved 2022-01-30.
  6. "Papers using Fashion-MNIST (till 09.18)". Google Docs. Retrieved 2022-01-30.
  7. Meshkini, Khatereh; Platos, Jan; Ghassemain, Hassan (2020). Kovalev, Sergey; Tarassov, Valery; Snasel, Vaclav; Sukhanov, Andrey (eds.). "An Analysis of Convolutional Neural Network for Fashion Images Classification (Fashion-MNIST)". Proceedings of the Fourth International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'19). Advances in Intelligent Systems and Computing. Cham: Springer International Publishing. 1156: 85–95. doi:10.1007/978-3-030-50097-9_10. ISBN 978-3-030-50097-9. S2CID 226778948.
  8. Kayed, Mohammed; Anter, Ahmed; Mohamed, Hadeer (February 2020). "Classification of Garments from Fashion MNIST Dataset Using CNN LeNet-5 Architecture". 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE). pp. 238–243. doi:10.1109/ITCE48509.2020.9047776. ISBN 978-1-7281-4801-4. S2CID 214691687.
  9. Bhatnagar, Shobhit; Ghosal, Deepanway; Kolekar, Maheshkumar H. (December 2017). "Classification of fashion article images using convolutional neural networks". 2017 Fourth International Conference on Image Information Processing (ICIIP). pp. 1–6. doi:10.1109/ICIIP.2017.8313740. ISBN 978-1-5090-6733-6. S2CID 3888338.
  10. Kadam, Shivam S.; Adamuthe, Amol C.; Patil, Ashwini B. (2020). "CNN Model for Image Classification on MNIST and Fashion-MNIST Dataset" (PDF). Journal of Scientific Research. 64 (2): 374–384. doi:10.37398/JSR.2020.640251. S2CID 226435631.
  11. Tanveer, Muhammad Suhaib; Khan, Muhammad Umar Karim; Kyung, Chong-Min (2020-06-16). "Fine-Tuning DARTS for Image Classification". arXiv:2006.09042 [cs.CV].
  12. "Papers with Code - Fashion-MNIST Benchmark (Image Classification)". paperswithcode.com. Retrieved 2022-01-30.
  13. Lin, Xing; Rivenson, Yair; Yardimci, Nezih T.; Veli, Muhammed; Luo, Yi; Jarrahi, Mona; Ozcan, Aydogan (2018-09-07). "All-optical machine learning using diffractive deep neural networks". Science. 361 (6406): 1004–1008. arXiv:1804.08711. Bibcode:2018Sci...361.1004L. doi:10.1126/science.aat8084. ISSN 0036-8075. PMID 30049787. S2CID 13753997.