Transfer learning

Illustration of transfer learning

Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from one task is reused to improve performance on a related task. [1] For example, in image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. The topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing or transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency. [2]

Because transfer learning involves training with multiple objective functions, it is related to cost-sensitive machine learning and multi-objective optimization. [3]
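The basic idea can be sketched with a small, self-contained example. This is an illustration under stated assumptions, not a method from the cited literature: the two synthetic tasks, the logistic-regression model, and all function names (`make_task`, `train`, `accuracy`) are hypothetical choices made for the sketch. A model is trained on a data-rich source task, and its parameters are then reused as the starting point for a related target task that has only a few labelled examples.

```python
import math
import random


def make_task(weights, n, rng):
    """Sample n points in [-1, 1]^2, labelled by the sign of a linear score."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if weights[0] * x[0] + weights[1] * x[1] > 0 else 0
        data.append((x, y))
    return data


def predict_proba(params, x):
    """Logistic model; params = [w0, w1, bias]. The score is clamped for safety."""
    z = params[0] * x[0] + params[1] * x[1] + params[2]
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(params, data, epochs, lr=0.5):
    """Plain SGD on the logistic loss; returns a new parameter list."""
    w = list(params)
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, x) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            w[2] -= lr * err
    return w


def accuracy(params, data):
    hits = sum((predict_proba(params, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
source_train = make_task([1.0, 1.0], 500, rng)  # data-rich source task
target_train = make_task([1.0, 0.8], 10, rng)   # related task, few labels
target_test = make_task([1.0, 0.8], 1000, rng)

scratch_init = [0.0, 0.0, 0.0]
transferred_init = train(scratch_init, source_train, epochs=20)

# Accuracy on the target task before any target data has been seen.
acc_scratch_before = accuracy(scratch_init, target_test)
acc_transfer_before = accuracy(transferred_init, target_test)

# Accuracy after fine-tuning each starting point on the small target set.
acc_scratch_after = accuracy(train(scratch_init, target_train, epochs=20), target_test)
acc_transfer_after = accuracy(train(transferred_init, target_train, epochs=20), target_test)

print("before fine-tuning:", acc_scratch_before, acc_transfer_before)
print("after fine-tuning: ", acc_scratch_after, acc_transfer_after)
```

Because the two decision boundaries are deliberately close, the transferred parameters already separate most target points before any target training. How much transfer helps in practice depends on how related the tasks are.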

History

In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in neural network training. [4] [5] The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning. [6]

In 1992, Lorien Pratt formulated the discriminability-based transfer (DBT) algorithm. [7]

By 1998, the field had advanced to include multi-task learning, [8] along with more formal theoretical foundations. [9] Influential publications on transfer learning include the book Learning to Learn in 1998, [10] a 2009 survey [11] and a 2019 survey. [12]

Andrew Ng said in his NIPS 2016 tutorial [13] [14] that TL would become the next driver of machine learning commercial success after supervised learning.

In the 2020 paper "Rethinking Pre-training and Self-training", [15] Zoph et al. reported that pre-training can hurt accuracy, and advocated self-training instead.

Applications

Algorithms are available for transfer learning in Markov logic networks [16] and Bayesian networks. [17] Transfer learning has been applied to cancer subtype discovery, [18] building utilization, [19] [20] general game playing, [21] text classification, [22] [23] digit recognition, [24] medical imaging and spam filtering. [25]

In 2020, it was reported that, owing to their similar physical natures, transfer learning is possible between classifying electromyographic (EMG) signals from the muscles and classifying the behaviors of electroencephalographic (EEG) brainwaves, that is, from the gesture recognition domain to the mental state recognition domain. The relationship worked in both directions: EEG can likewise be used to classify EMG. [26] The experiments found that the accuracy of neural networks and convolutional neural networks was improved [27] through transfer learning both prior to any learning (compared to a standard random weight distribution) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of fully-connected layers to improve performance. [28]
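Changing the fully-connected layers of a pre-trained model is often done by keeping the pre-trained layers fixed and training only a new output layer. The sketch below illustrates that pattern under assumptions and does not reproduce the method of the cited papers: the tiny network, the synthetic two-class source task and three-class target task, and every name in the code (`init_layer`, `sgd_step`, `train_body`) are hypothetical.

```python
import math
import random

rng = random.Random(0)


def init_layer(n_out, n_in):
    """Small random weights; each row holds n_in weights plus a trailing bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def affine(layer, x):
    return [sum(w * v for w, v in zip(row, x)) + row[-1] for row in layer]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(body, head, x):
    h = [math.tanh(v) for v in affine(body, x)]
    return h, softmax(affine(head, h))


def sgd_step(body, head, x, label, lr=0.1, train_body=True):
    """One cross-entropy SGD step; the body is left untouched when frozen."""
    h, p = forward(body, head, x)
    d_out = [p[k] - (1.0 if k == label else 0.0) for k in range(len(p))]
    # Backpropagate into the hidden layer before the head is modified.
    d_hid = [(1 - h[j] ** 2) * sum(d_out[k] * head[k][j] for k in range(len(head)))
             for j in range(len(h))]
    for k, row in enumerate(head):
        for j in range(len(h)):
            row[j] -= lr * d_out[k] * h[j]
        row[-1] -= lr * d_out[k]
    if train_body:
        for j, row in enumerate(body):
            for i in range(len(x)):
                row[i] -= lr * d_hid[j] * x[i]
            row[-1] -= lr * d_hid[j]


def sample(label_fn, n):
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return [(x, label_fn(x)) for x in pts]


source = sample(lambda x: int(x[0] + x[1] > 0), 200)
target = sample(lambda x: 0 if x[0] < -0.3 else (1 if x[0] < 0.3 else 2), 30)

# 1. Pre-train the body together with a 2-class head on the source task.
body = init_layer(4, 2)
source_head = init_layer(2, 4)
for _ in range(20):
    for x, y in source:
        sgd_step(body, source_head, x, y)

# 2. Keep the body frozen, discard the old head, train a fresh 3-class head.
frozen_snapshot = [row[:] for row in body]
target_head = init_layer(3, 4)
for _ in range(20):
    for x, y in target:
        sgd_step(body, target_head, x, y, train_body=False)

_, probs = forward(body, target_head, [0.5, 0.0])
print(probs)
```

Freezing the body keeps the features learned on the source task intact while the new head adapts to the target labels. Unfreezing some or all of the body for further fine-tuning is a common variation when more target data is available.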

Software

Transfer learning and domain adaptation

Several compilations of transfer learning and domain adaptation algorithms have been implemented, including ADAPT, [29] Transfer-learning-library [30] and Domain adaptation toolbox. [31]

References

  1. West, Jeremy; Ventura, Dan; Warnick, Sean (2007). "Spring Research Presentation: A Theoretical Foundation for Inductive Transfer". Brigham Young University, College of Physical and Mathematical Sciences. Archived from the original on 2007-08-01. Retrieved 2007-08-05.
  2. George Karimpanal, Thommen; Bouffanais, Roland (2019). "Self-organizing maps for storage and transfer of knowledge in reinforcement learning". Adaptive Behavior. 27 (2): 111–126. arXiv:1811.08318. doi:10.1177/1059712318818568. ISSN 1059-7123. S2CID 53774629.
  3. Cost-Sensitive Machine Learning (2011). USA: CRC Press, p. 63, https://books.google.de/books?id=8TrNBQAAQBAJ&pg=PA63
  4. Stevo Bozinovski and Ante Fulgosi (1976). "The influence of pattern similarity and transfer learning upon the training of a base perceptron B2" (original in Croatian). Proceedings of Symposium Informatica 3-121-5, Bled.
  5. Stevo Bozinovski (2020) "Reminder of the first paper on transfer learning in neural networks, 1976". Informatica 44: 291–302.
  6. S. Bozinovski (1981). "Teaching space: A representation concept for adaptive pattern classification." COINS Technical Report, the University of Massachusetts at Amherst, No 81-28 [available online: UM-CS-1981-028.pdf]
  7. Pratt, L. Y. (1992). "Discriminability-based transfer between neural networks" (PDF). NIPS Conference: Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers. pp. 204–211.
  8. Caruana, R., "Multitask Learning", pp. 95-134 in Thrun & Pratt 2012
  9. Baxter, J., "Theoretical Models of Learning to Learn", pp. 71-95 in Thrun & Pratt 2012
  10. Thrun & Pratt 2012.
  11. Pan, Sinno Jialin; Yang, Qiang (2009). "A Survey on Transfer Learning" (PDF). IEEE.
  12. Zhuang, Fuzhen; Qi, Zhiyuan; Duan, Keyu; Xi, Dongbo; Zhu, Yongchun; Zhu, Hengshu; Xiong, Hui; He, Qing (2019). "A Comprehensive Survey on Transfer Learning". IEEE. arXiv:1911.02685.
  13. NIPS 2016 tutorial: "Nuts and bolts of building AI applications using Deep Learning" by Andrew Ng, archived from the original on 2021-12-19, retrieved 2019-12-28
  14. Nuts and bolts of building AI applications using Deep Learning, slides
  15. Zoph, Barret (2020). "Rethinking pre-training and self-training" (PDF). Advances in Neural Information Processing Systems. 33: 3833–3845. arXiv:2006.06882. Retrieved 2022-12-20.
  16. Mihalkova, Lilyana; Huynh, Tuyen; Mooney, Raymond J. (July 2007), "Mapping and Revising Markov Logic Networks for Transfer" (PDF), Learning Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-2007), Vancouver, BC, pp. 608–614, retrieved 2007-08-05
  17. Niculescu-Mizil, Alexandru; Caruana, Rich (March 21–24, 2007), "Inductive Transfer for Bayesian Network Structure Learning" (PDF), Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), retrieved 2007-08-05
  18. Hajiramezanali, E.; Dadaneh, S. Z.; Karbalayghareh, A.; Zhou, Z.; Qian, X. "Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data". 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. arXiv:1810.09433.
  19. Arief-Ang, I.B.; Salim, F.D.; Hamilton, M. (2017-11-08). "DA-HOC: semi-supervised domain adaptation for room occupancy prediction using CO2 sensor data". 4th ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys). Delft, Netherlands. pp. 1–10. doi:10.1145/3137133.3137146. ISBN 978-1-4503-5544-5.
  20. Arief-Ang, I.B.; Hamilton, M.; Salim, F.D. (2018-12-01). "A Scalable Room Occupancy Prediction with Transferable Time Series Decomposition of CO2 Sensor Data". ACM Transactions on Sensor Networks. 14 (3–4): 21:1–21:28. doi:10.1145/3217214. S2CID 54066723.
  21. Banerjee, Bikramjit, and Peter Stone. "General Game Learning Using Knowledge Transfer." IJCAI. 2007.
  22. Do, Chuong B.; Ng, Andrew Y. (2005). "Transfer learning for text classification". Neural Information Processing Systems Foundation, NIPS*2005 (PDF). Retrieved 2007-08-05.
  23. Rajat, Raina; Ng, Andrew Y.; Koller, Daphne (2006). "Constructing Informative Priors using Transfer Learning". Twenty-third International Conference on Machine Learning (PDF). Retrieved 2007-08-05.
  24. Maitra, D. S.; Bhattacharya, U.; Parui, S. K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR). pp. 1021–1025. doi:10.1109/ICDAR.2015.7333916. ISBN 978-1-4799-1805-8. S2CID 25739012.
  25. Bickel, Steffen (2006). "ECML-PKDD Discovery Challenge 2006 Overview". ECML-PKDD Discovery Challenge Workshop (PDF). Retrieved 2007-08-05.
  26. Bird, Jordan J.; Kobylarz, Jhonatan; Faria, Diego R.; Ekart, Aniko; Ribeiro, Eduardo P. (2020). "Cross-Domain MLP and CNN Transfer Learning for Biological Signal Processing: EEG and EMG". IEEE Access. 8. Institute of Electrical and Electronics Engineers (IEEE): 54789–54801. Bibcode:2020IEEEA...854789B. doi:10.1109/access.2020.2979074. ISSN 2169-3536.
  27. Maitra, Durjoy Sen; Bhattacharya, Ujjwal; Parui, Swapan K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR). pp. 1021–1025. doi:10.1109/ICDAR.2015.7333916. ISBN 978-1-4799-1805-8. S2CID 25739012.
  28. Kabir, H. M. Dipu; Abdar, Moloud; Jalali, Seyed Mohammad Jafar; Khosravi, Abbas; Atiya, Amir F.; Nahavandi, Saeid; Srinivasan, Dipti (January 7, 2022). "SpinalNet: Deep Neural Network with Gradual Input". IEEE Transactions on Artificial Intelligence. 4 (5): 1165–1177. arXiv:2007.03347. doi:10.1109/TAI.2022.3185179. S2CID 220381239.
  29. de Mathelin, Antoine and Deheeger, François and Richard, Guillaume and Mougeot, Mathilde and Vayatis, Nicolas (2020) "ADAPT: Awesome Domain Adaptation Python Toolbox"
  30. Mingsheng Long Junguang Jiang, Bo Fu. (2020) "Transfer-learning-library"
  31. Ke Yan. (2016) "Domain adaptation toolbox"

Sources