Meta-learning (computer science)

Last updated June 22, 2024

Meta-learning^[1]^[2] is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation, however the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn.^[1]

Flexibility is important because each learning algorithm is based on a set of assumptions about the data, its inductive bias.^[3] This means that it will only learn well if the bias matches the learning problem. A learning algorithm may perform very well in one domain, but not on the next. This poses strong restrictions on the use of machine learning or data mining techniques, since the relationship between the learning problem (often some kind of database) and the effectiveness of different learning algorithms is not yet understood.

By using different kinds of metadata, like properties of the learning problem, algorithm properties (like performance measures), or patterns previously derived from the data, it is possible to learn, select, alter or combine different learning algorithms to effectively solve a given learning problem. Critiques of meta-learning approaches bear a strong resemblance to the critique of metaheuristic, a possibly related problem. A good analogy to meta-learning, and the inspiration for Jürgen Schmidhuber's early work (1987)^[1] and Yoshua Bengio et al.'s work (1991),^[4] considers that genetic evolution learns the learning procedure encoded in genes and executed in each individual's brain. In an open-ended hierarchical meta-learning system^[1] using genetic programming, better evolutionary methods can be learned by meta evolution, which itself can be improved by meta meta evolution, etc.^[1]

Definition

A proposed definition^[5] for a meta-learning system combines three requirements:

The system must include a learning subsystem.
Experience is gained by exploiting meta knowledge extracted
- in a previous learning episode on a single dataset, or
- from different domains.
Learning bias must be chosen dynamically.

Bias refers to the assumptions that influence the choice of explanatory hypotheses^[6] and not the notion of bias represented in the bias-variance dilemma. Meta-learning is concerned with two aspects of learning bias.

Declarative bias specifies the representation of the space of hypotheses, and affects the size of the search space (e.g., represent hypotheses using linear functions only).
Procedural bias imposes constraints on the ordering of the inductive hypotheses (e.g., preferring smaller hypotheses).^[7]

Common approaches

There are three common approaches:^[8]

using (cyclic) networks with external or internal memory (model-based)
learning effective distance metrics (metrics-based)
explicitly optimizing model parameters for fast learning (optimization-based).

Model-Based

Model-based meta-learning models updates its parameters rapidly with a few training steps, which can be achieved by its internal architecture or controlled by another meta-learner model.^[8]

Memory-Augmented Neural Networks

A Memory-Augmented Neural Network, or MANN for short, is claimed to be able to encode new information quickly and thus to adapt to new tasks after only a few examples.^[9]

Meta Networks

Meta Networks (MetaNet) learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization.^[10]

Metric-Based

The core idea in metric-based meta-learning is similar to nearest neighbors algorithms, which weight is generated by a kernel function. It aims to learn a metric or distance function over objects. The notion of a good metric is problem-dependent. It should represent the relationship between inputs in the task space and facilitate problem solving.^[8]

Convolutional Siamese Neural Network

Siamese neural network is composed of two twin networks whose output is jointly trained. There is a function above to learn the relationship between input data sample pairs. The two networks are the same, sharing the same weight and network parameters.^[11]

Matching Networks

Matching Networks learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.^[12]

Relation Network

The Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting.^[13]

Prototypical Networks

Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve satisfied results.^[14]

Optimization-Based

What optimization-based meta-learning algorithms intend for is to adjust the optimization algorithm so that the model can be good at learning with a few examples.^[8]

LSTM Meta-Learner

LSTM-based meta-learner is to learn the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime. The parametrization allows it to learn appropriate parameter updates specifically for the scenario where a set amount of updates will be made, while also learning a general initialization of the learner (classifier) network that allows for quick convergence of training.^[15]

Temporal Discreteness

Model-Agnostic Meta-Learning (MAML) is a fairly general optimization algorithm, compatible with any model that learns through gradient descent.^[16]

Reptile

Reptile is a remarkably simple meta-learning optimization algorithm, given that both of its components rely on meta-optimization through gradient descent and both are model-agnostic.^[17]

Examples

Some approaches which have been viewed as instances of meta-learning:

Recurrent neural networks (RNNs) are universal computers. In 1993, Jürgen Schmidhuber showed how "self-referential" RNNs can in principle learn by backpropagation to run their own weight change algorithm, which may be quite different from backpropagation.^[18] In 2001, Sepp Hochreiter & A.S. Younger & P.R. Conwell built a successful supervised meta-learner based on Long short-term memory RNNs. It learned through backpropagation a learning algorithm for quadratic functions that is much faster than backpropagation.^[19]^[2] Researchers at Deepmind (Marcin Andrychowicz et al.) extended this approach to optimization in 2017.^[20]
In the 1990s, Meta Reinforcement Learning or Meta RL was achieved in Schmidhuber's research group through self-modifying policies written in a universal programming language that contains special instructions for changing the policy itself. There is a single lifelong trial. The goal of the RL agent is to maximize reward. It learns to accelerate reward intake by continually improving its own learning algorithm which is part of the "self-referential" policy.^[21]^[22]
An extreme type of Meta Reinforcement Learning is embodied by the Gödel machine, a theoretical construct which can inspect and modify any part of its own software which also contains a general theorem prover. It can achieve recursive self-improvement in a provably optimal way.^[23]^[2]
Model-Agnostic Meta-Learning (MAML) was introduced in 2017 by Chelsea Finn et al.^[16] Given a sequence of tasks, the parameters of a given model are trained such that few iterations of gradient descent with few training data from a new task will lead to good generalization performance on that task. MAML "trains the model to be easy to fine-tune."^[16] MAML was successfully applied to few-shot image classification benchmarks and to policy-gradient-based reinforcement learning.^[16]
Variational Bayes-Adaptive Deep RL (VariBAD) was introduced in 2019.^[24] While MAML is optimization-based, VariBAD is a model-based method for meta reinforcement learning, and leverages a variational autoencoder to capture the task information in an internal memory, thus conditioning its decision making on the task.
When addressing a set of tasks, most meta learning approaches optimize the average score across all tasks. Hence, certain tasks may be sacrificed in favor of the average score, which is often unacceptable in real-world applications. By contrast, Robust Meta Reinforcement Learning (RoML) focuses on improving low-score tasks, increasing robustness to the selection of task.^[25] RoML works as a meta-algorithm, as it can be applied on top of other meta learning algorithms (such as MAML and VariBAD) to increase their robustness. It is applicable to both supervised meta learning and meta reinforcement learning.
Discovering meta-knowledge works by inducing knowledge (e.g. rules) that expresses how each learning method will perform on different learning problems. The metadata is formed by characteristics of the data (general, statistical, information-theoretic,... ) in the learning problem, and characteristics of the learning algorithm (type, parameter settings, performance measures,...). Another learning algorithm then learns how the data characteristics relate to the algorithm characteristics. Given a new learning problem, the data characteristics are measured, and the performance of different learning algorithms are predicted. Hence, one can predict the algorithms best suited for the new problem.
Stacked generalisation works by combining multiple (different) learning algorithms. The metadata is formed by the predictions of those different algorithms. Another learning algorithm learns from this metadata to predict which combinations of algorithms give generally good results. Given a new learning problem, the predictions of the selected set of algorithms are combined (e.g. by (weighted) voting) to provide the final prediction. Since each algorithm is deemed to work on a subset of problems, a combination is hoped to be more flexible and able to make good predictions.
Boosting is related to stacked generalisation, but uses the same algorithm multiple times, where the examples in the training data get different weights over each run. This yields different predictions, each focused on rightly predicting a subset of the data, and combining those predictions leads to better (but more expensive) results.
Dynamic bias selection works by altering the inductive bias of a learning algorithm to match the given problem. This is done by altering key aspects of the learning algorithm, such as the hypothesis representation, heuristic formulae, or parameters. Many different approaches exist.
Inductive transfer studies how the learning process can be improved over time. Metadata consists of knowledge about previous learning episodes and is used to efficiently develop an effective hypothesis for a new task. A related approach is called learning to learn, in which the goal is to use acquired knowledge from one domain to help learning in other domains.
Other approaches using metadata to improve automatic learning are learning classifier systems, case-based reasoning and constraint satisfaction.
Some initial, theoretical work has been initiated to use Applied Behavioral Analysis as a foundation for agent-mediated meta-learning about the performances of human learners, and adjust the instructional course of an artificial agent.^[26]
AutoML such as Google Brain's "AI building AI" project, which according to Google briefly exceeded existing ImageNet benchmarks in 2017.^[27]^[28]

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.

The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered. Inductive bias is anything which makes the algorithm learn one pattern instead of another pattern. Learning is the process of apprehending useful knowledge by observing and interacting with the world. It involves searching a space of solutions for one expected to provide a better explanation of the data or to achieve higher rewards. But in many cases, there are multiple solutions which are equally good. An inductive bias allows a learning algorithm to prioritize one solution over another, independent of the observed data.

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Recently, artificial neural networks have been able to surpass many previous approaches in performance.

Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

In machine learning, backpropagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the optimization algorithm to compute the network parameter updates.

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at dealing with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps, thus "long short-term memory". It is applicable to classification, processing and predicting data based on time series, such as in handwriting, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.

There are many types of artificial neural networks (ANN).

Josef "Sepp" Hochreiter is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). Previously, he was at the Technical University of Berlin, at University of Colorado Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference.

In machine learning, a hyperparameter is a parameter, such as the learning rate or choice of optimizer, which specifies details of the learning process, hence the name hyperparameter. This is in contrast to parameters which determine the model itself.

Deep learning is the subset of machine learning methods based on neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

In machine learning, the vanishing gradient problem is encountered when training neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural networks weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that as the sequence length increases, the gradient magnitude typically is expected to decrease, slowing the training process. In the worst case, this may completely stop the neural network from further training. As one example of the problem cause, traditional activation functions such as the hyperbolic tangent function have gradients in the range $[-1,1]$ , and backpropagation computes gradients by the chain rule. This has the effect of multiplying $n$ of these small numbers to compute gradients of the early layers in an $n$ -layer network, meaning that the gradient decreases exponentially with $n$ while the early layers train very slowly.

In artificial intelligence, a differentiable neural computer (DNC) is a memory augmented neural network architecture (MANN), which is typically recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.

The following outline is provided as an overview of and topical guide to machine learning:

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process.

In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous artificial neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The advantage of a Highway Network over the common deep neural networks is that it solves or partially prevents the vanishing gradient problem, thus leading to easier to optimize neural networks. The gating mechanisms facilitate information flow across many layers.

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML.

<span class="mw-page-title-main">Residual neural network</span> Deep learning method

A residual neural network is a seminal deep learning model in which the weight layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition and won that year's ImageNet Large Scale Visual Recognition Challenge.

Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by neural circuitry. While some of the computational implementations ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling that period an "AI winter".

References

1 2 3 4 5 Schmidhuber, Jürgen (1987). "Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook" (PDF). Diploma Thesis, Tech. Univ. Munich.
1 2 3 Schaul, Tom; Schmidhuber, Jürgen (2010). "Metalearning". Scholarpedia. 5 (6): 4650. Bibcode:2010SchpJ...5.4650S. doi: 10.4249/scholarpedia.4650 .
↑ P. E. Utgoff (1986). "Shift of bias for inductive concept learning". In R. Michalski; J. Carbonell; T. Mitchell (eds.). Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann. pp. 163–190. ISBN 978-0-934613-00-2.
↑ Bengio, Yoshua; Bengio, Samy; Cloutier, Jocelyn (1991). Learning to learn a synaptic rule (PDF). IJCNN'91.
↑ Lemke, Christiane; Budka, Marcin; Gabrys, Bogdan (2013-07-20). "Metalearning: a survey of trends and technologies". Artificial Intelligence Review. 44 (1): 117–130. doi:10.1007/s10462-013-9406-y. ISSN 0269-2821. PMC 4459543 . PMID 26069389.
↑ Brazdil, Pavel; Carrier, Christophe Giraud; Soares, Carlos; Vilalta, Ricardo (2009). Metalearning - Springer. Cognitive Technologies. doi:10.1007/978-3-540-73263-1. ISBN 978-3-540-73262-4.
↑ Gordon, Diana; Desjardins, Marie (1995). "Evaluation and Selection of Biases in Machine Learning" (PDF). Machine Learning. 20: 5–22. doi: 10.1023/A:1022630017346 . Retrieved 27 March 2020.
1 2 3 4 Weng, Lilian (30 November 2018). "Meta-Learning: Learning to Learn Fast". OpenAI Blog. Retrieved 27 October 2019.
↑ Santoro, Adam; Bartunov, Sergey; Wierstra, Daan; Lillicrap, Timothy. "Meta-Learning with Memory-Augmented Neural Networks" (PDF). Google DeepMind. Retrieved 29 October 2019.
↑ Munkhdalai, Tsendsuren; Yu, Hong (2017). "Meta Networks". arXiv: 1703.00837 [cs.LG].
↑ Koch, Gregory; Zemel, Richard; Salakhutdinov, Ruslan (2015). "Siamese Neural Networks for One-shot Image Recognition" (PDF). Toronto, Ontario, Canada: Department of Computer Science, University of Toronto.
↑ Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. (2016). "Matching networks for one shot learning" (PDF). Google DeepMind. Retrieved 3 November 2019.
↑ Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P. H. S.; Hospedales, T. M. (2018). "Learning to compare: relation network for few-shot learning" (PDF).
↑ Snell, J.; Swersky, K.; Zemel, R. S. (2017). "Prototypical networks for few-shot learning" (PDF).
↑ Ravi, Sachin; Larochelle, Hugo (2017). Optimization as a model for few-shot learning. ICLR 2017. Retrieved 3 November 2019.
1 2 3 4 Finn, Chelsea; Abbeel, Pieter; Levine, Sergey (2017). "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". arXiv: 1703.03400 [cs.LG].
↑ Nichol, Alex; Achiam, Joshua; Schulman, John (2018). "On First-Order Meta-Learning Algorithms". arXiv: 1803.02999 [cs.LG].
↑ Schmidhuber, Jürgen (1993). "A self-referential weight matrix". Proceedings of ICANN'93, Amsterdam: 446–451.
↑ Hochreiter, Sepp; Younger, A. S.; Conwell, P. R. (2001). "Learning to Learn Using Gradient Descent". Proceedings of ICANN'01: 87–94.
↑ Andrychowicz, Marcin; Denil, Misha; Gomez, Sergio; Hoffmann, Matthew; Pfau, David; Schaul, Tom; Shillingford, Brendan; de Freitas, Nando (2017). "Learning to learn by gradient descent by gradient descent". Proceedings of ICML'17, Sydney, Australia. arXiv: 1606.04474 .
↑ Schmidhuber, Jürgen (1994). "On learning how to learn learning strategies" (PDF). Technical Report FKI-198-94, Tech. Univ. Munich.
↑ Schmidhuber, Jürgen; Zhao, J.; Wiering, M. (1997). "Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement". Machine Learning. 28: 105–130. doi: 10.1023/a:1007383707642 .
↑ Schmidhuber, Jürgen (2006). "Gödel machines: Fully Self-Referential Optimal Universal Self-Improvers". In B. Goertzel & C. Pennachin, Eds.: Artificial General Intelligence: 199–226.
↑ Zintgraf, Luisa; Schulze, Sebastian; Lu, Cong; Feng, Leo; Igl, Maximilian; Shiarlis, Kyriacos; Gal, Yarin; Hofmann, Katja; Whiteson, Shimon (2021). "VariBAD: Variational Bayes-Adaptive Deep RL via Meta-Learning". Journal of Machine Learning Research. 22 (289): 1–39. ISSN 1533-7928.
↑ Greenberg, Ido; Mannor, Shie; Chechik, Gal; Meirom, Eli (2023-12-15). "Train Hard, Fight Easy: Robust Meta Reinforcement Learning". Advances in Neural Information Processing Systems. 36: 68276–68299.
↑ Begoli, Edmon (May 2014). "Procedural-Reasoning Architecture for Applied Behavior Analysis-based Instructions". Doctoral Dissertations. Knoxville, Tennessee, USA: University of Tennessee, Knoxville: 44–79. Retrieved 14 October 2017.
↑ "Robots Are Now 'Creating New Robots,' Tech Reporter Says". NPR.org. 2018. Retrieved 29 March 2018.
↑ "AutoML for large scale image classification and object detection". Google Research Blog. November 2017. Retrieved 29 March 2018.

External links

Metalearning article in Scholarpedia
Vilalta, R.; Drissi, Y. (2002). "A perspective view and survey of meta-learning" (PDF). Artificial Intelligence Review. 18 (2): 77–95. doi:10.1023/A:1019956318069.
Giraud-Carrier, C.; Keller, J. (2002). "Meta-Learning". In Meij, J. (ed.). Dealing with the data flood. The Hague: STT/Beweton.
Brazdil, P.; Giraud-Carrier, C.; Soares, C.; Vilalta, R. (2009). "Metalearning: Concepts and Systems". Metalearning: applications to data mining. Springer. ISBN 978-3-540-73262-4.
Video courses about Meta-Learning with step-by-step explanation of MAML, Prototypical Networks, and Relation Networks.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[sch1987-1] 1 2 3 4 5 Schmidhuber, Jürgen (1987). "Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook" (PDF). Diploma Thesis, Tech. Univ. Munich.

[scholarpedia-2] 1 2 3 Schaul, Tom; Schmidhuber, Jürgen (2010). "Metalearning". Scholarpedia. 5 (6): 4650. Bibcode:2010SchpJ...5.4650S. doi: 10.4249/scholarpedia.4650 .

[utgoff1986-3] P. E. Utgoff (1986). "Shift of bias for inductive concept learning". In R. Michalski; J. Carbonell; T. Mitchell (eds.). Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann. pp. 163–190. ISBN 978-0-934613-00-2.

[4] Bengio, Yoshua; Bengio, Samy; Cloutier, Jocelyn (1991). Learning to learn a synaptic rule (PDF). IJCNN'91.

[5] Lemke, Christiane; Budka, Marcin; Gabrys, Bogdan (2013-07-20). "Metalearning: a survey of trends and technologies". Artificial Intelligence Review. 44 (1): 117–130. doi:10.1007/s10462-013-9406-y. ISSN 0269-2821. PMC 4459543 . PMID 26069389.

[6] Brazdil, Pavel; Carrier, Christophe Giraud; Soares, Carlos; Vilalta, Ricardo (2009). Metalearning - Springer. Cognitive Technologies. doi:10.1007/978-3-540-73263-1. ISBN 978-3-540-73262-4.

[7] Gordon, Diana; Desjardins, Marie (1995). "Evaluation and Selection of Biases in Machine Learning" (PDF). Machine Learning. 20: 5–22. doi: 10.1023/A:1022630017346 . Retrieved 27 March 2020.

[paper1-8] 1 2 3 4 Weng, Lilian (30 November 2018). "Meta-Learning: Learning to Learn Fast". OpenAI Blog. Retrieved 27 October 2019.

[paper2-9] Santoro, Adam; Bartunov, Sergey; Wierstra, Daan; Lillicrap, Timothy. "Meta-Learning with Memory-Augmented Neural Networks" (PDF). Google DeepMind. Retrieved 29 October 2019.

[paper3-10] Munkhdalai, Tsendsuren; Yu, Hong (2017). "Meta Networks". arXiv: 1703.00837 [cs.LG].

[paper4-11] Koch, Gregory; Zemel, Richard; Salakhutdinov, Ruslan (2015). "Siamese Neural Networks for One-shot Image Recognition" (PDF). Toronto, Ontario, Canada: Department of Computer Science, University of Toronto.

[paper5-12] Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. (2016). "Matching networks for one shot learning" (PDF). Google DeepMind. Retrieved 3 November 2019.

[paper6-13] Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P. H. S.; Hospedales, T. M. (2018). "Learning to compare: relation network for few-shot learning" (PDF).

[paper7-14] Snell, J.; Swersky, K.; Zemel, R. S. (2017). "Prototypical networks for few-shot learning" (PDF).

[paper8-15] Ravi, Sachin; Larochelle, Hugo (2017). Optimization as a model for few-shot learning. ICLR 2017. Retrieved 3 November 2019.

[maml-16] 1 2 3 4 Finn, Chelsea; Abbeel, Pieter; Levine, Sergey (2017). "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". arXiv: 1703.03400 [cs.LG].

[paper10-17] Nichol, Alex; Achiam, Joshua; Schulman, John (2018). "On First-Order Meta-Learning Algorithms". arXiv: 1803.02999 [cs.LG].

[sch1993-18] Schmidhuber, Jürgen (1993). "A self-referential weight matrix". Proceedings of ICANN'93, Amsterdam: 446–451.

[hoch2001-19] Hochreiter, Sepp; Younger, A. S.; Conwell, P. R. (2001). "Learning to Learn Using Gradient Descent". Proceedings of ICANN'01: 87–94.

[marcin2017-20] Andrychowicz, Marcin; Denil, Misha; Gomez, Sergio; Hoffmann, Matthew; Pfau, David; Schaul, Tom; Shillingford, Brendan; de Freitas, Nando (2017). "Learning to learn by gradient descent by gradient descent". Proceedings of ICML'17, Sydney, Australia. arXiv: 1606.04474 .

[sch1994-21] Schmidhuber, Jürgen (1994). "On learning how to learn learning strategies" (PDF). Technical Report FKI-198-94, Tech. Univ. Munich.

[sch1997-22] Schmidhuber, Jürgen; Zhao, J.; Wiering, M. (1997). "Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement". Machine Learning. 28: 105–130. doi: 10.1023/a:1007383707642 .

[goedelmachine-23] Schmidhuber, Jürgen (2006). "Gödel machines: Fully Self-Referential Optimal Universal Self-Improvers". In B. Goertzel & C. Pennachin, Eds.: Artificial General Intelligence: 199–226.

[24] Zintgraf, Luisa; Schulze, Sebastian; Lu, Cong; Feng, Leo; Igl, Maximilian; Shiarlis, Kyriacos; Gal, Yarin; Hofmann, Katja; Whiteson, Shimon (2021). "VariBAD: Variational Bayes-Adaptive Deep RL via Meta-Learning". Journal of Machine Learning Research. 22 (289): 1–39. ISSN 1533-7928.

[25] Greenberg, Ido; Mannor, Shie; Chechik, Gal; Meirom, Eli (2023-12-15). "Train Hard, Fight Easy: Robust Meta Reinforcement Learning". Advances in Neural Information Processing Systems. 36: 68276–68299.

[Begoli,_PRS-ABA,_ABA_Ontology-26] Begoli, Edmon (May 2014). "Procedural-Reasoning Architecture for Applied Behavior Analysis-based Instructions". Doctoral Dissertations. Knoxville, Tennessee, USA: University of Tennessee, Knoxville: 44–79. Retrieved 14 October 2017.

[27] "Robots Are Now 'Creating New Robots,' Tech Reporter Says". NPR.org. 2018. Retrieved 29 March 2018.

[28] "AutoML for large scale image classification and object detection". Google Research Blog. November 2017. Retrieved 29 March 2018.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]