1-2-AX working memory task

Last updated
1-2-AX working memory task
PurposeWorking memory abilities of long short-term memory

The 1-2-AX working memory task is a cognitive test which requires working memory to be solved.

Contents

It can be used as a test case for learning algorithms to test their ability to remember some old data. This task can be used to demonstrate the working memory abilities of algorithms like PBWM or Long short-term memory. [1]

Description

The input of the task is a sequence of the numbers/letters 1, 2, A, X, B and Y, and additional distracting instances of 3, C and Z which should be ignored. For each character of input in sequence, the subject must respond with left (L) or right (R).

The two target sequences that the subject is looking for are A-X and B-Y. When the subject encounters a 1 they must switch to looking for A-X, and when they encounter a 2 they must switch to looking for B-Y.

While looking for A-X, if the subject encounters an X having seen an A previously (and similarly for a Y while looking for B-Y), and where that previous letter was not part of an earlier sequence, they respond R to mark the end of that sequence; their response to all other characters should be L. [2]

Examples

Input21AAXXYAX
OutputLLLLRLLLR
Input12ABXYACZ
OutputLLLLLRLLL

Requirements for algorithms

To solve this task, an algorithm must be able to both remember the last number 1 or 2 and the last letter A or B independently. We refer to this memory as the working memory. This memory must persist all other input. In addition, the algorithm must be able to strip out and ignore the letters C and Z.

Solutions

Pseudocode

For traditional computer models, both requirements are easy to solve. Here is some Python code (kind of pseudocode but works) where the function next_output gets one single number/letter as input and returns either a letter or nothing. next_outputs is there for convenience to operate on a whole sequence.

last_num=""last_letter=""defnext_output(next_input:str)->str|None:"""    Args:      next_input: A string containing a single character.    Returns:      A string containing the letters "L", "R" or None.    Example:      >>> next_output("2")      'L'    """globallast_num,last_letterifnext_inputin["1","2"]:last_num=next_inputlast_letter=""return"L"elifnext_inputin["A","B"]:last_letter=next_inputreturn"L"elifnext_inputin["X","Y"]:seq=last_num+last_letter+next_inputlast_letter=next_inputifseqin["1AX","2BY"]:return"R"return"L"returnNonedefnext_outputs(next_inputs:str)->list[str]:"""    Args:      next_input: A string.    Returns:      A list of strings containing the letters "L" or "R".    Example:      >>> next_outputs("21AAXBYAX")      ["L", "L", "L", "L", "R", "L", "L", "L", "R"]    """return[next_output(c)forcinnext_inputs]

Example:

>>> next_outputs("21AAXBYAX")['L', 'L', 'L', 'L', 'R', 'L', 'L', 'L', 'R']>>> next_outputs("12CBZY")['L', 'L', None, 'L', None, 'R']

Finite-state machine

Similarly, this task can be solved in a straightforward way by a finite-state machine with 7 states (call them ---, 1--, 2--, 1A-, 2B-, 1AX, 2BY).

Neural network

This task is much more difficult for neural networks. For simple feedforward neural networks, this task is not solvable because feedforward networks don't have any working memory. After all, including working memory into neural networks is a difficult task. There have been several approaches like PBWM or Long short-term memory which have working memory. This 1-2-AX task is good task for these models and both are able to solve the task.

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

<span class="mw-page-title-main">Perceptron</span> Algorithm for supervised learning of binary classifiers

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

Instantaneously trained neural networks are feedforward artificial neural networks that create a new hidden neuron node for each novel training sample. The weights to this hidden neuron separate out not only this training sample but others that are near it, thus providing generalization. This separation is done using the nearest hyperplane that can be written down instantaneously. In the two most important implementations the neighborhood of generalization either varies with the training sample or remains constant. These networks use unary coding for an effective representation of the data sets.

<span class="mw-page-title-main">No free lunch in search and optimization</span> Average solution cost is the same with any method

In computational complexity and optimization the no free lunch theorem is a result that states that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "no such thing as a free lunch", that is, no method offers a "short cut". This is under the assumption that the search space is a probability density function. It does not apply to the case where the search space has underlying structure that can be exploited more efficiently than random search or even has closed-form solutions that can be determined without search at all. For such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical. A colourful way of describing such a circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search and optimization, is to say that there is no free lunch. Wolpert had previously derived no free lunch theorems for machine learning. Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the current state of machine learning research on the problem of induction.

<span class="mw-page-title-main">Recurrent neural network</span> Computational model used in machine learning

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that can not be unrolled.

<span class="mw-page-title-main">Feedforward neural network</span> One of two broad types of artificial neural network

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow. Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks.

<span class="mw-page-title-main">Neural network</span> Structure in biology and artificial intelligence

A neural network can refer to either a neural circuit of biological neurons, or a network of artificial neurons or nodes in the case of an artificial neural network. Artificial neural networks are used for solving artificial intelligence (AI) problems; they model connections of biological neurons as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.

Color BASIC is the implementation of Microsoft BASIC that is included in the ROM of the Tandy/Radio Shack TRS-80 Color Computers manufactured between 1980 and 1991. BASIC is a high level language with simple syntax that makes it easy to write simple programs. Color BASIC is interpreted, that is, decoded as it is run.

<span class="mw-page-title-main">Kernel method</span> Class of algorithms for pattern analysis

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the Representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.

<span class="mw-page-title-main">Quantum neural network</span> Quantum Mechanics in Neural Networks

Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments.

Leabra stands for local, error-driven and associative, biologically realistic algorithm. It is a model of learning which is a balance between Hebbian and error-driven learning with other network-derived characteristics. This model is used to mathematically predict outcomes based on inputs and previous learning influences. This model is heavily influenced by and contributes to neural network designs and models. This algorithm is the default algorithm in emergent when making a new project, and is extensively used in various simulations.

<span class="mw-page-title-main">Cerebellar model articulation controller</span>

The cerebellar model arithmetic computer (CMAC) is a type of neural network based on a model of the mammalian cerebellum. It is also known as the cerebellar model articulation controller. It is a type of associative memory.

<span class="mw-page-title-main">Echo state network</span> Type of reservoir computer

An echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer. The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behaviour is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system.

The generalized Hebbian algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network model for unsupervised learning with applications primarily in principal components analysis. First defined in 1989, it is similar to Oja's rule in its formulation and stability, except it can be applied to networks with multiple outputs. The name originates because of the similarity between the algorithm and a hypothesis made by Donald Hebb about the way in which synaptic strengths in the brain are modified in response to experience, i.e., that changes are proportional to the correlation between the firing of pre- and post-synaptic neurons.

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks. It can be used to train Elman networks. The algorithm was independently derived by numerous researchers.

There are many types of artificial neural networks (ANN).

<span class="mw-page-title-main">Glossary of artificial intelligence</span> List of definitions of terms and concepts commonly used in the study of artificial intelligence

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms.

An artificial neural network (ANN) combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play. ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways.

<span class="mw-page-title-main">Transformer (machine learning model)</span> Machine learning algorithm used for natural-language processing

A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper titled 'Attention Is All You Need' by Ashish Vaswani et al., Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of input sequence. Input text is split into n-grams encoded as tokens and each token is converted into a vector via looking up from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism allowing the signal for key tokens to be amplified and less important tokens to be diminished. Though the transformer paper was published in 2017, the softmax-based attention mechanism was proposed earlier in 2014 by Bahdanau, Cho, and Bengio for machine translation, and the Fast Weight Controller, similar to a transformer, was proposed in 1992 by Schmidhuber.

References

  1. O'Reilly, R.C. & Frank, M.J (2006). "Making Working Memory Work: A Computational Model of Learning in the Frontal Cortex and Basal Ganglia. Neural". Neural Computation. 18 (2): 283–328. doi:10.1162/089976606775093909. PMID   16378516. S2CID   8912485 . Retrieved 2010-05-30.
  2. O'Reilly, Randall C.; Frank, Michael J. (1 February 2006). "Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia". Neural Computation. 18 (2): 283–328. doi:10.1162/089976606775093909. PMID   16378516. S2CID   8912485 . Retrieved 28 January 2023.