In psychology, multiple trace theory is a memory consolidation model advanced as an alternative model to strength theory. It posits that each time some information is presented to a person, it is neurally encoded in a unique memory trace composed of a combination of its attributes. [1] Further support for this theory came in the 1960s from empirical findings that people could remember specific attributes about an object without remembering the object itself. [2] The mode in which the information is presented and subsequently encoded can be flexibly incorporated into the model. This memory trace is unique from all others resembling it due to differences in some aspects of the item's attributes, and all memory traces incorporated since birth are combined into a multiple-trace representation in the brain. [3] In memory research, a mathematical formulation of this theory can successfully explain empirical phenomena observed in recognition and recall tasks.
The attributes an item possesses form its trace and can fall into many categories. When an item is committed to memory, information from each of these attributional categories is encoded into the item's trace. There may be a kind of semantic categorization at play, whereby an individual trace is incorporated into overarching concepts of an object. For example, when a person sees a pigeon, a trace is added to the "pigeon" cluster of traces within his or her mind. This new "pigeon" trace, while distinguishable and divisible from other instances of pigeons that the person may have seen within his or her life, serves to support the more general and overarching concept of a pigeon.
Physical attributes of an item encode information about physical properties of a presented item. For a word, this could include color, font, spelling, and size, while for a picture, the equivalent aspects could be shapes and colors of objects. It has been shown experimentally that people who are unable to recall an individual word can sometimes recall the first or last letter or even rhyming words, [4] all aspects encoded in the physical orthography of a word's trace. Even when an item is not presented visually, when encoded, it may have some physical aspects based on a visual representation of the item.
Contextual attributes are a broad class of attributes that define the internal and external features that are simultaneous with presentation of the item. Internal context is a sense of the internal network that a trace evokes. [5] This may range from aspects of an individual's mood to other semantic associations the presentation of the word evokes. On the other hand, external context encodes information about the spatial and temporal aspects as information is being presented. This may reflect time of day or weather, for example. Spatial attributes can refer both to physical environment and imagined environment. The method of loci, a mnemonic strategy incorporating an imagined spatial position, assigns relative spatial positions to different items memorized and then "walking through" these assigned positions to remember the items.
Modality attributes possess information as to the method by which an item was presented. The most frequent types of modalities in an experimental setting are auditory and visual. Any sensory modality may be utilized practically.
These attributes refer to the categorization of items presented. Items that fit into the same categories will have the same class attributes. For example, if the item "touchdown" were presented, it would evoke the overarching concept of "football" or perhaps, more generally, "sports", and it would likely share class attributes with "endzone" and other elements that fit into the same concept. A single item may fit into different concepts at the time it is presented depending on other attributes of the item, like context. For example, the word "star" might fall into the class of astronomy after visiting a space museum or a class with words like "celebrity" or "famous" after seeing a movie.
The mathematical formulation of traces allows for a model of memory as an ever-growing matrix that is continuously receiving and incorporating information in the form of a vectors of attributes. Multiple trace theory states that every item ever encoded, from birth to death, will exist in this matrix as multiple traces. This is done by giving every possible attribute some numerical value to classify it as it is encoded, so each encoded memory will have a unique set of numerical attributes.
By assigning numerical values to all possible attributes, it is convenient to construct a column vector representation of each encoded item. This vector representation can also be fed into computational models of the brain like neural networks, which take as inputs vectorial "memories" and simulate their biological encoding through neurons.
Formally, one can denote an encoded memory by numerical assignments to all of its possible attributes. If two items are perceived to have the same color or experienced in the same context, the numbers denoting their color and contextual attributes, respectively, will be relatively close. Suppose we encode a total of L attributes anytime we see an object. Then, when a memory is encoded, it can be written as m1 with L total numerical entries in a column vector:
A subset of the L attributes will be devoted to contextual attributes, a subset to physical attributes, and so on. One underlying assumption of multiple trace theory is that, when we construct multiple memories, we organize the attributes in the same order. Thus, we can similarly define vectors m2, m3, ..., mn to account for n total encoded memories. Multiple trace theory states that these memories come together in our brain to form a memory matrix from the simple concatenation of the individual memories:
For L total attributes and n total memories, M will have L rows and n columns. Note that, although the n traces are combined into a large memory matrix, each trace is individually accessible as a column in this matrix.
In this formulation, the n different memories are made to be more or less independent of each other. However, items presented in some setting together will become tangentially associated by the similarity of their context vectors. If multiple items are made associated with each other and intentionally encoded in that manner, say an item a and an item b, then the memory for these two can be constructed, with each having k attributes as follows:
When items are learned one after another, it is tempting to say that they are learned in the same temporal context. However, in reality, there are subtle variations in context. Hence, contextual attributes are often considered to be changing over time as modeled by a stochastic process. [6] Considering a vector of only r total context attributes ti that represents the context of memory mi, the context of the next-encoded memory is given by ti+1:
so,
Here, ε(j) is a random number sampled from a Gaussian distribution.
As explained in the subsequent section, the hallmark of multiple trace theory is an ability to compare some probe item to the pre-existing matrix of encoded memories. This simulates the memory search process, whereby we can determine whether we have ever seen the probe before as in recognition tasks or whether the probe gives rise to another previously encoded memory as in cued recall.
First, the probe p is encoded as an attribute vector. Continuing with the preceding example of the memory matrix M, the probe will have L entries:
This p is then compared one by one to all pre-existing memories (trace) in M by determining the Euclidean distance between p and each mi:
Due to the stochastic nature of context, it is almost never the case in multiple trace theory that a probe item exactly matches an encoded memory. Still, high similarity between p and mi is indicated by a small Euclidean distance. Hence, another operation must be performed on the distance that leads to very low similarity for great distance and very high similarity for small distance. A linear operation does not eliminate low-similarity items harshly enough. Intuitively, an exponential decay model seems most suitable:
where τ is a decay parameter that can be experimentally assigned. We can go on to then define similarity to the entire memory matrix by a summed similarity SS(p,M) between the probe p and the memory matrix M:
If the probe item is very similar to even one of the encoded memories, SS receives a large boost. For example, given m1 as a probe item, we will get a near 0 distance (not exactly due to context) for i=1, which will add nearly the maximal boost possible to SS. To differentiate from background similarity (there will always be some low similarity to context or a few attributes for example), SS is often compared to some arbitrary criterion. If it is higher than the criterion, then the probe is considered among those encoded. The criterion can be varied based on the nature of the task and the desire to prevent false alarms. Thus, multiple trace theory predicts that, given some cue, the brain can compare that cue to a criterion to answer questions like "has this cue been experienced before?" (recognition) or "what memory does this cue elicit?" (cued recall), which are applications of summed similarity described below.
Multiple trace theory fits well into the conceptual framework for recognition. Recognition requires an individual to determine whether or not they have seen an item before. For example, facial recognition is determining whether one has seen a face before. When asked this for a successfully encoded item (something that has indeed been seen before), recognition should occur with high probability. In the mathematical framework of this theory, we can model recognition of an individual probe item p by summed similarity with a criterion. We translate the test item into an attribute vector as done for the encoded memories and compared to every trace ever encountered. If summed similarity passes the criterion, we say we have seen the item before. Summed similarity is expected to be very low if the item has never been seen but relatively higher if it has due to the similarity of the probe's attributes to some memory of the memory matrix.
This can be applied both to individual item recognition and associative recognition for two or more items together.
The theory can also account for cued recall. Here, some cue is given that is meant to elicit an item out of memory. For example, a factual question like "Who was the first President of the United States?" is a cue to elicit the answer of "George Washington". In the "ab" framework described above, we can take all attributes present in a cue and list consider these the a item in an encoded association as we try to recall the b portion of the mab memory. In this example, attributes like "first", "President", and "United States" will be combined to form the a vector, which will have already been formulated into the mab memory whose b values encode "George Washington". Given a, there are two popular models for how we can successfully recall b:
1) We can go through and determine similarity (not summed similarity, see above for distinction) to every item in memory for the a attributes, then pick whichever memory has the highest similarity for the a. Whatever b-type attributes we are linked to gives what we recall. The mab memory gives best chance of recall since its a elements will have high similarity to the cue a. Still, since recall does not always occur, we can say that the similarity must pass a criterion for recall to occur at all. This is similar to how the IBM machine Watson operates. Here, the similarity compares only the a-type attributes of a to mab.
2) We can use a probabilistic choice rule to determine probability of recalling an item as proportional to its similarity. This is akin to throwing a dart at a dartboard with bigger areas represented by larger similarities to the cue item. Mathematically speaking, given the cue a, the probability of recalling the desired memory mab is:
In computing both similarity and summed similarity, we only consider relations among a-type attributes. We add the error term because without it, the probability of recalling any memory in M will be 1, but there are certainly times when recall does not occur at all.
Phenomena in memory associated with repetition, word frequency, recency, forgetting, and contiguity, among others, can be easily explained in the realm of multiple trace theory. Memory is known to improve with repeated exposure to items. For example, hearing a word several times in a list will improve recognition and recall of that word later on. This is because repeated exposure simply adds the memory into the ever-growing memory matrix, so summed similarity for this memory will be larger and thus more likely to pass the criterion.
In recognition, very common words are harder to recognize as part of a memorized list, when tested, than rare words. This is known as the word frequency effect and can be explained by multiple trace theory as well. For common words, summed similarity will be relatively high, whether the word was seen in the list or not, because it is likely that the word has been encountered and encoded in the memory matrix several times throughout life. Thus, the brain typically selects a higher criterion in determining whether common words are part of a list, making them harder to successfully select. However, rarer words are typically encountered less throughout life and so their presence in the memory matrix is limited. Hence, low overall summed similarity will lead to a more lax criterion. If the word was present in the list, high context similarity at time of test and other attribute similarity will lead to enough boost in summed similarity to excel past criterion and thus recognize the rare word successfully.
Recency in the serial position effect can be explained because more recent memories encoded will share a temporal context most similar to the present context, as the stochastic nature of time will not have had as pronounced an effect. Thus, context similarity will be high for recently encoded items, so overall similarity will be relatively higher for these items as well. The stochastic contextual drift is also thought to account for forgetting because the context in which a memory was encoded is lost over time, so summed similarity for an item only presented in that context will decrease over time. [7] [8]
Finally, empirical data have shown a contiguity effect, whereby items that are presented together temporally, even though they may not be encoded as a single memory as in the "ab" paradigm described above, are more likely to be remembered together. This can be considered a result of low contextual drift between items remembered together, so the contextual similarity between two items presented together is high.
One of the biggest shortcomings of multiple trace theory is the requirement of some item with which to compare the memory matrix when determining successful encoding. As mentioned above, this works quite well in recognition and cued recall, but there is a glaring inability to incorporate free recall into the model. Free recall requires an individual to freely remember some list of items. Although the very act of asking to recall may act as a cue that can then elicit cued recall techniques, it is unlikely that the cue is unique enough to reach a summed similarity criterion or to otherwise achieve a high probability of recall.
Another major issue lies in translating the model to biological relevance. It is hard to imagine that the brain has unlimited capacity to keep track of such a large matrix of memories and continue expanding it with every item with which it has ever been presented. Furthermore, searching through this matrix is an exhaustive process that would not be relevant on biological time scales. [9]
In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. It characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if and only if the matrix is invertible and the linear map represented by the matrix is an isomorphism. The determinant of a product of matrices is the product of their determinants (the preceding property is a corollary of this one). The determinant of a matrix A is denoted det(A), det A, or |A|.
In linear algebra, the outer product of two coordinate vectors is a matrix. If the two vectors have dimensions n and m, then their outer product is an n × m matrix. More generally, given two tensors, their outer product is a tensor. The outer product of tensors is also referred to as their tensor product, and can be used to define the tensor algebra.
In mathematics, matrix addition is the operation of adding two matrices by adding the corresponding entries together.
In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix. The resulting matrix, known as the matrix product, has the number of rows of the first and the number of columns of the second matrix. The product of matrices A and B is denoted as AB.
In linear algebra, a diagonal matrix is a matrix in which the entries outside the main diagonal are all zero; the term usually refers to square matrices. Elements of the main diagonal can either be zero or nonzero. An example of a 2×2 diagonal matrix is , while an example of a 3×3 diagonal matrix is. An identity matrix of any size, or any multiple of it, is a diagonal matrix.
In mathematics, especially the usage of linear algebra in mathematical physics, Einstein notation is a notational convention that implies summation over a set of indexed terms in a formula, thus achieving brevity. As part of mathematics it is a notational subset of Ricci calculus; however, it is often used in physics applications that do not distinguish between tangent and cotangent spaces. It was introduced to physics by Albert Einstein in 1916.
Levinson recursion or Levinson–Durbin recursion is a procedure in linear algebra to recursively calculate the solution to an equation involving a Toeplitz matrix. The algorithm runs in Θ(n2) time, which is a strong improvement over Gauss–Jordan elimination, which runs in Θ(n3).
In mathematics, particularly in matrix theory, a permutation matrix is a square binary matrix that has exactly one entry of 1 in each row and each column and 0s elsewhere. Each such matrix, say P, represents a permutation of m elements and, when used to multiply another matrix, say A, results in permuting the rows or columns of the matrix A.
In linear algebra, a minor of a matrix A is the determinant of some smaller square matrix, cut down from A by removing one or more of its rows and columns. Minors obtained by removing just one row and one column from square matrices are required for calculating matrix cofactors, which in turn are useful for computing both the determinant and inverse of square matrices.
In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.
In mathematics, a block matrix or a partitioned matrix is a matrix that is interpreted as having been broken into sections called blocks or submatrices. Intuitively, a matrix interpreted as a block matrix can be visualized as the original matrix with a collection of horizontal and vertical lines, which break it up, or partition it, into a collection of smaller matrices. Any matrix may be interpreted as a block matrix in one or more ways, with each interpretation defined by how its rows and columns are partitioned.
In linear algebra, linear transformations can be represented by matrices. If is a linear transformation mapping to and is a column vector with entries, then
In mathematics, the Kronecker product, sometimes denoted by ⊗, is an operation on two matrices of arbitrary size resulting in a block matrix. It is a specialization of the tensor product from vectors to matrices and gives the matrix of the tensor product linear map with respect to a standard choice of basis. The Kronecker product is to be distinguished from the usual matrix multiplication, which is an entirely different operation. The Kronecker product is also sometimes called matrix direct product.
Quantum statistical mechanics is statistical mechanics applied to quantum mechanical systems. In quantum mechanics a statistical ensemble is described by a density operator S, which is a non-negative, self-adjoint, trace-class operator of trace 1 on the Hilbert space H describing the quantum system. This can be shown under various mathematical formalisms for quantum mechanics. One such formalism is provided by quantum logic.
In signal processing, the Wiener filter is a filter used to produce an estimate of a desired or target random process by linear time-invariant (LTI) filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process.
In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.
In numerical linear algebra, the Jacobi method is an iterative algorithm for determining the solutions of a strictly diagonally dominant system of linear equations. Each diagonal element is solved for, and an approximate value is plugged in. The process is then iterated until it converges. This algorithm is a stripped-down version of the Jacobi transformation method of matrix diagonalization. The method is named after Carl Gustav Jacob Jacobi.
In mathematics, a moment matrix is a special symmetric square matrix whose rows and columns are indexed by monomials. The entries of the matrix depend on the product of the indexing monomials only
In mental memory, storage is one of three fundamental stages along with encoding and retrieval. Memory is the process of storing and recalling information that was previously acquired. Storing refers to the process of placing newly acquired information into memory, which is modified in the brain for easier storage. Encoding this information makes the process of retrieval easier for the brain where it can be recalled and brought into conscious thinking. Modern memory psychology differentiates between the two distinct types of memory storage: short-term memory and long-term memory. Several models of memory have been proposed over the past century, some of them suggesting different relationships between short- and long-term memory to account for different ways of storing memory.
In mathematics, a matrix is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns, which is used to represent a mathematical object or a property of such an object.