The reverse correlation technique is a data-driven method used primarily in psychological and neurophysiological research. [1] The method earned its name from its origins in neurophysiology, where cross-correlations between white-noise stimuli and sparsely occurring neuronal spikes could be computed more quickly by restricting the computation to the stimulus segments preceding each spike. [1] [2] [3]
The term has since been adopted in psychological experiments that also present noise to human participants but usually do not analyze the temporal dimension. In contrast to the original meaning, the term here reflects that the standard psychological practice of presenting stimuli from defined categories to participants is "reversed": instead, the participant's mental representations of categories are estimated from the interaction between the presented noise and the behavioral responses. [4]
It is used to create composite pictures of individual and/or group mental representations of various items (e.g. faces, [5] bodies, [6] and the self [7] ) that depict characteristics of said items (e.g. trustworthiness [8] and self-body image [9] ). This technique is helpful when evaluating the mental representations of those with and without mental illnesses. [10]
This technique utilizes the spike-triggered average to identify which areas of signal and noise in an image are informative for the given research question. Signal is the meaningful information that a stimulus carries about the category of interest. [11] Noise is commonly defined as unwanted variation that obscures the information the signal conveys. [12] Most importantly for reverse correlation studies, noise is randomly varying information. To determine the areas of importance using reverse correlation, noise is applied to a base image and the result is evaluated by observers.
A base image is any image void of noise that relates to the research question. A base image with noise superimposed on top is the stimulus that is presented to and evaluated by participants. [4] Each presentation of a new stimulus (or stimulus pair) to a participant is known as a trial. After a participant has responded to hundreds or thousands of trials, a researcher is ready to create a classification image.
A classification image (abbreviated as "CI" in some studies) is a single image that represents the average noise patterns in the images selected by participants. [4] A classification image can also be computed for groups by averaging the individuals’ classification images. [4] These classification images are what researchers use to interpret the data and draw conclusions. As a whole, the reverse correlation method is a process that results in a composite image (from an individual or group) that can be used to estimate and interpret mental representations.
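The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any study's actual pipeline: the image size, trial count, and noise distribution are all hypothetical assumptions, and "selected" trials are drawn at random for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a 64x64 grayscale base image and the noise
# patterns shown on 500 trials (all values are illustrative).
base = rng.uniform(0.3, 0.7, size=(64, 64))
noise = rng.normal(0.0, 0.1, size=(500, 64, 64))

# Suppose the participant "selected" 240 of the 500 stimuli as
# matching the category of interest.
selected = rng.choice(500, size=240, replace=False)

# The individual classification image is the average noise pattern
# over the selected trials, overlaid on the base image.
ci_noise = noise[selected].mean(axis=0)
classification_image = base + ci_noise
```

A group classification image would then simply be the pixel-wise average of several individual `classification_image` arrays.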
The reverse correlation method is typically executed as an in-lab computer experiment. This method follows four broad steps. Each of the following steps is described in greater detail below.
After creating a research question and determining that the reverse correlation method is the most suitable technique to answer the question, a researcher must (1) design randomly varying stimuli. [4] After the stimuli have been prepared, a researcher should (2) collect data from participants who will see and respond to approximately 300–1,000 trials. [4] [13] Each trial consists of either one or two images (side by side) derived from the same base image with noise superimposed on top. Participant responses will depend on the chosen study design; if a researcher presents only one image at a time, participants rate the image on a 4-point scale, but when two images are shown, the participant is asked to choose which best aligns with the given category (e.g. choose the image that looks the most aggressive). [4]
Once all of the data is collected, the researcher will (3) compute classification images for each participant and, from those, group classification images. [4] Finally, with the classification images available, the researcher will (4) evaluate the images and draw conclusions about their results. [4]
When designing the stimuli for a reverse correlation study, the two primary factors that one should consider are (1) the base image and (2) the noise that will be used. [4] While not all bases are images per se, the majority are, and for this reason the base is typically referred to as a base image. The base image should represent whatever the research question is addressing. For example, if you are interested in people's mental representations of Chinese people, it would not make sense to use a base image of a Spanish or Caucasian person. Similarly, if you are interested in the mental representations of male vocal patterns, it would make the most sense to use a base vocal pattern that has been produced by a male. [4]
Having a base is important because it provides a kind of anchor for participants to work from. When there is no base image, the number of trials that are required increases dramatically, making it harder to collect data. [4] While there are studies that have excluded a base image (e.g. the S study [14] ), for more elaborate and nuanced research questions it is important to have a base image that is a fair representation of what participants are being asked to categorize. Photographs of faces are generally the most popular base image.
Although the reverse correlation method is capable of investigating a wide variety of research questions, the most common application of the method is evaluating faces on a single trait. Reverse correlation studies that address evaluations of the face are sometimes referred to as using a face space reverse correlation model (FSRCM). [15] Conveniently, there are existing databases of face images of varying demographics and emotions that work well as base images.
The reverse correlation method can also be used to help researchers identify what areas of an image (e.g. the areas on the face) have diagnostic value. [15] In order to identify these areas of value, researchers start by minimizing the space a participant can pull information from. Imposing a "mask" on an image (e.g. blurring an image while leaving random areas unblurred) reduces the information individuals can see and forces them to focus on certain areas. [15] Then, if participants are repeatedly able to correctly identify an image with a trait, conclusions can be drawn about which areas have diagnostic value. [15]
While faces and visual stimuli are the most popular, these are not the only stimuli that can be used in a reverse correlation study. This method was originally designed for auditory stimuli, allowing researchers to investigate how perceivers interpret auditory information and create trait-based attributions to different sound patterns. [15] For example, by segmenting a vocal recording of a single word (total sound time 426 ms) into six segments (71 ms each), and varying each segment's pitch using Gaussian distributions, researchers were able to uncover what vocal patterns people associated with certain traits. [16] Specifically, this study investigated how listeners rated sound clips of the word "really" as sounding more interrogative (as in the more common reverse correlation studies, participants listened to two sound clips per trial, chose which best fit the category, and an average of the chosen pitch contours was then computed). [16] Beyond face and auditory perception, research utilizing the reverse correlation method has expanded to investigate how individuals see three-dimensional objects in images with noise (but no signal). [17] [18]
After selecting a base image, regardless of its content, it is helpful to apply a Gaussian blur to smooth existing noise in the image. While noise will be applied later, it is helpful to reduce existing noise in the photo before applying the chosen noise. [4] There are three primary choices when it comes to noise: white noise, sine-wave noise, and Gabor noise. [4] The latter two constrain the configurations that the noise can take, and for this reason white noise is the most commonly used. [4] Regardless of the type of noise that is chosen, it is crucial that the noise randomly varies. [4]
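The preparation described above (blur the base, then superimpose random white noise) can be sketched as follows. This is a minimal illustration under assumed parameters: the image, blur width, and noise amplitude are all hypothetical, and the blur is a plain separable Gaussian implemented in NumPy rather than any particular study's toolchain.

```python
import numpy as np

def gaussian_blur(img, sigma=2.0, radius=4):
    """Separable Gaussian blur implemented with plain NumPy."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Convolve rows, then columns (edges are zero-padded).
    out = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, out)

rng = np.random.default_rng(1)

# Hypothetical 128x128 grayscale base image (illustrative values).
base = rng.uniform(0.2, 0.8, size=(128, 128))

# Step 1: smooth existing noise in the base image.
base_smooth = gaussian_blur(base)

# Step 2: superimpose randomly varying white noise to form one stimulus.
white_noise = rng.normal(0.0, 0.15, size=base_smooth.shape)
stimulus = np.clip(base_smooth + white_noise, 0.0, 1.0)
```

A fresh `white_noise` field is drawn for every trial, while `base_smooth` stays fixed throughout the experiment.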
Once the stimuli for the study have been developed, the researcher must make a few decisions before actually collecting the data. The researcher must decide how many stimuli will be presented at a time and how many trials the participants will see.
In terms of stimuli presentation, a researcher can choose from either a 2-Image Forced Choice (2IFC) or a 4-Alternative Forced Choice (4AFC). The 2IFC presents two images at once (side by side) and requires participants to choose between the two on a specified category (e.g. which image looks the most like a male). [4] Typically the noise from the left image is the mathematical inverse of the noise from the right image. This method was developed to better answer questions that could not be fully answered by the 4AFC method. As compared to the 2IFC, the 4AFC only shows participants one image per trial and requires them to rate the image on a 4-point scale ((1) Probably X, (2) Possibly X, (3) Possibly Y, (4) Probably Y). [4] For example, here X might represent male and Y might represent female. Typically, during data analysis, only images that are chosen as a “probably” category are included. [4]
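Generating one 2IFC trial pair with inverse noise, as described above, can be sketched like this. The base image and noise amplitude are hypothetical assumptions; "mathematical inverse" is taken here to mean the negation of the noise field.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 64x64 grayscale base image (illustrative values).
base = rng.uniform(0.2, 0.8, size=(64, 64))

# One 2IFC trial: the right image's noise is the negation
# (mathematical inverse) of the left image's noise.
noise = rng.normal(0.0, 0.1, size=base.shape)
left_image = base + noise
right_image = base - noise
```

Because the two noise fields cancel, the pair averages back to the base image exactly; whichever image the participant picks, its noise field is what enters the classification-image average.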
As mentioned previously, the 2IFC was designed to address questions that could not be easily answered by the 4AFC. In the 4AFC, there is the possibility that participants may not choose a “probably” category, and if this happens, no classification image can be computed. [4] For example, if the base image does not look like the mental representation participants are asked to report on, then participants may never make a confident choice and classify the image under a “probably” category. [4] While this is a flaw in the 4AFC, one advantage to this method and scale structure is that researchers can see participants’ certainty judgements on their classification decisions (e.g. a probably X label would suggest greater confidence in their decision than a possibly X label). [4]
As for choosing the number of trials, generally researchers conducting a reverse correlation study present participants with 300–1,000 trials. [13]
Again, a classification image is the calculated average noise of all selected images (stimuli). Classification images can be generated for individuals or for the group, and computing them differs slightly between the two cases. [4] To compute a classification image for an individual, the researcher will start by creating an average of all the selected images' noise and then overlay that pattern onto the base image. Before the noise is superimposed, it is scaled to fit the base image (i.e. the smallest and largest pixel intensities are matched to the base image pixels). [4] To generate a classification image for a group, the researcher will either handle each individual classification image separately (scaling the pixels independently) or apply a dependent scaling. Dependent scaling is so called because the scaling applied to all classification images depends on the image with the greatest range of pixels. [4] Using this single image and its pixel range, the researcher will match the pixels of the classification image to the pixels of the base image. The scaling factor used for this image is then applied to the remaining classification images. [4] When choosing between these two approaches, keep in mind that in classification images with little signal, independent scaling amplifies signal and noise more than dependent scaling. [4] If the researcher is interested in the strength of signal, it is suggested that they use dependent scaling. [4]
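The two scaling schemes above can be sketched as follows. This is one plausible reading of the procedure, not a reference implementation: the linear mapping, the example classification-image noise fields, and the base image are all assumptions made for illustration.

```python
import numpy as np

def scale_to_base(noise, base):
    """Linearly map a noise field onto the base image's pixel range."""
    lo, hi = noise.min(), noise.max()
    unit = (noise - lo) / (hi - lo)  # now spans [0, 1]
    return unit * (base.max() - base.min()) + base.min()

rng = np.random.default_rng(3)
base = rng.uniform(0.2, 0.8, size=(32, 32))           # hypothetical base
cis = rng.normal(0.0, 0.1, size=(10, 32, 32))         # 10 individual CI noise fields

# Independent scaling: each CI is scaled using its own pixel range.
independent = np.array([scale_to_base(ci, base) for ci in cis])

# Dependent scaling: the CI with the greatest pixel range determines one
# common scaling, which is then applied to every CI.
ranges = cis.max(axis=(1, 2)) - cis.min(axis=(1, 2))
widest = cis[np.argmax(ranges)]
factor = (base.max() - base.min()) / (widest.max() - widest.min())
dependent = (cis - widest.min()) * factor + base.min()
```

Under independent scaling every CI ends up spanning the base image's full pixel range, which is why weak-signal CIs look amplified; under dependent scaling only the widest-range CI does.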
When calculating a classification image, it is critical to consider how the external noise will impact the signal-to-noise ratio (SNR). The SNR is the ratio of desired input (signal) to undesired information (noise). [19] One way to produce a high SNR (when observers are unbiased) is to use the formula C = (N_AA + N_BA) − (N_AB + N_BB), where N_SR denotes the average noise field across trials on which stimulus S was presented and the observer responded R. [19] Researchers have derived the optimal experimental parameters for different study designs that will result in a high SNR.
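The weighted-sum formula above can be sketched as follows, reading the subscripts as stimulus-then-response (an assumption; the trial counts, image size, and simulated choices are likewise hypothetical).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical 2IFC-style data: per-trial noise fields, which stimulus
# was shown (A or B), and the observer's response (A or B).
n_trials, size = 400, 32
noise = rng.normal(0.0, 1.0, size=(n_trials, size, size))
stim = rng.choice(["A", "B"], size=n_trials)
resp = rng.choice(["A", "B"], size=n_trials)

def mean_noise(s, r):
    """Average noise over trials with stimulus s and response r."""
    mask = (stim == s) & (resp == r)
    return noise[mask].mean(axis=0)

# C = (N_AA + N_BA) - (N_AB + N_BB): the "responded A" averages minus
# the "responded B" averages.
C = (mean_noise("A", "A") + mean_noise("B", "A")) \
    - (mean_noise("A", "B") + mean_noise("B", "B"))
```

With real data (where responses correlate with the noise rather than being random, as simulated here), C concentrates the signal that drove the observer's choices.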
After computing classification images for individual participants and/or for the group, the researcher will use these images to draw conclusions about their research questions. However, while not always the case, occasionally after the first set of classification images has been generated, researchers will present these images to a new sample of participants and ask them to rate the images on a subsequent factor of interest. This process is referred to as a two-phase reverse correlation. [4] For example, if a classification image was computed after participants were asked to choose the image that looked the most like a police officer, the generated classification images could then be presented to a new sample who would evaluate the images on how aggressive the faces look. While this step can ease the drawing of conclusions, one must use caution not to recruit too many participants in the second phase: with very large samples, even the tiniest differences will appear statistically significant, resulting in a Type I error. [13]
While reverse correlation is typically used to create a visual representation of a single trait, this method does have the capability to create a visual representation of more than one trait in one image. [20] By using the same base image and noise, one can create a classification image of trait 1 and a classification image of trait 2, and then create an aggregate photo of the two classification images (thus creating a new classification image incorporating two social traits). [20]
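Aggregating two single-trait classification images, as described above, reduces to a pixel-wise average when both were built from the same base image and noise. A minimal sketch with hypothetical CI noise fields:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical classification-image noise fields for two traits,
# both derived from the same base image and the same noise set.
ci_trait1 = rng.normal(0.0, 0.1, size=(64, 64))
ci_trait2 = rng.normal(0.0, 0.1, size=(64, 64))

# Aggregate the two classification images pixel-wise, yielding a
# new classification image incorporating both social traits.
ci_combined = (ci_trait1 + ci_trait2) / 2
```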
Additionally, researchers have investigated how the decision-making process impacts and is reflected in the reverse correlation method and have found there is a significant relationship between them. Therefore, when interpreting results using the reverse correlation method, researchers must use caution to not ignore how the decision-making process may influence the data. [21]
Reading signal in a classification image can be difficult. When attempting to interpret signal, researchers suggest that the best practice is to use a recently developed metric referred to as “infoVal”. [22] “InfoVal” compares informational value in the computed classification image to a random distribution. [22] Interpreting an “infoVal” measure is similar to interpreting a z-score. [22]