Imageability is a measure of how easily a physical object, word or environment will evoke a clear mental image in the mind of any person observing it. [1] [2] It is used in architecture and city planning, in psycholinguistics, [3] and in automated computer vision research. [4] In automated image recognition, training models to connect images with concepts that have low imageability can lead to biased and harmful results. [4]
Kevin A. Lynch first introduced the term, "imageability" in his 1960 book, The Image of the City . [1] [5] In the book, Lynch argues cities contain a key set of physical elements that people use to understand the environment, orient themselves inside of it, and assign it meaning. [6]
Lynch argues the five key elements that impact the imageability of a city are Paths, Edges, Districts, Nodes, and Landmarks.
In 1914, half a century before The Image of the City was published, Paul Stern discussed a concept similar to imageability in the context of art. Stern, in Susan Langer's Reflections on Art, names the attribute that describes how vividly and intensely an artistic object could be experienced apparency. [7]
Automated image recognition was developed by using machine learning to find patterns in large, annotated datasets of photographs, like ImageNet. Images in ImageNet are labelled using concepts in WordNet. Concepts that are easily expressed verbally, like "early", are seen as less "imageable" than nouns referring to physical objects like "leaf". Training AI models to associate concepts with low imageability with specific images can lead to problematic bias in image recognition algorithms. This has particularly been critiqued as it relates to the "person" category of WordNet and therefore also ImageNet. Trevor Pagan and Kate Crawford demonstrated in their essay "Excavating AI" and their art project ImageNet Roulette how this leads to photos of ordinary people being labelled by AI systems as "terrorists" or "sex offenders". [8]
Images in datasets are often labelled as having a certain level of imageability. As described by Kaiyu Yang, Fei-Fei Li and co-authors, this is often done following criteria from Allan Paivio and collaborators' 1968 psycholinguistic study of nouns. [3] Yang el.al. write that dataset annotators tasked with labelling imageability "see a list of words and rate each word on a 1-7 scale from 'low imagery' to 'high imagery'. [4]
To avoid biased or harmful image recognition and image generation, Yang et.al. recommend not training vision recognition models on concepts with low imageability, especially when the concepts are offensive (such as sexual or racial slurs) or sensitive (their examples for this category include "orphan", "separatist", "Anglo-Saxon" and "crossover voter"). Even "safe" concepts with low imageability, like "great-niece" or "vegetarian" can lead to misleading results and should be avoided. [4]
Psycholinguistics or psychology of language is the study of the interrelation between linguistic factors and psychological aspects. The discipline is mainly concerned with the mechanisms by which language is processed and represented in the mind and brain; that is, the psychological and neurobiological factors that enable humans to acquire, use, comprehend, and produce language.
Recall in memory refers to the mental process of retrieval of information from the past. Along with encoding and storage, it is one of the three core processes of memory. There are three main types of recall: free recall, cued recall and serial recall. Psychologists test these forms of recall as a way to study the memory processes of humans and animals. Two main theories of the process of recall are the two-stage theory and the theory of encoding specificity.
George Armitage Miller was an American psychologist who was one of the founders of cognitive psychology, and more broadly, of cognitive science. He also contributed to the birth of psycholinguistics. Miller wrote several books and directed the development of WordNet, an online word-linkage database usable by computer programs. He authored the paper, "The Magical Number Seven, Plus or Minus Two," in which he observed that many different experimental findings considered together reveal the presence of an average limit of seven for human short-term memory capacity. This paper is frequently cited by psychologists and in the wider culture. Miller won numerous awards, including the National Medal of Science.
A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image.
The picture superiority effect refers to the phenomenon in which pictures and images are more likely to be remembered than are words. This effect has been demonstrated in numerous experiments using different methods. It is based on the notion that "human memory is extremely sensitive to the symbolic modality of presentation of event information". Explanations for the picture superiority effect are not concrete and are still being debated, however an evolutionary explanation is that sight has a long history stretching back millions of years and was crucial to survival in the past, whereas reading is a relatively recent invention, and requires specific cognitive processes, such as decoding symbols and linking them to meaning.
Dual-coding theory is a theory of cognition that suggests that the mind processes information along two different channels; verbal and nonverbal. It was hypothesized by Allan Paivio of the University of Western Ontario in 1971. In developing this theory, Paivio used the idea that the formation of mental imagery aids learning through the picture superiority effect.
Shallow parsing is an analysis of a sentence which first identifies constituent parts of sentences and then links them to higher order units that have discrete grammatical meanings. While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns, approaches that use machine learning techniques can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents. That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher level meanings depending on the context of the sentence.
Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.
The lexical decision task (LDT) is a procedure used in many psychology and psycholinguistics experiments. The basic procedure involves measuring how quickly people classify stimuli as words or nonwords.
Roger William Brown was an American psychologist. He was known for his work in social psychology and in children's language development.
Language production is the production of spoken or written language. In psycholinguistics, it describes all of the stages between having a concept to express and translating that concept into linguistic forms. These stages have been described in two types of processing models: the lexical access models and the serial models. Through these models, psycholinguists can look into how speeches are produced in different ways, such as when the speaker is bilingual. Psycholinguists learn more about these models and different kinds of speech by using language production research methods that include collecting speech errors and elicited production tasks.
David Swinney was a prominent psycholinguist. His research on language comprehension contributed to methodological advances in his field.
Caltech 101 is a data set of digital images created in September 2003 and compiled by Fei-Fei Li, Marco Andreetto, Marc 'Aurelio Ranzato and Pietro Perona at the California Institute of Technology. It is intended to facilitate computer vision research and techniques and is most applicable to techniques involving image recognition classification and categorization. Caltech 101 contains a total of 9,146 images, split between 101 distinct object categories and a background category. Provided with the images are a set of annotations describing the outlines of each image, along with a Matlab script for viewing.
Priming is a concept in psychology to describe how exposure to one stimulus may influence a response to a subsequent stimulus, without conscious guidance or intention. The priming effect is the positive or negative effect of a rapidly presented stimulus on the processing of a second stimulus that appears shortly after. Generally speaking, the generation of priming effect depends on the existence of some positive or negative relationship between priming and target stimuli. For example, the word nurse might be recognized more quickly following the word doctor than following the word bread. Priming can be perceptual, associative, repetitive, positive, negative, affective, semantic, or conceptual. Priming effects involve word recognition, semantic processing, attention, unconscious processing, and many other issues, and are related to differences in various writing systems. How quickly this effect occurs is contested; some researchers claim that priming effects are almost instantaneous.
The encoding specificity principle is the general principle that matching the encoding contexts of information at recall assists in the retrieval of episodic memories. It provides a framework for understanding how the conditions present while encoding information relate to memory and recall of that information.
Fei-Fei Li is a Chinese-American computer scientist, known for establishing ImageNet, the dataset that enabled rapid advances in computer vision in the 2010s. She is the Sequoia Capital professor of computer science at Stanford University and former board director at Twitter. Li is a co-director of the Stanford Institute for Human-Centered Artificial Intelligence and a co-director of the Stanford Vision and Learning Lab. She served as the director of the Stanford Artificial Intelligence Laboratory from 2013 to 2018.
The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge, where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.
Olga Russakovsky is an associate professor of computer science at Princeton University. Her research investigates computer vision and machine learning. She was one of the leaders of the ImageNet Large Scale Visual Recognition challenge and has been recognised by MIT Technology Review as one of the world's top young innovators.
80 Million Tiny Images is a dataset intended for training machine learning systems constructed by Antonio Torralba, Rob Fergus, and William T. Freeman in a collaboration between MIT and New York University. It was published in 2008.
Semantics within psychology is the study of how meaning is stored in the mind. Semantic memory is a type of long-term declarative memory that refers to facts or ideas which are not immediately drawn from personal experience. It was first theorized in 1972 by W. Donaldson and Endel Tulving. Tulving employs the word semantic to describe a system of memory that involves “words and verbal symbols, their meanings and referents, the relations between them, and the rules, formulas, or algorithms for influencing them”.
{{cite book}}
: CS1 maint: multiple names: authors list (link) CS1 maint: numeric names: authors list (link){{cite book}}
: CS1 maint: location missing publisher (link) CS1 maint: others (link){{cite book}}
: CS1 maint: multiple names: authors list (link) CS1 maint: numeric names: authors list (link)