Imageability

Imageability is a measure of how easily a physical object, word or environment will evoke a clear mental image in the mind of any person observing it. [1] [2] It is used in architecture and city planning, in psycholinguistics, [3] and in automated computer vision research. [4] In automated image recognition, training models to connect images with concepts that have low imageability can lead to biased and harmful results. [4]

History and components

Kevin A. Lynch first introduced the term "imageability" in his 1960 book The Image of the City. [1] [5] In the book, Lynch argues that cities contain a key set of physical elements that people use to understand the environment, orient themselves within it, and assign it meaning. [6]

Lynch argues the five key elements that impact the imageability of a city are Paths, Edges, Districts, Nodes, and Landmarks.

In 1914, nearly half a century before The Image of the City was published, Paul Stern discussed a concept similar to imageability in the context of art. In an essay included in Susanne Langer's Reflections on Art, Stern gives the name "apparency" to the attribute that describes how vividly and intensely an artistic object can be experienced. [7]

In computer vision

Automated image recognition was developed by using machine learning to find patterns in large, annotated datasets of photographs, like ImageNet. Images in ImageNet are labelled using concepts in WordNet. Concepts that are easily expressed verbally, like "early", are seen as less "imageable" than nouns referring to physical objects, like "leaf". Training AI models to associate concepts with low imageability with specific images can lead to problematic bias in image recognition algorithms. This has been critiqued particularly as it relates to the "person" category of WordNet, and therefore also of ImageNet. Trevor Paglen and Kate Crawford demonstrated in their essay "Excavating AI" and their art project ImageNet Roulette how this leads to photos of ordinary people being labelled by AI systems as "terrorists" or "sex offenders". [8]

Images in datasets are often labelled as having a certain level of imageability. As described by Kaiyu Yang, Fei-Fei Li and co-authors, this is often done following criteria from Allan Paivio and collaborators' 1968 psycholinguistic study of nouns. [3] Yang et al. write that dataset annotators tasked with labelling imageability "see a list of words and rate each word on a 1-7 scale from 'low imagery' to 'high imagery'". [4]

To avoid biased or harmful image recognition and image generation, Yang et al. recommend not training vision recognition models on concepts with low imageability, especially when the concepts are offensive (such as sexual or racial slurs) or sensitive (their examples for this category include "orphan", "separatist", "Anglo-Saxon" and "crossover voter"). Even "safe" concepts with low imageability, like "great-niece" or "vegetarian", can lead to misleading results and should be avoided. [4]
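The rating-and-filtering idea described above can be illustrated with a minimal sketch. It assumes that each word has several annotator ratings on the 1-7 scale, that the ratings are averaged into one score, and that words scoring below a cut-off are excluded from training. The ratings, the cut-off value and the function names are all hypothetical; they are not taken from Yang et al. or from any dataset.

```python
from statistics import mean

# Hypothetical annotator ratings on the 1-7 scale
# ("low imagery" = 1, "high imagery" = 7). The words appear in the
# article text; the numbers are invented for illustration only.
RATINGS = {
    "leaf": [7, 6, 7],
    "early": [2, 1, 2],
    "great-niece": [3, 2, 3],
    "vegetarian": [3, 4, 2],
}

# Assumed cut-off, chosen for this sketch; not a published value.
IMAGEABILITY_THRESHOLD = 5.0


def imageability_scores(ratings):
    """Average each word's annotator ratings into a single score."""
    scores = {}
    for word, values in ratings.items():
        if not values:
            raise ValueError(f"no ratings for {word!r}")
        if any(not 1 <= v <= 7 for v in values):
            raise ValueError(f"rating outside the 1-7 scale for {word!r}")
        scores[word] = mean(values)
    return scores


def keep_imageable(scores, threshold=IMAGEABILITY_THRESHOLD):
    """Return, in sorted order, the words whose score meets the threshold."""
    return sorted(word for word, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    print(keep_imageable(imageability_scores(RATINGS)))
```

With these invented ratings only "leaf" passes the cut-off, while "early", "great-niece" and "vegetarian" are dropped. A real pipeline would also need a separate check for offensive or sensitive concepts, which an imageability score alone does not capture.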

References

  1. Lynch, Kevin (1960). The Image of the City. Cambridge, Mass.: MIT Press. ISBN 0-262-12004-6. OCLC 230082.
  2. Dellantonio, Sara; Job, Remo; Mulatti, Claudio (2014-04-03). "Imageability: now you see it again (albeit in a different form)". Frontiers in Psychology. 5: 279. doi:10.3389/fpsyg.2014.00279. ISSN 1664-1078. PMC 3982064. PMID 24765083.
  3. Paivio, Allan; Yuille, John C.; Madigan, Stephen A. (1968). "Concreteness, imagery, and meaningfulness values for 925 nouns". Journal of Experimental Psychology. 76 (1, Pt.2): Suppl:1–25. doi:10.1037/h0025327. ISSN 0022-1015. PMID 5672258.
  4. Yang, Kaiyu; Qinami, Klint; Fei-Fei, Li; Deng, Jia; Russakovsky, Olga (2020-01-27). "Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy". Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* '20. New York, NY, USA: Association for Computing Machinery. pp. 547–558. arXiv:1912.07726. doi:10.1145/3351095.3375709. ISBN 978-1-4503-6936-7. S2CID 209386709.
  5. "Analyzing Lynch's City Imageability in the Digital Age". Planetizen - Urban Planning News, Jobs, and Education. Retrieved 2020-02-15.
  6. Larice, Michael; Macdonald, Elizabeth, eds. (2013). The Urban Design Reader (Second ed.). London. ISBN 978-0-203-09423-5. OCLC 1139281591.
  7. Langer, Susanne K. (1979) [1958]. Reflections on Art. New York: Arno Press. ISBN 0-405-10611-4. OCLC 4570406.
  8. Crawford, Kate; Paglen, Trevor (2019). "Excavating AI: The Politics of Images in Machine Learning Datasets". The AI Now Institute.