Large Scale Concept Ontology for Multimedia


The Large-Scale Concept Ontology for Multimedia project was a series of workshops held from April 2004 to September 2006 [1] for the purpose of defining a standard formal vocabulary for the annotation and retrieval of video.


Mandate

The Large-Scale Concept Ontology for Multimedia project was sponsored by the Disruptive Technology Office and brought together representatives from a variety of research communities, such as multimedia learning, information retrieval, computational linguistics, library science, and knowledge representation, as well as "user" communities such as intelligence agencies and broadcasters, to work collaboratively towards defining a set of 1,000 concepts. [2] Individually, each concept was to meet the following criteria: [3]

Jointly, these concepts were to meet the additional criterion of providing broad (domain independent) coverage. [3] High-level target areas for coverage included physical objects, including animate objects (such as people, mobs, and animals), and inanimate objects, ranging from large-scale (such as buildings and highways) to small-scale (such as telephones and appliances); actions and events; locations and settings; and graphics. The effort was led by Dr. Milind Naphade, who was the principal investigator along with researchers from Carnegie Mellon University, Columbia University, and IBM. [1]

Development tracks

The project had two main "tracks": the development and deployment of keyframe annotation tools (performed by CMU and Columbia), and the development of the Large-Scale Concept Ontology for Multimedia concept hierarchy itself. The second track was executed in two phases. The first phase, the manual construction of an 884-concept hierarchy, was performed collaboratively by the research and user community representatives.

The second phase, performed by knowledge representation experts at Cycorp, Inc., involved mapping the concepts into the Cyc knowledge base and using the Cyc inference engine to semi-automatically refine, correct, and expand the concept hierarchy. This mapping and expansion phase was motivated by a desire to increase breadth (the mapping moved the hierarchy from 884 concepts to well past the initial goal of 1,000) and to move the Large-Scale Concept Ontology for Multimedia from a one-dimensional hierarchy of concepts to a full-blown ontology of rich semantic connections. [3]

Project results

The outputs of the effort included: [1]

  1. A "lite" version of the Large-Scale Concept Ontology for Multimedia concept hierarchy consisting of a subset of 449 concepts.
  2. A corpus of 61,901 video keyframes, taken from the 2006 TRECVID data set, annotated using Large-Scale Concept Ontology for Multimedia "lite."
  3. The full taxonomy of 2,638 concepts, built semi-automatically by mapping 884 concepts, manually identified by collaborators, into the Cyc knowledge base, and querying the Cyc inference engine for useful additions.
  4. The full ontology, in the form of a 2006 ResearchCyc release that contained the Large-Scale Concept Ontology for Multimedia mappings into the Cyc ontology.
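As a rough illustration of how a concept hierarchy can support keyframe annotation and retrieval, the following minimal Python sketch indexes annotated keyframes under each labeled concept and all of its broader concepts. The concept names loosely follow the target areas mentioned above, but the hierarchy shown, the keyframe identifiers, and the function names are all hypothetical; none of them are taken from the actual Large-Scale Concept Ontology for Multimedia releases or tools.

```python
from collections import defaultdict

# Illustrative child -> parent links; NOT the actual LSCOM hierarchy.
PARENT = {
    "animate object": "physical object",
    "inanimate object": "physical object",
    "person": "animate object",
    "animal": "animate object",
    "building": "inanimate object",
    "highway": "inanimate object",
    "telephone": "inanimate object",
}


def ancestors(concept):
    """Return the chain of broader concepts, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def build_index(annotations):
    """Map each concept, and each of its ancestors, to the keyframes labeled with it."""
    index = defaultdict(set)
    for keyframe, concepts in annotations.items():
        for concept in concepts:
            index[concept].add(keyframe)
            for broader in ancestors(concept):
                index[broader].add(keyframe)
    return index


# Hypothetical keyframe identifiers and labels, for illustration only.
ANNOTATIONS = {
    "kf_001": {"person", "building"},
    "kf_002": {"highway"},
    "kf_003": {"animal"},
}

index = build_index(ANNOTATIONS)
print(sorted(index["animate object"]))    # ['kf_001', 'kf_003']
print(sorted(index["inanimate object"]))  # ['kf_001', 'kf_002']
```

With this kind of index, a query for a broad concept such as "animate object" also returns keyframes that were only labeled with a narrower concept such as "person", which is one practical benefit of organizing annotation vocabulary as a hierarchy.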

Public detectors

Several sets of concept detectors were developed and released for public use:

  1. VIREO-374, 374 detectors developed by City University of Hong Kong.
  2. Columbia374, 374 detectors developed by Columbia University.
  3. Mediamill101, 101 detectors developed by The University of Amsterdam.

Use in the larger research community

Since its release, the Large-Scale Concept Ontology for Multimedia has been used in visual recognition research. Apart from research done by project participants, it has been used by independent researchers working on concept extraction from images, [4] [5] and has served as the basis for a video annotation tool. [6]


References

  1. Naphade, et al., "Large Scale Concept Ontology for Multimedia: VACE Workshop Report."
  2. Naphade, et al., "A Large Scale Concept Ontology for Multimedia Understanding," PowerPoint presentation published by MITRE. Archived 2006-05-06 at the Wayback Machine.
  3. Naphade, et al., "Large-Scale Concept Ontology for Multimedia," IEEE MultiMedia, vol. 13, no. 3, pp. 86-91, July-September 2006.
  4. Snoek, et al., "Adding Semantics to Detectors for Video Retrieval," forthcoming in IEEE Transactions on Multimedia, 2007.
  5. Worring, et al., "The MediaMill Large-lexicon Concept Suggestion Engine," forthcoming in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, Hawaii, USA, April 2007.
  6. Garnaud, E., Smeaton, A., and Koskela, M., "Evaluation of a Video Annotation Tool Based on the LSCOM Ontology," in Proceedings of the First International Conference on Semantics and Digital Media Technology, Athens, Greece, 6-8 December 2006. Archived 20 July 2011 at the Wayback Machine.