Open Mind Common Sense

Open Mind Common Sense (OMCS) is an artificial intelligence project based at the Massachusetts Institute of Technology (MIT) Media Lab whose goal is to build and use a large commonsense knowledge base from the contributions of many thousands of people across the Web. It was active from 1999 to 2016.

Since its founding, the project has accumulated more than a million English facts from over 15,000 contributors, in addition to knowledge bases in other languages. Much of OMCS's software is built on three interconnected representations: the natural language corpus that people interact with directly, a semantic network built from this corpus called ConceptNet, and a matrix-based representation of ConceptNet called AnalogySpace that can infer new knowledge using dimensionality reduction. [1] The knowledge collected by Open Mind Common Sense has enabled research projects at MIT and elsewhere.

History

The project was the brainchild of Marvin Minsky, Push Singh, Catherine Havasi, and others. Development work began in September 1999, and the project opened to the Internet a year later. Havasi described it in her dissertation as "an attempt to ... harness some of the distributed human computing power of the Internet, an idea which was then only in its early stages." [2] The original OMCS was influenced by the website Everything2 and its predecessor, and presented a minimalist interface inspired by Google.

Push Singh was slated to become a professor at the MIT Media Lab and lead the Common Sense Computing group in 2007, but he died by suicide on February 28, 2006. [3]

The project is currently run by the Digital Intuition Group at the MIT Media Lab under Catherine Havasi. [citation needed]

Database and website

There are many different types of knowledge in OMCS. Some statements convey relationships between objects or events, expressed as simple phrases of natural language: some examples include "A coat is used for keeping warm", "The sun is very hot", and "The last thing you do when you cook dinner is wash your dishes". The database also contains information on the emotional content of situations, in such statements as "Spending time with friends causes happiness" and "Getting into a car wreck makes one angry". OMCS contains information on people's desires and goals, both large and small, such as "People want to be respected" and "People want good coffee". [1]

Originally, these statements could be entered into the Web site as unconstrained sentences of text, which had to be parsed later. The current version of the Web site collects knowledge only through more structured fill-in-the-blank templates. OMCS also makes use of data collected by "Verbosity", a game with a purpose. [4]

In its native form, the OMCS database is simply a collection of these short sentences that convey some common knowledge. In order to use this knowledge computationally, it has to be transformed into a more structured representation.

ConceptNet

ConceptNet is a semantic network based on the information in the OMCS database. ConceptNet is expressed as a directed graph whose nodes are concepts, and whose edges are assertions of common sense about these concepts. Concepts represent sets of closely related natural language phrases, which could be noun phrases, verb phrases, adjective phrases, or clauses. [5]
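
As a rough illustration of this structure, the following Python sketch stores assertions as labeled, directed edges between concept nodes. The class names and the relation labels (UsedFor, HasProperty, Desires) are chosen for this example and should be read as assumptions; the sketch is not ConceptNet's actual schema or API.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """One labeled, directed edge: start concept -> end concept."""
    start: str
    relation: str  # illustrative label, not an official relation list
    end: str


class ConceptGraph:
    """Toy directed graph whose nodes are concepts and whose edges are assertions."""

    def __init__(self) -> None:
        self._outgoing = defaultdict(list)  # concept -> edges leaving it
        self._incoming = defaultdict(list)  # concept -> edges arriving at it

    def add(self, start: str, relation: str, end: str) -> Assertion:
        edge = Assertion(start, relation, end)
        self._outgoing[start].append(edge)
        self._incoming[end].append(edge)
        return edge

    def outgoing(self, concept: str) -> list:
        # .get avoids creating empty entries for unknown concepts
        return list(self._outgoing.get(concept, []))

    def incoming(self, concept: str) -> list:
        return list(self._incoming.get(concept, []))


graph = ConceptGraph()
graph.add("coat", "UsedFor", "keep warm")
graph.add("sun", "HasProperty", "hot")
graph.add("person", "Desires", "good coffee")

for edge in graph.outgoing("coat"):
    print(f"{edge.start} --{edge.relation}--> {edge.end}")
```

Indexing edges by both endpoints keeps lookups cheap in either direction, which matters because an assertion such as "a coat is used for keeping warm" is informative about both concepts.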

ConceptNet is created from the natural-language assertions in OMCS by matching them against patterns using a shallow parser. Assertions are expressed as relations between two concepts, selected from a limited set of possible relations. The various relations represent common sentence patterns found in the OMCS corpus, and in particular, every "fill-in-the-blanks" template used on the knowledge-collection Web site is associated with a particular relation. [5]
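
The pattern-matching step can be sketched in Python as follows. Regular expressions stand in here for the shallow parser, and the three templates and their relation labels are hypothetical examples, not the patterns OMCS actually uses.

```python
import re

# Hypothetical templates: each regex stands in for one fill-in-the-blank
# sentence frame and is paired with an illustrative relation label.
TEMPLATES = [
    (re.compile(r"^(?:an?|the) (?P<start>.+?) is used for (?P<end>.+)$", re.I), "UsedFor"),
    (re.compile(r"^(?P<start>.+?) causes (?P<end>.+)$", re.I), "Causes"),
    (re.compile(r"^(?P<start>.+?) wants? (?P<end>.+)$", re.I), "Desires"),
]


def extract_assertion(sentence):
    """Return (start, relation, end) for the first matching template, else None."""
    text = sentence.strip().rstrip(".")
    for pattern, relation in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (match.group("start").lower(), relation, match.group("end").lower())
    return None


examples = [
    "A coat is used for keeping warm",
    "Spending time with friends causes happiness",
    "People want to be respected",
    "The sun is very hot",  # matches none of the three toy templates
]
for sentence in examples:
    print(sentence, "->", extract_assertion(sentence))
```

Sentences that fit no template simply yield no assertion in this sketch; a fuller system would need more patterns and some normalization of the extracted phrases into concepts.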

The data structures that make up ConceptNet were significantly reorganized in 2007, and published as ConceptNet 3. [5] The Software Agents group currently distributes a database and API for the new version 4.0. [6]

In 2010, OMCS co-founder and director Catherine Havasi, with Robyn Speer, Dennis Clark and Jason Alonso, created Luminoso, a text analytics software company that builds on ConceptNet. [7] [8] [9] [10] It uses ConceptNet as its primary lexical resource in order to help businesses make sense of and derive insight from vast amounts of qualitative data, including surveys, product reviews and social media. [7] [11] [12]

Machine learning tools

The information in ConceptNet can be used as a basis for machine learning algorithms. One representation, called AnalogySpace, uses singular value decomposition to generalize and represent patterns in the knowledge in ConceptNet, in a way that can be used in AI applications. Its creators distribute a Python machine learning toolkit called Divisi [13] for performing machine learning based on text corpora, structured knowledge bases such as ConceptNet, and combinations of the two.
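
The idea of generalizing with singular value decomposition can be illustrated with a minimal NumPy sketch. It does not use Divisi, and the concept-by-feature matrix layout, the toy data, and the choice of two dimensions are all assumptions made for this example.

```python
import numpy as np

concepts = ["dog", "cat", "wolf", "car", "truck"]
features = ["HasA fur", "HasA tail", "IsA animal", "HasA wheels", "UsedFor driving"]

# 1.0 means the toy knowledge base asserts that the concept has the feature.
# Note that ("wolf", "IsA animal") was never asserted.
matrix = np.array([
    [1, 1, 1, 0, 0],  # dog
    [1, 1, 1, 0, 0],  # cat
    [1, 1, 0, 0, 0],  # wolf
    [0, 0, 0, 1, 1],  # car
    [0, 0, 0, 1, 1],  # truck
], dtype=float)


def low_rank_approximation(m, k):
    """Return the rank-k approximation of m obtained by truncating its SVD."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def score(approx, concept, feature):
    """Look up the smoothed value for one (concept, feature) pair."""
    return float(approx[concepts.index(concept), features.index(feature)])


smoothed = low_rank_approximation(matrix, k=2)
missing = score(smoothed, "wolf", "IsA animal")
unrelated = score(smoothed, "wolf", "HasA wheels")
print(f"wolf / IsA animal: {missing:.2f}")
print(f"wolf / HasA wheels: {unrelated:.2f}")
```

In this toy matrix the rank-2 approximation gives the unasserted pair ("wolf", "IsA animal") a clearly positive score, because wolf shares its other features with dog and cat, while ("wolf", "HasA wheels") stays near zero.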

Comparison to other projects

Other similar projects include Never-Ending Language Learning, Mindpixel (discontinued), Cyc, Learner, SenticNet, Freebase, YAGO, DBpedia, and Open Mind 1001 Questions, which have explored alternative approaches to collecting knowledge and providing incentive for participation.

The Open Mind Common Sense project differs from Cyc because it has focused on representing the common sense knowledge it collected as English sentences, rather than using a formal logical structure. ConceptNet is described by one of its creators, Hugo Liu, as being structured more like WordNet than Cyc, due to its "emphasis on informal conceptual-connectedness over formal linguistic-rigor". [14]

There is also a Brazilian initiative, Open Mind Common Sense in Brazil (OMCS-Br), led by the Advanced Interaction Lab at the Federal University of São Carlos (LIA-UFSCar). The project started in 2005 in collaboration with the Software Agents Group at the MIT Media Lab. Its main goal is to collect common sense stated in Brazilian Portuguese and use it to develop culturally sensitive software applications based on knowledge of cultural profiles extracted from ConceptNet. This is intended to give developers and users culturally contextualized content, making the final applications more flexible, adaptive, accessible and usable. The applications focus mainly on education and healthcare. [citation needed]

References

  1. Robyn Speer, Catherine Havasi, and Henry Lieberman. AnalogySpace: Reducing the Dimensionality of Common Sense Knowledge. AAAI 2008. Archived 2010-07-09 at the Wayback Machine.
  2. Catherine Havasi. Discovering Semantic Relations Using Singular Value Decomposition Based Techniques. Ph.D. thesis, Brandeis University, June 2009.
  3. MIT News Office (2006-03-08). "Memorial service slated tomorrow for Pushpinder Singh". MIT Tech Talk. Retrieved 2009-10-07.
  4. "Profile for verbosity". Open Mind Common Sense. Archived from the original on 2010-06-25.
  5. Catherine Havasi, Robyn Speer and Jason Alonso. ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge. Proceedings of Recent Advances in Natural Language Processing, 2007. Archived 2015-05-29 at the Wayback Machine.
  6. Commonsense Computing Initiative (2009-02-24). "ConceptNet API in Launchpad". Retrieved 2009-10-07.
  7. Lohr, Steve (27 June 2014). "The U.S.-Germany Match Through a Social Media Lens". New York Times. Retrieved 3 March 2015.
  8. Rusli, Evelyn (14 April 2014). "Firms Use Artificial Intelligence to Tap Shoppers' Views". The Wall Street Journal. Retrieved 3 March 2015.
  9. Alba, Davey (12 February 2015). "The Startup That Helps You Analyze Twitter Chatter in Real Time". Wired. Retrieved 3 March 2015.
  10. Noyes, Katherine (11 February 2015). "Luminoso to enterprises: Here's what all that chatter really means". PC World. Retrieved 3 March 2015.
  11. Miller, Ron (2 July 2014). "Luminoso Lands $6.5M In Series A To Keep Building Cloud Text Analytics Service". TechCrunch. Retrieved 3 March 2015.
  12. Darrow, Barb (11 February 2015). "Luminoso brings its text analysis smarts to streaming data". GigaOm. Retrieved 3 March 2015.
  13. Commonsense Computing Initiative (2009-02-24). "Divisi in Launchpad". Retrieved 2009-10-07.
  14. "The ConceptNet Project V2.1". Retrieved 2008-12-17.