Google Knowledge Graph

Knowledge panel data about Thomas Jefferson displayed on Google Search, as of January 2015

The Google Knowledge Graph is a knowledge base from which Google serves relevant information in an infobox beside its search results. This allows the user to see an instant answer at a glance. The data is generated automatically from a variety of sources, covering places, people, businesses, and more. [1] [2]

The information covered by Google's Knowledge Graph grew quickly after launch, tripling in size within seven months to cover 570 million entities and 18 billion facts. [3] By mid-2016, Google reported that it held 70 billion facts [4] and answered "roughly one-third" of the 100 billion monthly searches it handled. By May 2020, the Knowledge Graph had grown to 500 billion facts on 5 billion entities. [5]

There is no official documentation of how the Google Knowledge Graph is implemented. [6] According to Google, its information is retrieved from many sources, including the CIA World Factbook and Wikipedia. [7] It is used to answer direct spoken questions in Google Assistant [8] [9] and Google Home voice queries. [10] It has been criticized for providing answers with neither source attribution nor citations. [11]

History

Google announced its Knowledge Graph on May 16, 2012, as a way to significantly enhance the value of information returned by Google searches. [7] Initially available only in English, it was expanded in December 2012 to Spanish, French, German, Portuguese, Japanese, Russian and Italian. [12] Bengali support was added in March 2017. [13]

The Knowledge Graph was powered in part by Freebase. [7]

In August 2014, New Scientist reported that Google had launched a Knowledge Vault project. [14] After publication, Google reached out to Search Engine Land to explain that Knowledge Vault was a research report, not an active Google service. Search Engine Land noted indications that Google was experimenting with "numerous models" for gathering meaning from text. [15]

Google's Knowledge Vault was meant to deal with facts, automatically gathering and merging information from across the Internet into a knowledge base capable of answering direct questions, such as "Where was Madonna born?" According to a 2014 report, the Vault had collected over 1.6 billion facts, 271 million of which were considered "confident facts", meaning they were deemed to have more than a 90% chance of being true. It was reported to differ from the Knowledge Graph in that it gathered information automatically instead of relying on crowd-sourced facts compiled by humans. [15]
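The idea of keeping only "confident facts" can be illustrated with a minimal sketch. This is not Google's implementation, which is undocumented; the `CandidateFact` type, the `confident_facts` function, the predicate name, and the confidence scores below are all hypothetical, chosen only to show how a 90% threshold would separate confident from unconfident candidate facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFact:
    """A hypothetical subject-predicate-object fact with a confidence score."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed estimate of the probability that the fact is true


# Threshold taken from the "more than 90%" figure reported for confident facts.
CONFIDENCE_THRESHOLD = 0.9


def confident_facts(facts, threshold=CONFIDENCE_THRESHOLD):
    """Return only the facts whose confidence is strictly above the threshold."""
    return [fact for fact in facts if fact.confidence > threshold]


# Illustrative candidates; the scores are invented for this example.
candidates = [
    CandidateFact("Madonna", "place_of_birth", "Bay City, Michigan", 0.97),
    CandidateFact("Madonna", "place_of_birth", "Detroit, Michigan", 0.12),
    CandidateFact("Madonna", "occupation", "singer", 0.90),
]

kept = confident_facts(candidates)
for fact in kept:
    print(fact.subject, fact.predicate, fact.obj)
```

Because the comparison is strict, a candidate scored at exactly 0.90 is not kept, which matches the "more than 90%" wording.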

Criticism

Lack of source attribution

By May 2016, knowledge boxes were appearing for "roughly one-third" of the 100 billion monthly searches the company processed. [11] Dario Taraborelli, head of research at the Wikimedia Foundation, told The Washington Post that Google's omission of sources in its knowledge boxes "undermines people’s ability to verify information and, ultimately, to develop well-informed opinions". The publication also reported that the boxes are "frequently unattributed", such as a knowledge box on the age of actress Betty White, which is "as unsourced and absolute as if handed down by God". [11]

Declining Wikipedia article readership

According to The Register in 2014, the display of direct answers in knowledge panels alongside Google search results caused significant readership declines for Wikipedia, from which the panels obtained some of their information. [16] Also in 2014, The Daily Dot noted that "Wikipedia still has no real competitor as far as actual content is concerned. All that's up for grabs are traffic stats. And as a nonprofit, traffic numbers don't equate into revenue in the same way they do for a commercial media site". After the article's publication, a spokesperson for the Wikimedia Foundation, which operates Wikipedia, stated that it "welcomes" the knowledge panel functionality, that it was "looking into" the traffic drops, and that "We've also not noticed a significant drop in search engine referrals. We also have a continuing dialog with staff from Google working on the Knowledge Panel". [17]

In his 2020 book, Dariusz Jemielniak noted that because most Google users do not realize that many of the answers appearing in the Knowledge Graph come from Wikipedia, this reduces Wikipedia's popularity and in turn limits the site's ability to raise new funds and attract new volunteers. [18]

Bias

The algorithm has been criticized for presenting biased or inaccurate information, usually because it sources information from websites with high search engine optimization. It was noted in 2014 that while there was a Knowledge Graph panel for most major historical or pseudo-historical religious figures such as Moses, Muhammad and Gautama Buddha, there was none for Jesus, the central figure of Christianity. [19] [20] On June 3, 2021, a knowledge box identified Kannada as the ugliest language in India, prompting outrage from the Kannada-language community; the state of Karnataka, where most Kannada speakers live, also threatened to sue Google for damaging the public image of the language. Google promptly changed the featured snippet for the search query and issued a formal apology. [21] [22]


References

  1. "About knowledge panels - Knowledge Panel Help". Google Support. Retrieved March 15, 2021.
  2. "Your business information in the Knowledge Panel". Google My Business Help. Google Inc. Retrieved December 10, 2017.
  3. Newton, Casey (December 4, 2012). "Google's Knowledge Graph tripled in size in seven months". CNET. CBS Interactive. Retrieved December 10, 2017.
  4. Vincent, James (October 4, 2016). "Apple boasts about sales; Google boasts about how good its AI is". The Verge. Vox Media. Retrieved December 10, 2017.
  5. "A reintroduction to our Knowledge Graph and knowledge panels". Google blog. May 20, 2020. Retrieved May 26, 2020. It's a system that understands facts and information about entities from materials shared across the web, as well as from open source and licensed databases. It has amassed over 500 billion facts about five billion entities.
  6. Ehrlinger, Lisa; Wöß, Wolfram (2016). "Towards a Definition of Knowledge Graphs" (PDF).
  7. Singhal, Amit (May 16, 2012). "Introducing the Knowledge Graph: Things, Not Strings". Google Official Blog. Retrieved September 6, 2014.
  8. Lynley, Matthew (May 18, 2016). "Google unveils Google Assistant, a virtual assistant that's a big upgrade to Google Now". TechCrunch. Oath Inc. Retrieved December 10, 2017.
  9. Kovach, Steve (October 4, 2016). "Google is going to win the next major battle in computing". Business Insider. Axel Springer SE. Retrieved December 10, 2017.
  10. Bohn, Dieter (May 18, 2016). "Google Home: a speaker to finally take on the Amazon Echo". The Verge. Vox Media. Retrieved December 10, 2017.
  11. Dewey, Caitlin (May 11, 2016). "You Probably Haven't Even Noticed Google's Sketchy Quest to Control the World's Knowledge". The Washington Post. Archived from the original on June 3, 2016. Retrieved May 25, 2022.
  12. Newton, Casey (December 14, 2012). "How Google is taking the Knowledge Graph global". CNET. CBS Interactive. Retrieved December 10, 2017.
  13. "Making it easier to Search in Bengali". Official Google India Blog. Retrieved January 26, 2018.
  14. Hodson, Hal (August 20, 2014). "Google's fact-checking bots build vast knowledge bank". New Scientist. Retrieved December 10, 2017.
  15. Sterling, Greg (August 25, 2014). "Google "Knowledge Vault" To Power Future Of Search". Search Engine Land. Retrieved December 10, 2017.
  16. Orlowski, Andrew (January 13, 2014). "Google stabs Wikipedia in the front". The Register. Retrieved December 10, 2017.
  17. Kloc, Joe (January 8, 2014). "Is Google accidentally killing Wikipedia?". The Daily Dot. Retrieved December 10, 2017.
  18. Jemielniak, Dariusz; Przegalinska, Aleksandra (February 18, 2020). Collaborative Society. MIT Press. ISBN 978-0-262-35645-9.
  19. Schwartz, Barry (July 8, 2014). "Why Does Google Exclude Jesus Christ From The Knowledge Graph". Search Engine Roundtable. Retrieved May 29, 2016.
  20. Wolford, Josh (July 8, 2014). "Google Has a Jesus-Shaped Hole in Its Graph". WebProNews. Retrieved May 29, 2016.
  21. "Why Google showed Kannada as 'ugliest language of India': Explained". Hindustan Times. June 4, 2021.
  22. Ives, Mike; Mozur, Paul (June 4, 2021). "India's 'Ugliest' Language? Google Had an Answer (and Drew a Backlash)". The New York Times.