Wikidata

Screenshot: Main page of Wikidata in April 2021
Available in: Multiple languages
Owner: Wikimedia Foundation
Editor: Wikimedia community
URL: www.wikidata.org
Commercial: No
Registration: Optional
Launched: 29 October 2012 [1]

Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. [2] It is a common source of open data that Wikimedia projects such as Wikipedia, [3] [4] and anyone else, can use under the CC0 public domain license. Wikidata is a wiki that runs on the MediaWiki software together with Wikibase, a set of MediaWiki extensions for knowledge graphs.

Concept

This diagram shows the most important terms used in Wikidata.

Wikidata is a document-oriented database, focused on items, which represent any kind of topic, concept, or object. Each item is allocated a unique, persistent identifier, a positive integer prefixed with the upper-case letter Q, known as a "QID". This enables the basic information required to identify the topic that the item covers to be translated without favouring any language.
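As a rough illustration of the identifier format described above, the following Python sketch checks whether a string looks like a QID and extracts its numeric part. The function name `parse_qid` is hypothetical, and the sketch assumes the numeric part is written without leading zeros. It only checks the format, not whether the item exists on Wikidata.

```python
import re

# A QID is the upper-case letter Q followed by a positive integer.
# Assumption for this sketch: the integer has no leading zeros.
_QID_PATTERN = re.compile(r"Q([1-9][0-9]*)")


def parse_qid(text: str) -> int:
    """Return the numeric part of a QID, or raise ValueError if malformed."""
    match = _QID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid QID: {text!r}")
    return int(match.group(1))
```

For instance, `parse_qid("Q8470")` yields the integer 8470, while a lower-case `"q42"` is rejected.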

Examples of items include 1988 Summer Olympics (Q8470), love (Q316), Johnny Cash (Q42775), Elvis Presley (Q303), and Gorilla (Q36611).

Item labels need not be unique. For example, there are two items named "Elvis Presley": Elvis Presley (Q303), which represents the American singer and actor, and Elvis Presley (Q610926), which represents his self-titled album. However, the combination of a label and its description must be unique. To avoid ambiguity, an item's unique identifier (QID) is therefore linked to this combination.
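The uniqueness rule can be sketched in Python as a small registry that allows repeated labels but rejects a repeated label and description pair. The class `ItemRegistry` and its methods are hypothetical and do not reflect how Wikidata stores items. The descriptions in the usage note are paraphrased from the paragraph above.

```python
from typing import Dict, List, Tuple


class ItemRegistry:
    """Toy registry: labels may repeat, (label, description) pairs may not."""

    def __init__(self) -> None:
        # Maps each (label, description) pair to the QID that owns it.
        self._by_pair: Dict[Tuple[str, str], str] = {}

    def add(self, qid: str, label: str, description: str) -> None:
        pair = (label, description)
        existing = self._by_pair.get(pair)
        if existing is not None and existing != qid:
            raise ValueError(f"{pair!r} is already used by {existing}")
        self._by_pair[pair] = qid

    def find(self, label: str) -> List[str]:
        """Return the QIDs of all items that share this label."""
        return sorted(q for (lbl, _), q in self._by_pair.items() if lbl == label)
```

Registering Q303 as "Elvis Presley" / "American singer and actor" and Q610926 as "Elvis Presley" / "self-titled album" succeeds, and `find("Elvis Presley")` then returns both QIDs. Adding a third item with the same label and the same description as either of them raises an error.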

Items come in two types: general items and lexemes.

Main parts

A layout of the four main components of a phase-1 Wikidata page: the label, description, aliases, and interlanguage links.

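Fundamentally, an item consists of an identifier (the QID), a label, a description, optionally one or more aliases, and some number of statements.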

Statements

Three statements from Wikidata's item on the planet Mars (Q111). Values include links to other items and to Wikimedia Commons.

Statements are how any information known about an item is recorded in Wikidata. Formally, they consist of key–value pairs, which match a property (such as "author", or "publication date") with one or more entity values (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property color (P462) with the value white (Q23444) under the item milk (Q8495).

Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", to reflect the fact that she engaged in both occupations. [5]

Values may take on many types including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, the property official website (P856) may only be paired with values of type "URL". [6]
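A minimal Python sketch of this model follows: an item holds statements as a mapping from property to a list of values, and a property may prescribe the type of value it accepts. The names `Item`, `PROPERTY_DATATYPES`, and `value_datatype` are hypothetical, the type names `"item"`, `"url"`, and `"string"` are simplified labels chosen for this sketch, and the type detection is deliberately crude.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Expected value type per property; only these two come from the text above.
PROPERTY_DATATYPES: Dict[str, str] = {
    "P462": "item",  # color
    "P856": "url",   # official website
}


def value_datatype(value: str) -> str:
    """Crude type detection, good enough for this sketch only."""
    if value.startswith(("http://", "https://")):
        return "url"
    if value.startswith("Q") and value[1:].isdigit():
        return "item"
    return "string"


@dataclass
class Item:
    qid: str
    # Maps a property ID to the list of values stated for it.
    statements: Dict[str, List[str]] = field(default_factory=dict)

    def add_statement(self, prop: str, value: str) -> None:
        expected = PROPERTY_DATATYPES.get(prop)
        actual = value_datatype(value)
        if expected is not None and expected != actual:
            raise TypeError(f"{prop} expects {expected}, got {actual}")
        # A property may be paired with more than one value.
        self.statements.setdefault(prop, []).append(value)
```

With this sketch, "milk is white" becomes `Item("Q8495").add_statement("P462", "Q23444")`, while pairing P856 with an item value such as `"Q23444"` is rejected because that property expects a URL.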

Property and value

Example of a simple statement consisting of one property–value pair

Wikidata's method of structuring data involves two main elements: properties and the values of those properties (termed "items" in Wikidata's terminology). [7] [8]

A property describes the data value of a statement and can be thought of as a category of data: for example, color (P462) for the data value blue (Q1088), or education for a person item.

A property paired with a value forms a statement.

The most used property is cites work (P2860), which is used on more than 210,000,000 item pages. [9]

Properties have their own pages on Wikidata, and because an item can include several properties, the result is a linked data structure of pages.

Properties may also define more complex rules about their intended usage, termed constraints. For example, the capital (P36) property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules. [10]
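The idea that constraints produce hints rather than rejections can be sketched in Python as a checker that returns warnings and leaves the data untouched. The function `check_constraints` and the set `SINGLE_VALUE_PROPERTIES` are hypothetical, and the values in the usage note are placeholders rather than real identifiers.

```python
from typing import Dict, List

# Properties assumed to carry a single value constraint in this sketch.
SINGLE_VALUE_PROPERTIES = {"P36"}  # capital


def check_constraints(statements: Dict[str, List[str]]) -> List[str]:
    """Return warnings for constraint violations without rejecting the data."""
    warnings = []
    for prop, values in statements.items():
        if prop in SINGLE_VALUE_PROPERTIES and len(values) > 1:
            warnings.append(
                f"{prop}: single value constraint violated ({len(values)} values)"
            )
    return warnings
```

Calling `check_constraints({"P36": ["city-A", "city-B"]})` returns one warning, and the two values remain stored. This mirrors the treatment of constraints as alerts rather than inviolable rules.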

Optionally, qualifiers can refine the meaning of a statement by providing additional information about its scope. For example, the property "population" could be modified with a qualifier such as "as of 2011". Statements may also be annotated with references that point to a source backing up the statement's content. [11]
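One way to picture qualifiers and references is as optional attachments to a single statement. The Python sketch below uses a hypothetical `Statement` class. The population figure and the reference string in the usage note are invented placeholders, not data from Wikidata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    prop: str
    value: str
    # Qualifiers narrow the scope of the statement, e.g. {"as of": "2011"}.
    qualifiers: Dict[str, str] = field(default_factory=dict)
    # References point to sources that back up the statement.
    references: List[str] = field(default_factory=list)

    def describe(self) -> str:
        text = f"{self.prop} = {self.value}"
        if self.qualifiers:
            details = ", ".join(f"{k} {v}" for k, v in self.qualifiers.items())
            text += f" ({details})"
        if self.references:
            text += f" [{len(self.references)} reference(s)]"
        return text
```

For example, `Statement("population", "1000", {"as of": "2011"}, ["placeholder census report"]).describe()` produces `population = 1000 (as of 2011) [1 reference(s)]`.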

Lexemes

In linguistics, a lexeme is a unit of lexical meaning. Similarly, Wikidata's lexemes are items with a structure that makes them more suitable to store lexicographical data. Besides storing the language to which the lexeme refers, they have a section for forms and a section for senses. [12]
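The structure described above can be sketched as a small Python record. The class `Lexeme` and the field name `lemma` are assumptions made for illustration, and the English example in the usage note is invented rather than copied from Wikidata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lexeme:
    lemma: str      # headword of the lexeme (field name is an assumption)
    language: str   # language to which the lexeme refers
    forms: List[str] = field(default_factory=list)   # inflected forms
    senses: List[str] = field(default_factory=list)  # meanings

    def has_form(self, word: str) -> bool:
        """Return True if the word is one of the recorded forms."""
        return word in self.forms
```

A lexeme such as `Lexeme("run", "English", ["run", "runs", "ran", "running"], ["to move quickly on foot"])` keeps its forms and senses in separate sections, as the paragraph above describes.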

EntitySchemas

In January 2019, development began on a new MediaWiki extension that enables Shape Expressions to be stored in a separate namespace. [13] [14]

This extension has since been installed on Wikidata [15] and enables contributors to use Shape Expressions for validating and describing Resource Description Framework data in items and lexemes. Any item or lexeme on Wikidata can be validated against an Entity Schema, which makes it an important tool for quality assurance.

Development

The creation of the project was funded by donations from the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google, Inc., totaling €1.3 million. [16] [17] The development of the project is mainly driven by Wikimedia Deutschland under the management of Lydia Pintscher, and was originally split into three phases: [18]

  1. Centralising interlanguage links – links between Wikipedia articles about the same topic in different languages.
  2. Providing a central place for infobox data for all Wikipedias.
  3. Creating and updating list articles based on data in Wikidata and linking to other Wikimedia sister projects, including Meta-Wiki and Wikidata itself (interwiki links).

Initial rollout

A Wikipedia article's list of interlanguage links as they appeared in an edit box (left) and on the article's page (right) prior to Wikidata. Each link in these lists is to an article that requires its own list of interlanguage links to the other articles; this is the information centralized by Wikidata.
The "Edit links" link nowadays takes the reader to Wikidata to edit interlanguage and interwiki links.

Wikidata was launched on 29 October 2012 and was the first new project of the Wikimedia Foundation since 2006. [3] [19] [20] At this time, only the centralization of language links was available. This enabled items to be created and filled with basic information: a label (a name or title), aliases (alternative terms for the label), a description, and links to articles about the topic in all the various language editions of Wikipedia (interwikipedia links).

Historically, a Wikipedia article would include a list of interlanguage links, that is, links to articles on the same topic in other editions of Wikipedia, if they existed. Initially, Wikidata was a self-contained repository of interlanguage links. [21] Wikipedia language editions could not yet access Wikidata, so they had to continue maintaining their own lists of interlanguage links, mainly at the end of the articles' pages.

On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. [22] This functionality was extended to the Hebrew and Italian Wikipedias on 30 January, to the English Wikipedia on 13 February and to all other Wikipedias on 6 March. [23] [24] [25] [26] After no consensus was reached over a proposal to restrict the removal of language links from the English Wikipedia, [27] the power to delete them from the English Wikipedia was granted to automatic editors (bots). On 23 September 2013, interlanguage links went live on Wikimedia Commons. [28]

Statements and data access

On 4 February 2013, statements were introduced to Wikidata entries. The possible values for properties were initially limited to two data types (items and images on Wikimedia Commons), with more data types (such as coordinates and dates) to follow later. The first new type, string, was deployed on 6 March. [29]

The ability for the various language editions of Wikipedia to access data from Wikidata was rolled out progressively between 27 March and 25 April 2013. [30] [31] On 16 September 2015, Wikidata began allowing so-called arbitrary access, or access from a given article of a Wikipedia to the statements on Wikidata items not directly connected to it. For example, it became possible to read data about Germany from the Berlin article, which was not feasible before. [32] On 27 April 2016 arbitrary access was activated on Wikimedia Commons. [33]

According to a 2020 study, a large proportion of the data on Wikidata consists of entries imported en masse from other databases by Internet bots, which helps to "break down the walls" of data silos. [34]

Query service and other improvements

On 7 September 2015, the Wikimedia Foundation announced the release of the Wikidata Query Service, [35] which lets users run queries on the data contained in Wikidata. [36] The service uses SPARQL as the query language. As of November 2018, there were at least 26 different tools that allowed querying the data in different ways. [37]

In Wiktionary, the tools in the sidebar include a "Wikidata item" link that helps create a new item and link it to pages. For example, this is useful when the item exists only in the English Wiktionary and needs to be linked to another Wikimedia project rather than to Wiktionaries in other languages.

Below is a SPARQL example that searches for items that are an instance of (P31) television series (Q5398426) and have both island (Q23442) and aviation accident (Q744913) as main subject (P921). However, similar results can also be found directly on Wikipedia using category intersections if the appropriate categories exist and are allowed.

    SELECT ?item ?itemLabel
    WHERE {
      ?item wdt:P31 wd:Q5398426 .
      ?item wdt:P921 wd:Q23442 .
      ?item wdt:P921 wd:Q744913 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
    }

Below is another SPARQL example that finds items that are an instance of (P31) television series (Q5398426) and whose cast members (P161) include both Daniel Dae Kim (Q299700) and Jorge Garcia (Q264914). The television series condition excludes results that are a television series episode (Q21191270), a two-part episode (Q21664088), or a film (Q11424).

    SELECT ?item ?itemLabel
    WHERE {
      ?item wdt:P31 wd:Q5398426 .
      ?item wdt:P161 wd:Q299700 .
      ?item wdt:P161 wd:Q264914 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
    }
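The two queries above differ only in their property and item conditions, so they can be generated from a list of pairs. The Python sketch below builds such a query and a request URL without contacting the network. The helper names `build_query` and `build_url` are hypothetical, and the endpoint path `/sparql` and the `format=json` parameter are assumptions, since the text names only https://query.wikidata.org/.

```python
from typing import Iterable, Tuple
from urllib.parse import urlencode

# Assumed endpoint path; the article itself only names https://query.wikidata.org/.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(conditions: Iterable[Tuple[str, str]]) -> str:
    """Build a SELECT query requiring every (property, item) pair to hold."""
    triples = "\n".join(f"  ?item wdt:{p} wd:{q} ." for p, q in conditions)
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"{triples}\n"
        "  SERVICE wikibase:label "
        '{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }\n'
        "}"
    )


def build_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results (assumed parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Passing `[("P31", "Q5398426"), ("P161", "Q299700"), ("P161", "Q264914")]` to `build_query` reproduces the conditions of the cast member query. The resulting URL could then be fetched with any HTTP client, which is left out here to keep the sketch free of network access.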

The bars on the logo contain the word "WIKI" encoded in Morse code. [38] It was created by Arun Ganesh and selected through community decision. [39]

Reception

In November 2014, Wikidata received the Open Data Publisher Award from the Open Data Institute "for sheer scale, and built-in openness". [40]

As of November 2018, Wikidata information was used in 58.4% of all English Wikipedia articles, mostly for external identifiers or coordinate locations. In aggregate, data from Wikidata was shown in 64% of all Wikipedia pages, 93% of all Wikivoyage articles, 34% of all Wikiquote pages, 32% of all Wikisource pages, and 27% of Wikimedia Commons pages. Usage in other Wikimedia Foundation projects is a testimonial. [41]

As of December 2020, Wikidata's data was visualized by at least 20 other external tools [42] and over 300 papers had been published about Wikidata. [43]

Wikidata's structured dataset has been used by virtual assistants such as Apple's Siri and Amazon Alexa. [44]

Applications

See also

References

  1. https://blog.wikimedia.org/2013/04/25/the-wikidata-revolution/. Retrieved 14 November 2018. Quotation: "Since Wikidata.org went live on 30 October 2012,".
  2. Chalabi, Mona (26 April 2013). "Welcome to Wikidata! Now what?". Retrieved 2 October 2021.
  3. Wikidata (Archived October 30, 2012, at WebCite)
  4. "Data Revolution for Wikipedia". Wikimedia Deutschland. 30 March 2012. Archived from the original on 11 September 2012. Retrieved 11 September 2012.
  5. "Help:Statements – Wikidata". www.wikidata.org.
  6. "Help:Data type – Wikidata". www.wikidata.org.
  7. Vrandečić, Denny; Krötzsch, Markus (October 2014). "Wikidata: a free collaborative knowledgebase". Communications of the ACM. 57 (10): 78–85. doi:10.1145/2629489. ISSN 0001-0782. Wikidata Q18507561.
  8. Turki, Houcemeddine; Shafee, Thomas; Hadj Taieb, Mohamed Ali; Ben Aouicha, Mohamed; Vrandečić, Denny; Das, Diptanshu; Hamdi, Helmi (23 September 2019). "Wikidata: A large-scale collaborative ontological medical database". Journal of Biomedical Informatics. 99: 103292. doi:10.1016/J.JBI.2019.103292. ISSN 1532-0464. PMID 31557529. S2CID 203568040. Wikidata Q68471881.
  9. "Wikidata:Database reports/List of properties/Top100". Retrieved 26 March 2021.
  10. "Help:Property constraints portal – Wikidata". www.wikidata.org.
  11. "Help:Sources – Wikidata". www.wikidata.org.
  12. "Wikidata:Lexicographical data/Documentation – Wikidata". www.wikidata.org.
  13. "Extension:EntitySchema - MediaWiki". mediawiki.org. Retrieved 10 September 2021.
  14. Gerrit. https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EntitySchema/+/26654db17345beefbd5518af48ed1bcd17288bc9. Retrieved 10 September 2021.
  15. "Version - Wikidata". Wikidata.org. Retrieved 10 September 2021.
  16. Dickinson, Boonsri (30 March 2012). "Paul Allen Invests In A Massive Project To Make Wikipedia Better". Business Insider. Retrieved 11 September 2012.
  17. Perez, Sarah (30 March 2012). "Wikipedia's Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others". TechCrunch. Archived from the original on 11 September 2012. Retrieved 11 September 2012.
  18. "Wikidata – Meta". meta.wikimedia.org.
  19. Pintscher, Lydia (30 October 2012). "wikidata.org is live (with some caveats)". wikidata-l (Mailing list). Retrieved 3 November 2012.
  20. Roth, Matthew (30 March 2012). "The Wikipedia data revolution". Wikimedia Foundation. Archived from the original on 11 September 2012. Retrieved 11 September 2012.
  21. Leitch, Thomas (1 November 2014). Wikipedia U: Knowledge, Authority, and Liberal Education in the Digital Age. Johns Hopkins University Press. p. 120. ISBN 978-1-4214-1550-5.
  22. Pintscher, Lydia (14 January 2013). "First steps of Wikidata in the Hungarian Wikipedia". Wikimedia Deutschland. Retrieved 17 December 2015.
  23. Pintscher, Lydia (30 January 2013). "Wikidata coming to the next two Wikipedias". Wikimedia Deutschland. Retrieved 31 January 2013.
  24. Pintscher, Lydia (13 February 2013). "Wikidata live on the English Wikipedia". Wikimedia Deutschland. Retrieved 15 February 2013.
  25. Pintscher, Lydia (6 March 2013). "Wikidata now live on all Wikipedias". Wikimedia Deutschland. Retrieved 8 March 2013.
  26. "Wikidata ist für alle Wikipedien da" (in German). Golem.de. Retrieved 29 January 2014.
  27. "Wikipedia talk:Wikidata interwiki RFC". 29 March 2013. Retrieved 30 March 2013.
  28. Pintscher, Lydia (23 September 2013). "Wikidata is Here!". Commons:Village pump.
  29. Pintscher, Lydia. "Wikidata/Status updates/2013 03 01". Wikimedia Meta-Wiki. Wikimedia Foundation. Retrieved 3 March 2013.
  30. Pintscher, Lydia (27 March 2013). "You can have all the data!". Wikimedia Deutschland. Retrieved 28 March 2013.
  31. "Wikidata goes live worldwide". The H. 25 April 2013. Archived from the original on 1 January 2014.
  32. Pintscher, Lydia (16 September 2015). "Wikidata: Access to data from arbitrary items is here". Wikipedia:Village pump (technical). Retrieved 30 August 2016.
  33. Pintscher, Lydia (27 April 2016). "Wikidata support: arbitrary access is here". Commons:Village pump. Retrieved 30 August 2016.
  34. Waagmeester, Andra; Stupp, Gregory; Burgstaller-Muehlbacher, Sebastian; et al. (17 March 2020). "Wikidata as a knowledge graph for the life sciences". eLife. 9. doi:10.7554/ELIFE.52614. ISSN 2050-084X. PMC 7077981. PMID 32180547. Wikidata Q87830400.
  35. https://query.wikidata.org/
  36. "[Wikidata] Announcing the release of the Wikidata Query Service - Wikidata - lists.wikimedia.org".
  37. "Wikidata:Tools/Query data – Wikidata". www.wikidata.org.
  38. commons:File talk:Wikidata-logo-en.svg#Hybrid. Retrieved 2016-10-06.
  39. "Und der Gewinner ist..." 13 July 2012.
  40. "First ODI Open Data Awards presented by Sirs Tim Berners-Lee and Nigel Shadbolt". Archived from the original on 24 March 2016.
  41. "Percentage of articles making use of data from Wikidata". Archived from the original on 15 November 2018. Retrieved 15 November 2018.
  42. "Wikidata:Tools/Visualize data – Wikidata". www.wikidata.org.
  43. "Scholia". Scholia.
  44. Simonite, Tom (18 February 2019). "Inside the Alexa-Friendly World of Wikidata". Wired. ISSN 1059-1028. Retrieved 25 December 2020.
  45. "Rob Barry / Mwnci – Deep Spreadsheets". GitLab.
  46. "Public Review Issues".
  47. "Wiki Explorer in the Google Play Store".
  48. Krause, Volker (12 January 2020). KDE Itinerary – A privacy by design travel assistant. Retrieved 10 November 2020.

Further reading