| Developer(s) | Various |
| --- | --- |
| Written in | PHP |
| Type | MediaWiki extension |
| License | GPL-2.0-or-later |
Semantic MediaWiki (SMW) is an extension to MediaWiki that allows semantic data to be annotated within wiki pages, turning a wiki that incorporates the extension into a semantic wiki. Encoded data can be used in semantic searches, aggregated across pages, displayed in formats like maps, calendars and graphs, and exported to the outside world via formats like RDF and CSV.
Semantic MediaWiki was initially created by Markus Krötzsch, Denny Vrandečić and Max Völkel, and was first released in 2005. Its development was originally funded by the EU FP6 project SEKT and later supported in part by the Institute AIFB of the University of Karlsruhe (later renamed the Karlsruhe Institute of Technology). As of 2017, James Hong Kong is the lead developer, while the other core developer is Jeroen De Dauw.
Every semantic annotation within SMW is a "property" connecting the page on which it resides to some other piece of data, either another page or a data value of some type, using triples of the form "subject, predicate, object".
As an example, a page about Germany could have, encoded within it, the fact that its capital city is Berlin. On the page "Germany", the syntax would be:
... the capital city is [[Has capital::Berlin]] ...
which is semantically equivalent to the statement "Germany" "Has capital" "Berlin". In this example the "Germany" page is the subject, "Has capital" is the predicate, and "Berlin" is the object that the semantic link is pointing to.
However, the much more common way of storing data within Semantic MediaWiki is via MediaWiki templates which themselves contain the necessary SMW markup.
For this example, the "Germany" page could contain a call to a template called "Country", that looked like this:
{{Country ... | Capital = Berlin ... }}
The "Country" template would handle storing whatever the value of the parameter "Capital" is, using the property "Has capital". The template would also handle the display of the data. Semantic MediaWiki developers have estimated that 99% of SMW data is stored in this way. [2]
Semantic MediaWiki also has its own inline querying tools. For instance, if pages about countries stored additional information like population data, a query could be added to a page that displays a list of all countries with a population greater than 50 million, along with their capital city; and Germany would appear in such a list, with Berlin alongside it. [3]
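A sketch of such an inline query, assuming the wiki defines a "Country" category along with "Population" and "Has capital" properties (and noting that in SMW's default configuration the ">" comparator is inclusive), might look like:
{{#ask: [[Category:Country]] [[Population::>50000000]]
| ?Population
| ?Has capital
| format=table
}}
The query's conditions select which pages are listed, while the "?" printout statements determine which property values are shown alongside each result.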
Semantic MediaWiki is in use on over 1,600 active public wikis around the world, in addition to an unknown number of private wikis. [4] [5] Notable public wikis that use SMW include the Metacafe wiki, Web Platform, SNPedia, SKYbrary, Metavid, Familypedia, OpenEI, [6] the Libreplanet wiki, the Free Software Directory [7] and translatewiki.net. [8]
Organizations that use SMW internally include Pfizer, [9] Harvard Pilgrim Health Care, [10] Johnson & Johnson Pharmaceutical Research and Development, [11] the Pacific Northwest National Laboratory, [12] the Metropolitan Museum of Art, [13] NATO, [14] the U.S. Department of Defense, [15] and the International Atomic Energy Agency.
SMW has notably gained traction in the health care domain for collaboratively creating biomedical terminologies and ontologies. [16] Examples are LexWiki, [17] which is jointly run by the Mayo Clinic, the National Cancer Institute, the World Health Organization and Stanford University; and the Neuroscience Information Framework's NeuroLex.
Semantic MediaWiki used to be supported by default on the now-defunct wiki farm Referata. [18] [19] Wikia previously activated Semantic MediaWiki on user request, but stopped doing so after upgrading to version 1.19 of MediaWiki; Wikia sites that had already started using it, such as Familypedia, have been able to continue.
Some members of the academic community have urged the use of SMW on Wikipedia since it was first proposed. [20] In a 2006 paper, Max Völkel et al. wrote that in spite of Wikipedia's utility, "its contents are barely machine-interpretable. Structural knowledge, e.g. about how concepts are interrelated, can neither be formally stated nor automatically processed. Also the wealth of numerical data is only available as plain text and thus can not be processed by its actual meaning." [21]
The Wikimedia community began adding semantic microformat markup to Wikipedia [22] in 2007. In 2010, Wikimedia Foundation Deputy Director Erik Möller stated that Wikimedia was interested in adding semantic capabilities to Wikipedia, but that they were unsure whether Semantic MediaWiki was the right solution, since it was unclear whether it could be used without negatively affecting Wikipedia's performance. [23]
In April 2012, the Wikimedia Foundation's Wikidata project began, providing a shared database of structured data for use in articles across every language edition of Wikipedia and in other Wikimedia projects; its content is also freely available to anyone else. [5] Wikidata supplants the potential use of Semantic MediaWiki on Wikipedia; its software is based on the Wikibase extension rather than SMW. [24]
A variety of open-source MediaWiki extensions exist that build on the data structure provided by Semantic MediaWiki. [25]
The official gathering for Semantic MediaWiki developers and users is SMWCon, which has been held twice a year since 2010, in various cities in the United States and Europe. [26] The largest such event, in October 2013 in Berlin, had around 90 attendees. [27] The first virtual SMWCon 2020 attracted 234 attendees. [28]
The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.
The Resource Description Framework (RDF) is a method to describe and exchange graph data. It was originally designed as a data model for metadata by the World Wide Web Consortium (W3C). It provides a variety of syntax notations and data serialization formats, of which the most widely used is Turtle.
MediaWiki is free and open-source wiki software originally developed by Magnus Manske for use on Wikipedia, where it was first deployed on January 25, 2002, and further improved by Lee Daniel Crocker; its development has since been coordinated by the Wikimedia Foundation. It powers several wiki hosting websites across the Internet, as well as most websites hosted by the Wikimedia Foundation, including Wikipedia, Wiktionary, Wikimedia Commons, Wikiquote, Meta-Wiki and Wikidata, which together define a large part of the requirements for the software. Besides its use on Wikimedia sites, MediaWiki has been used as a knowledge management and content management system on websites such as Fandom and wikiHow, and in major internal installations like Intellipedia and Diplopedia.
XWiki is a free and open-source wiki software platform written in Java with a design emphasis on extensibility. XWiki is an enterprise wiki. It includes WYSIWYG editing, OpenDocument-based document import/export, annotations and tagging, and advanced permissions management.
Microformats (μF) are a set of defined HTML classes created to serve as consistent and descriptive metadata about an element, designating it as representing a certain type of data. They allow software to process the information reliably by having set classes refer to a specific type of data rather than being arbitrary.
A semantic wiki is a wiki that has an underlying model of the knowledge described in its pages. Regular, or syntactic, wikis have structured text and untyped hyperlinks. Semantic wikis, on the other hand, provide the ability to capture or identify information about the data within pages, and the relationships between pages, in ways that can be queried or exported like a database through semantic queries.
DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web using OpenLink Virtuoso. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.
Semantic file systems are file systems used for information persistence that structure data according to its semantics and intent rather than its location, as conventional file systems do. They allow data to be addressed by its content. Traditional hierarchical file systems tend to impose a burden, for example when a sub-directory layout contradicts a user's perception of where files would be stored. Having a tag-based interface alleviates this hierarchy problem and enables users to query for data in an intuitive fashion.
LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text. It comprises a set of Java libraries which provide a range of NLP functions: language identification, text segmentation/tokenization, normalization, entity and relationship extraction, and semantic analysis and disambiguation. The analysis engine uses a finite-state machine approach at multiple levels, which aids its performance while maintaining a reasonably small footprint.
Business Intelligence 2.0 is a development of the existing business intelligence model that began in the mid-2000s, in which data can be obtained from many sources and queried in real time by employees through a web browser-based solution, in contrast to the proprietary querying tools that characterized earlier BI software.
Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions. Freebase aimed to create a global resource that allowed people to access common information more effectively. It was developed by the American software company Metaweb and run publicly beginning in March 2007. Metaweb was acquired by Google in a private sale announced on 16 July 2010. Google's Knowledge Graph is powered in part by Freebase.
NeuroLex is a lexicon of neuroscience concepts supported by the Neuroscience Information Framework project, which is funded by the NIH Blueprint for Neuroscience Research. It is the lexical part of the NIF knowledge base and is intended to make literature review easier and ensure consistent terminology and usage across researchers for the topics of experimental, clinical, and translational neuroscience, and for genetic and genomic resources. It is structured as a semantic wiki, using Semantic MediaWiki.
Yahoo! SearchMonkey was a Yahoo! service which allowed developers and site owners to use structured data to make Yahoo! Search results more useful and visually appealing, and drive more relevant traffic to their sites. The service was shut down in October 2010 along with other Yahoo! services as part of the Microsoft and Yahoo! search deal. The name SearchMonkey is an homage to Greasemonkey. Officially the product name has no space and two capital letters.
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, can use under the CC0 public domain license. Wikidata is a wiki powered by the MediaWiki software, including its extension for semi-structured data, Wikibase. As of mid-2024, Wikidata had 1.57 billion item statements.
translatewiki.net, formerly named Betawiki, is a web-based translation platform powered by the Translate extension for MediaWiki. It can be used to translate various kinds of texts but is commonly used for creating localisations for software interfaces.
An infobox is a digital or physical table used to collect and present a subset of information about its subject, such as a document. It is a structured document containing a set of attribute–value pairs, and in Wikipedia represents a summary of information about the subject of an article. In this way, they are comparable to data tables in some aspects. When presented within the larger document it summarizes, an infobox is often presented in a sidebar format.
Sebastian Schaffert is a software engineer and researcher. He was born in Trostberg, Bavaria, Germany on March 18, 1976 and obtained his doctorate in 2004.
UMBEL is a logically organized knowledge graph of 34,000 concepts and entity types that can be used in information science for relating information from disparate sources to one another. It was first released in July 2008; version 1.00 was released in February 2011, and its final release was version 1.50. The project was retired at the end of 2019.
Abstract Wikipedia is an in-development project of the Wikimedia Foundation. It aims to use Wikifunctions to create a language-independent version of Wikipedia using its structured data. First conceived in 2020, Abstract Wikipedia has been under active development ever since, with the related project of Wikifunctions launched in 2023. Nevertheless, the project has proved controversial. As envisioned, Abstract Wikipedia would consist of "Constructors", "Content", and "Renderers".
Zdenko "Denny" Vrandečić is a Croatian computer scientist. He was a co-developer of Semantic MediaWiki and Wikidata, the lead developer of the Wikifunctions project, and an employee of the Wikimedia Foundation as a Head of Special Projects, Structured Content. He published modules for the German role-playing game The Dark Eye.