Semantic wiki

A semantic wiki is a wiki that has an underlying model of the knowledge described in its pages. Regular, or syntactic, wikis have structured text and untyped hyperlinks. Semantic wikis, on the other hand, provide the ability to capture or identify information about the data within pages, and the relationships between pages, in ways that can be queried or exported like a database through semantic queries. [1] [2]

Semantic wikis were first proposed in the early 2000s, and began to be implemented seriously around 2005. [3] [4] As of 2021, well-known semantic wiki engines are Semantic MediaWiki and Wikibase. [5]

Key characteristics

Formal notation

The knowledge model found in a semantic wiki is typically available in a formal language, so that machines can process it into an entity-relationship model or relational database.

The formal notation may be included in the pages themselves by users, as in Semantic MediaWiki, or it may be derived from the pages, the page names, or the means of linking. For example, using a specific alternative page name might indicate that a specific type of link was intended.

Providing information through a formal notation allows machines to calculate new facts (e.g. relations between pages) from the facts represented in the knowledge model.
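
As a minimal sketch of such fact derivation, the following Python snippet applies a transitivity rule to a hand-written set of "is_a" triples. The facts, the rule, and all names here are illustrative assumptions, not the inference mechanism of any particular semantic wiki engine.

    # Illustrative facts, written as (subject, predicate, object) triples.
    facts = {
        ("Apple", "is_a", "Fruit"),
        ("Fruit", "is_a", "Food"),
    }

    def infer_transitive(triples, predicate="is_a"):
        """Apply transitivity (A p B and B p C imply A p C) until no new facts appear."""
        inferred = set(triples)
        changed = True
        while changed:
            changed = False
            for (a, p1, b) in list(inferred):
                for (b2, p2, c) in list(inferred):
                    if p1 == predicate and p2 == predicate and b == b2:
                        new_fact = (a, predicate, c)
                        if new_fact not in inferred:
                            inferred.add(new_fact)
                            changed = True
        return inferred

    for triple in sorted(infer_transitive(facts) - facts):
        print("inferred:", triple)   # ('Apple', 'is_a', 'Food')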

Semantic Web compatibility

The technologies developed by the Semantic Web community provide one basis for formal reasoning about the knowledge model built by importing this data. However, a wide array of technologies also exists for working with relational data.

Example

Imagine a semantic wiki devoted to food. The page for an apple would contain, in addition to standard text information, some machine-readable or at least machine-intuitable semantic data. The most basic kind of data would be that an apple is a kind of fruit—what's known as an inheritance relationship. The wiki would thus be able to automatically generate a list of fruits, simply by listing all pages that are tagged as being of type "fruit." Further semantic tags in the "apple" page could indicate other data about apples, including their possible colors and sizes, nutritional information and serving suggestions, and so on.

If the wiki exports all this data in RDF or a similar format, it can then be queried in a similar way to a database—so that an external user or site could, for instance, request a list of all fruits that are red and can also be baked in a pie.
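
A hedged sketch of this export-and-query workflow, using the Python rdflib library: the graph below models the food example, and the SPARQL query answers the red-fruit-in-a-pie request. The ex: vocabulary (ex:Fruit, ex:color, ex:usedIn) is invented for illustration and is not a published ontology.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/food/")
    g = Graph()

    # Facts a food wiki might export: an apple is a red fruit used in pies.
    g.add((EX.Apple, RDF.type, EX.Fruit))
    g.add((EX.Apple, EX.color, Literal("red")))
    g.add((EX.Apple, EX.usedIn, EX.Pie))
    g.add((EX.Banana, RDF.type, EX.Fruit))
    g.add((EX.Banana, EX.color, Literal("yellow")))

    # "All fruits that are red and can also be baked in a pie."
    query = """
        PREFIX ex: <http://example.org/food/>
        SELECT ?fruit WHERE {
            ?fruit a ex:Fruit ;
                   ex:color "red" ;
                   ex:usedIn ex:Pie .
        }
    """
    for row in g.query(query):
        print(row.fruit)   # http://example.org/food/Apple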

History

In the 1980s, before the Web began, there were several technologies to process typed links between collectively maintained hypertext pages, such as NoteCards, KMS, and gIBIS. Extensive research was published on these tools by the collaboration software, computer-mediated communication, hypertext, and computer supported cooperative work communities.

The first known usage of the term "Semantic Wiki" was a Usenet posting by Andy Dingley in January 2001. [6] Its first known appearance in a technical paper was in a 2003 paper by Austrian researcher Leo Sauermann. [7]

Many of the existing semantic wiki applications were started in the mid-2000s, including ArtificialMemory [8] (2004), Semantic MediaWiki (2005), Freebase (2005), and OntoWiki (2006).

June 2006 saw the first meeting dedicated to semantic wikis, the "SemWiki" workshop, co-located with the European Semantic Web Conference in Montenegro. [9] This workshop ran annually until 2010. [10]

DBpedia, launched in 2007, is not a semantic wiki, but it publishes structured data from Wikipedia in RDF form, which enables semantic querying of Wikipedia's data.

In March 2008, Wikia, the world's largest wiki farm, made Semantic MediaWiki available for all of its wikis on request, thus allowing all the wikis it hosted to function as semantic wikis. [11] However, since upgrading to version 1.19 of MediaWiki in 2013, Wikia has stopped supporting Semantic MediaWiki for new requests, citing performance problems. [12]

In July 2010, Google purchased Metaweb, the company behind Freebase. [13]

In April 2012, work began on Wikidata, a collaborative, multilingual store of data whose contents can be used within Wikipedia articles as well as by the outside world.

Semantic wiki software

There are a number of wiki applications that provide semantic functionality. Some standalone semantic wiki applications exist, including OntoWiki. Other semantic wiki software is structured as extensions or plugins to standard wiki software. The best-known of these is Semantic MediaWiki, an extension to MediaWiki. Another example is the SemanticXWiki [14] extension for XWiki.

Some standard wiki engines also include the ability to add typed, semantic links to pages, including PhpWiki and Tiki Wiki CMS Groupware.

Freebase, though not billed as a wiki engine, was a web database with semantic-wiki-like properties.

Common features

Semantic wikis vary in their degree of formalization. Semantics may be either included in, or placed separately from, the wiki markup. Users may be supported when adding this content through forms or autocompletion, or through more complex proposal generation or consistency checks. The representation language may be wiki syntax, a standard language like RDF or OWL, or a database directly populated by a tool that extracts the semantics from the raw data. Separate versioning support or correction editing for the formalized content may also be provided. Provenance support for the formalized content, that is, tagging the author of the data separately from the data itself, varies.

What data can be formalized also varies. Types may be specified for pages, categories, or even paragraphs or sentences (the latter features were more common in pre-web systems). Links are usually also typed. The source, property, and target may be determined by defaults; in Semantic MediaWiki, for example, the source is always the current page, as illustrated below.
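
For instance, Semantic MediaWiki lets a user annotate a link with a property, as in [[capital of::Germany]]. The following Python snippet is a minimal illustrative sketch, with invented page and property names, of reading such a typed link as a triple whose subject defaults to the page it appears on:

    def link_to_triple(current_page, link_type, target_page):
        """Interpret a typed wiki link on current_page as a
        (subject, predicate, object) triple."""
        return (current_page, link_type, target_page)

    # A link "capital of::Germany" written on the page "Berlin":
    print(link_to_triple("Berlin", "capital of", "Germany"))
    # -> ('Berlin', 'capital of', 'Germany')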

Reflexivity also varies. More reflexive user interfaces provide strong ontology support from within the wiki, allowing the ontology to be created, loaded, changed, and saved there.

Some wikis inherit their ontology entirely from a pre-existing strong ontology like Cyc or SKOS, while, on the other extreme, in other semantic wikis the entire ontology is generated by users.

Conventional, non-semantic wikis usually still have ways for users to express data and metadata, such as tagging, categorizing, and using namespaces. In semantic wikis, these features typically still exist, but are integrated with other semantic declarations, and their use is sometimes restricted.

Some semantic wikis provide reasoning support, using a variety of engines. Such reasoning may require that all instance data comply with the ontologies.

Most semantic wikis have simple querying support (such as searching for all triples with a given subject, predicate, or object), but the degree of advanced query support varies; some semantic wikis provide querying in standard languages like SPARQL, while others provide a custom query language. User interface support for constructing such queries also varies. Visualization of links, in particular, may be supported.
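
As a minimal sketch of the simple triple-pattern search described above, again assuming Python's rdflib, the following snippet uses None as a wildcard for subject, predicate, or object (the ex: names are invented):

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/food/")
    g = Graph()
    g.add((EX.Apple, RDF.type, EX.Fruit))
    g.add((EX.Apple, EX.color, EX.Red))

    # All triples whose subject is the Apple page:
    for s, p, o in g.triples((EX.Apple, None, None)):
        print(p, o)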

Many semantic wikis can display the relationships between pages, or other data such as dates, geographical coordinates, and number values, in various formats, such as graphs, tables, charts, calendars, and maps.

Related Research Articles

The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of syntax notations and data serialization formats with Turtle currently being the most widely used notation.

RDF Schema is a set of classes with certain properties using the RDF extensible knowledge representation data model, providing basic elements for the description of ontologies. It uses various forms of RDF vocabularies, intended to structure RDF resources. RDF and RDFS can be saved in a triplestore, from which knowledge can then be extracted using a query language such as SPARQL.

SPARQL is an RDF query language (that is, a semantic query language for databases) able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. SPARQL 1.0 became an official W3C Recommendation on 15 January 2008, and SPARQL 1.1 followed in March 2013.

Semantic MediaWiki

Semantic MediaWiki (SMW) is an extension to MediaWiki that allows for annotating semantic data within wiki pages, thus turning a wiki that incorporates the extension into a semantic wiki. Data that has been encoded can be used in semantic searches, used for aggregation of pages, displayed in formats like maps, calendars and graphs, and exported to the outside world via formats like RDF and CSV.

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

Ontotext is a software company with offices in Europe and the USA. It is the semantic technology branch of Sirma Group. Its main domain of activity is the development of software based on the Semantic Web languages and standards, in particular RDF, OWL and SPARQL. Ontotext is best known for the Ontotext GraphDB semantic graph database engine. Another major business line is the development of enterprise knowledge management and analytics systems that involve big knowledge graphs. These systems are developed on top of the Ontotext Platform, which builds on GraphDB's capabilities for text mining using big knowledge graphs.

The concept of the Social Semantic Web subsumes developments in which social interactions on the Web lead to the creation of explicit and semantically rich knowledge representations. The Social Semantic Web can be seen as a Web of collective knowledge systems, which are able to provide useful information based on human contributions and which get better as more people participate. The Social Semantic Web combines technologies, strategies and methodologies from the Semantic Web, social software, and Web 2.0.

DBpedia

DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.

NEPOMUK (software)

NEPOMUK is an open-source software specification that is concerned with the development of a social semantic desktop that enriches and interconnects data from different desktop applications using semantic metadata stored as RDF. Between 2006 and 2008 it was funded by a European Union research project of the same name that grouped together industrial and academic actors to develop various Semantic Desktop technologies.

A triplestore or RDF store is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject–predicate–object, like "Bob is 35" or "Bob knows Fred".

Semantic Web Stack

The Semantic Web Stack, also known as Semantic Web Cake or Semantic Web Layer Cake, illustrates the architecture of the Semantic Web.

Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions. Freebase aimed to create a global resource that allowed people to access common information more effectively. It was developed by the American software company Metaweb and run publicly beginning in March 2007. Metaweb was acquired by Google in a private sale announced on 16 July 2010. Google's Knowledge Graph is powered in part by Freebase.

Twine (social network)

Twine was an online social web service for information storage, authoring and discovery, located at twine.com, which existed from 2007 to 2010. It was created and run by Radar Networks. The service was announced on October 19, 2007 and made open to the public on October 21, 2008. On March 11, 2010, Radar Networks was acquired by Evri Inc. along with Twine.com. On May 14, 2010, twine.com was shut down, becoming a redirect to evri.com.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

Sebastian Schaffert

Sebastian Schaffert is a software engineer and researcher. He was born in Trostberg, Bavaria, Germany on March 18, 1976 and obtained his doctorate in 2004.

This is a comparison of triplestores, also known as subject-predicate-object databases. Some of these database management systems have been built as database engines from scratch, while others have been built on top of existing commercial relational database engines. Like the early development of online analytical processing (OLAP) databases, this intermediate approach allowed large and powerful database engines to be constructed with little programming effort in the initial phases of triplestore development. In the long term, however, native triplestores seem likely to have the performance advantage. One difficulty with implementing triplestores over SQL is that, although triples can be stored this way, mapping efficient queries over a graph-based RDF model onto SQL queries is hard.

Schema-agnostic databases or vocabulary-independent databases aim to abstract users from the representation of the data by supporting automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database of accepting a query expressed in the user's own terminology and structure and automatically mapping it to the dataset's vocabulary.

Blazegraph

Blazegraph is an open source triplestore and graph database, developed by Systap, which is used in the Wikidata SPARQL endpoint and by other large customers. It is licensed under the GNU GPL.

Knowledge graph

In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to integrate data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the semantics underlying the terminology used.

References

  1. Soenke Ziesche, "Semantic Wikis and Disaster Relief Operations", xml.com, December 13, 2006.
  2. Maged N. Kamel Boulos, "Semantic Wikis: A Comprehensible Introduction with Examples from the Health Sciences", Journal of Emerging Technologies in Web Intelligence, Vol. 1, No. 1, August 2009.
  3. Sebastian Schaffert, Andreas Gruber, and Rupert Westenthaler, "A Semantic Wiki for Collaborative Knowledge Formation", research report, Knowledge-Based Information Systems Group, Salzburg Research, Austria, November 23, 2005.
  4. Sebastian Schaffert, "IkeWiki: A Semantic Wiki for Collaborative Knowledge Management", Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE'06), June 6, 2006.
  5. "Comparison of Semantic MediaWiki and Wikibase".
  6. Andy Dingley, "Wikiwiki (was Theory: "opportunistic hypertext")", newsgroup comp.infosystems.www.authoring.site-design, January 21, 2001.
  7. Leo Sauermann, "The Gnowsis: Using Semantic Web Technologies to Build a Semantic Desktop" (PDF), Technical University of Vienna, 2003. Retrieved June 20, 2007.
  8. Lars Ludwig, "Extended Artificial Memory: Toward an Integral Cognitive Theory of Memory and Technology" (PDF), Technical University of Kaiserslautern, 2013. Retrieved February 7, 2017.
  9. "Call for Papers: SemWiki 2006".
  10. SemWiki.org.
  11. "Wikia offers Semantic MediaWiki hosting", semantic-mediawiki.org, March 12, 2008.
  12. "Semantic Mediawiki gone from Wikia forever?"
  13. "Deeper understanding with Metaweb", Official Google Blog, July 16, 2010.
  14. "Semantic XWiki Extension", ObjectSecurity Ltd, November 16, 2012.