Semantic publishing

Semantic publishing on the Web, or semantic web publishing, refers to publishing information on the Web as documents accompanied by semantic markup. Semantic publishing provides a way for computers to understand the structure and even the meaning of the published information, making information search and data integration more efficient. [1] [2] [3] [4] [5] [6] [7]

Although semantic publishing is not specific to the Web, it has been driven by the rise of the semantic web. In the semantic web, published information is accompanied by metadata describing the information, providing a "semantic" context. [8] [9] [10]
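
As a minimal sketch of the idea (not drawn from the cited sources), the Python fragment below uses the rdflib library, assuming rdflib 6 or later, to attach Dublin Core metadata to a hypothetical article URI and serialize it as machine-readable RDF:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical identifier for a published document; any stable URI would do.
article = URIRef("https://example.org/articles/42")

g = Graph()
g.bind("dcterms", DCTERMS)

# Machine-readable metadata accompanying the human-readable document.
g.add((article, DCTERMS.title, Literal("An Example Article")))
g.add((article, DCTERMS.creator, Literal("A. Author")))
g.add((article, DCTERMS.issued, Literal("2009-04-01")))

# Serialize the metadata as Turtle (rdflib 6+ returns a string here).
print(g.serialize(format="turtle"))
```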

Although semantic publishing has the potential to change the face of web publishing, acceptance depends on the emergence of compelling applications. Websites can already be built with all content available in both HTML and semantic formats. [11] RSS 1.0 uses RDF, a semantic web standard, although it has become less popular than RSS 2.0 and Atom. [12]

Semantic publishing has the potential to revolutionize scientific publishing. Tim Berners-Lee predicted in 2001 that the semantic web "will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine". [13] Revisiting the semantic web in 2006, he and his colleagues believed the semantic web "could bring about a revolution in how, for example, scientific content is managed throughout its life cycle". [8] Researchers could directly self-publish their experimental data in "semantic" format on the web. Semantic search engines could then make these data widely available. The W3C Semantic Web Health Care and Life Sciences Interest Group is exploring this idea. [14]
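
As an illustrative sketch only (the dataset, vocabulary, and URIs below are invented), a researcher could expose measurements as RDF triples with rdflib, and a semantic search service could then answer structured SPARQL queries over them:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

# Hypothetical vocabulary and sample URI, invented for this sketch.
EX = Namespace("https://example.org/experiment#")

g = Graph()
g.bind("ex", EX)

sample = URIRef("https://example.org/experiment/sample-1")
g.add((sample, EX.organism, Literal("Escherichia coli")))
g.add((sample, EX.measuredTemperature, Literal(310.5, datatype=XSD.double)))

# A semantic search service could answer structured queries over such data.
results = g.query("""
    PREFIX ex: <https://example.org/experiment#>
    SELECT ?sample ?temp WHERE {
        ?sample ex:measuredTemperature ?temp .
        FILTER (?temp > 300)
    }
""")
for row in results:
    print(row.sample, row.temp)
```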

Two approaches

Examples

Examples of ontologies and vocabularies for publishing

Examples of "semantic content" containers for publishing

Examples of free or open source tools and services

See also

Related Research Articles

<span class="mw-page-title-main">Semantic Web</span> Extension of the Web to facilitate data exchange

The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of syntax notations and data serialization formats, with Turtle currently being the most widely used notation.
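
As a brief sketch using the Python rdflib library (rdflib 6 or later bundles all of the serializers used here, including JSON-LD), the same single-triple graph can be written out in several concrete notations, including Turtle and the Notation3 form described next:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF

g = Graph()
g.add((URIRef("https://example.org/people/alice"), FOAF.name, Literal("Alice")))

# The same abstract graph rendered in several concrete syntaxes.
for fmt in ("turtle", "n3", "xml", "json-ld"):
    print(f"--- {fmt} ---")
    print(g.serialize(format=fmt))
```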

Notation3, or N3 as it is more commonly known, is a shorthand non-XML serialization of Resource Description Framework models, designed with human-readability in mind: N3 is much more compact and readable than XML RDF notation. The format is being developed by Tim Berners-Lee and others from the Semantic Web community. A formalization of the logic underlying N3 was published by Berners-Lee and others in 2008.

Biochemical Journal: Academic journal

The Biochemical Journal is a peer-reviewed scientific journal which covers all aspects of biochemistry, as well as cell and molecular biology. It is published by Portland Press and was established in 1906.

RDFa or Resource Description Framework in Attributes is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. The Resource Description Framework (RDF) data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents. It also enables the extraction of RDF model triples by compliant user agents.
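
For illustration (the URI and values are invented), the fragment below, held in a Python string, shows how RDFa attributes carry statements inside ordinary markup: about names the subject, property names the predicate, and the element content supplies the object.

```python
# RDFa-annotated markup, held in a Python string purely for illustration.
# A compliant RDFa processor would extract triples such as:
#   <https://example.org/articles/42> dcterms:title "An Example Article" .
rdfa_fragment = """
<div xmlns:dcterms="http://purl.org/dc/terms/"
     about="https://example.org/articles/42">
  <h1 property="dcterms:title">An Example Article</h1>
  <p>By <span property="dcterms:creator">A. Author</span></p>
</div>
"""
print(rdfa_fragment)
```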

<span class="mw-page-title-main">UTOPIA (bioinformatics tools)</span>

UTOPIA is a suite of free tools for visualising and analysing bioinformatics data. Based on an ontology-driven data model, it contains applications for viewing and aligning protein sequences, rendering complex molecular structures in 3D, and for finding and using resources such as web services and data objects. There are two major components, the protein analysis suite and UTOPIA documents.

Embedded RDF (eRDF) is a syntax for writing HTML in such a way that the information in the HTML document can be extracted into Resource Description Framework (RDF). This can be of great use for searching within data.

<span class="mw-page-title-main">Linked data</span> Structured data and method for its publication

In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.
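
As a minimal sketch of dereferencing a linked data URI, an HTTP client can request RDF instead of an HTML page for the same identifier; the example uses a real DBpedia resource URI, though the exact response depends on the server's content negotiation:

```python
import urllib.request

# Ask for Turtle rather than an HTML page for the same URI.
req = urllib.request.Request(
    "https://dbpedia.org/resource/Semantic_Web",
    headers={"Accept": "text/turtle"},
)
with urllib.request.urlopen(req) as response:
    print(response.read().decode("utf-8")[:500])  # first part of the RDF description
```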

<span class="mw-page-title-main">DBpedia</span> Online database project

DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.
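
As a sketch of such a semantic query (assuming DBpedia's public SPARQL endpoint at https://dbpedia.org/sparql and its dbo:abstract property), the Python standard library is enough to fetch results:

```python
import json
import urllib.parse
import urllib.request

# Ask DBpedia's public SPARQL endpoint for the English abstract of a resource.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Semantic_Web> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""
url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode({"query": query})
req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
with urllib.request.urlopen(req) as response:
    data = json.load(response)

for binding in data["results"]["bindings"]:
    print(binding["abstract"]["value"])
```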

<span class="mw-page-title-main">Semantic HTML</span> HTML used to reinforce meaning of documents or webpages

Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in web pages and web applications rather than merely to define its presentation or look. Semantic HTML is processed by traditional web browsers as well as by many other user agents. CSS is used to suggest its presentation to human users.

Microdata is a WHATWG HTML specification used to nest metadata within existing content on web pages. Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users. Search engines benefit greatly from direct access to this structured data because it allows them to understand the information on web pages and provide more relevant results to users. Microdata uses a supporting vocabulary to describe an item and name-value pairs to assign values to its properties. Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of using RDFa and microformats.
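
For illustration (the values are invented and the schema.org vocabulary is assumed), the fragment below, held in a Python string, shows the Microdata attributes at work: itemscope opens an item, itemtype names its type, and each itemprop contributes one name-value pair.

```python
# Microdata-annotated HTML, held in a Python string purely for illustration.
# itemscope opens an item, itemtype names its type, and each itemprop
# attribute contributes one name-value pair to that item.
microdata_fragment = """
<article itemscope itemtype="https://schema.org/ScholarlyArticle">
  <h1 itemprop="headline">An Example Article</h1>
  <span itemprop="author">A. Author</span>
  <time itemprop="datePublished" datetime="2009-04-01">1 April 2009</time>
</article>
"""
print(microdata_fragment)
```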

The web content lifecycle is the multi-disciplinary and often complex process that web content undergoes as it is managed through various publishing stages.

XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. XHTML+RDFa is one of the techniques used to develop Semantic Web content by embedding rich semantic markup. Version 1.1 of the language is a superset of XHTML 1.1, integrating the attributes according to RDFa Core 1.1. In other words, it provides RDFa support through XHTML Modularization.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.
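
As a small sketch of lifting structured source data into RDF (the column names and vocabulary are invented for the example), a few rows of CSV can be turned into triples with rdflib:

```python
import csv
import io

from rdflib import Graph, Literal, Namespace, URIRef

# A toy structured source; in practice this might be a database or spreadsheet export.
csv_text = "id,name,species\n1,Sample A,Escherichia coli\n2,Sample B,Bacillus subtilis\n"

# Hypothetical vocabulary for this sketch.
EX = Namespace("https://example.org/lab#")

g = Graph()
g.bind("ex", EX)

# Lift each row into RDF triples so the result can be linked to other data
# and queried or reasoned over.
for row in csv.DictReader(io.StringIO(csv_text)):
    subject = URIRef(f"https://example.org/lab/sample/{row['id']}")
    g.add((subject, EX.name, Literal(row["name"])))
    g.add((subject, EX.species, Literal(row["species"])))

print(g.serialize(format="turtle"))
```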

JSON-LD is a method of encoding linked data using JSON. One goal for JSON-LD was to require as little effort as possible from developers to transform their existing JSON to JSON-LD. JSON-LD allows data to be serialized in a way that is similar to traditional JSON. It was initially developed by the JSON for Linking Data Community Group before being transferred to the RDF Working Group for review, improvement, and standardization, and is currently maintained by the JSON-LD Working Group. JSON-LD is a World Wide Web Consortium Recommendation.
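
As a minimal sketch (using the schema.org context; the identifiers and values are invented), an ordinary JSON document becomes JSON-LD by adding @context, @id and @type:

```python
import json

# A small JSON-LD document: ordinary JSON plus @context, @id and @type,
# which map the plain keys onto globally defined terms (here, schema.org).
doc = {
    "@context": "https://schema.org",
    "@id": "https://example.org/articles/42",
    "@type": "ScholarlyArticle",
    "headline": "An Example Article",
    "author": {"@type": "Person", "name": "A. Author"},
    "datePublished": "2009-04-01",
}

print(json.dumps(doc, indent=2))
```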

Utopia Documents is a semantic, scientific, web-enabled PDF reader that is part of the Utopia toolset. Utopia Documents can be downloaded for free.

<span class="mw-page-title-main">Terri Attwood</span> British bioinformatics researcher

Teresa K. Attwood is a professor of Bioinformatics in the Department of Computer Science and School of Biological Sciences at the University of Manchester and a visiting fellow at the European Bioinformatics Institute (EMBL-EBI). She held a Royal Society University Research Fellowship at University College London (UCL) from 1993 to 1999 and at the University of Manchester from 1999 to 2002.

The European Legislation Identifier (ELI) ontology is a vocabulary for representing metadata about national and European Union (EU) legislation. It is designed to provide a standardized way to identify and describe the context and content of national or EU legislation, including its purpose, scope, relationships with other legislation, and legal basis. This is intended to make identification, access, exchange and reuse of legislation easier for public authorities, professional users, academics and citizens. ELI paves the way for knowledge graphs, based on semantic web standards, of legal gazettes and official journals.

Enhanced publications or enhanced ebooks are a form of electronic publishing for the dissemination and sharing of research outcomes, whose first formal definition can be traced back to 2009. Like many forms of digital publication, they typically feature a unique identifier and descriptive metadata. Unlike traditional digital publications, enhanced publications are often tailored to serve specific scientific domains and generally consist of a set of interconnected parts corresponding to research assets of several kinds and to textual descriptions of the research. The nature and format of these parts, and of the relationships between them, depend on the application domain and may vary widely from case to case.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, but has since become a focal point of activity for several W3C community groups, research projects, and infrastructure efforts.

References

  1. Attwood, T. K.; Kell, D. B.; McDermott, P.; Marsh, J.; Pettifer, S. R.; Thorne, D. (2009). "Calling International Rescue: Knowledge lost in literature and data landslide!". Biochemical Journal. 424 (3): 317–333. doi:10.1042/BJ20091474. PMC   2805925 . PMID   19929850.
  2. Batchelor, C.R., and Corbett, P.T. (2007) Semantic enrichment of journal articles using chemical named entity recognition. Proceedings of the ACL 2007 Demo and Poster Sessions, pages 45–48, Prague, June 2007.
  3. Pettifer, S.; McDermott, P.; Marsh, J.; Thorne, D.; Villeger, A.; Attwood, T. K. (2011). "Ceci n'est pas un hamburger: Modelling and representing the scholarly article". Learned Publishing. 24 (3): 207. doi: 10.1087/20110309 .
  4. Shotton, D. (2009). "Semantic publishing: The coming revolution in scientific journal publishing". Learned Publishing. 22 (2): 85–94. doi: 10.1087/2009202 .
  5. Shotton, D.; Portwin, K.; Klyne, G.; Miles, A. (2009). Bourne, Philip E (ed.). "Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article". PLOS Computational Biology. 5 (4). e1000361. Bibcode:2009PLSCB...5E0361S. doi: 10.1371/journal.pcbi.1000361 . PMC   2663789 . PMID   19381256.
  6. Shadbolt, Nigel; Berners-Lee, Tim; Hall, Wendy (May–June 2006). "The Semantic Web Revisited" (PDF). IEEE Intelligent Systems. 21 (3): 96–101. doi:10.1109/MIS.2006.62. S2CID   7719423.
  7. Berners-Lee, T.; Hendler, J. (2001). "Publishing on the semantic web". Nature. 410 (6832): 1023–1024. doi:10.1038/35074206. PMID   11323639. S2CID   32243333.
  8. Shadbolt, Berners-Lee & Hall 2006.
  9. Stefan Gradmann: From Catalogs to Graphs: Changing Terms for a Changing Profession
  10. Hull, D.; Pettifer, S.; Kell, D. (Oct 2008). McEntyre, Johanna (ed.). "Defrosting the digital library: bibliographic tools for the next generation web". PLOS Computational Biology . 4 (10). e1000204. Bibcode:2008PLSCB...4E0204H. doi: 10.1371/journal.pcbi.1000204 . ISSN   1553-734X. PMC   2568856 . PMID   18974831.
  11. Examples are:
      mindswap [verification needed]
      UMBC ebiquity
      "Why publishes[sic] raw experiment data?". web2express.org. December 5, 2006. Archived from the original on 2007-01-06.
  12. Web2express.org applies RDF to various data feeds. Anyone can use its service to create and provide RDF data resources and data feeds for products, news, events, jobs and studies: "Unified Data Feed". web2express.org. Archived from the original on 2007-10-11.
  13. Berners-Lee & Hendler 2001
  14. "HCLS/ScientificPublishingTaskForce". W2C. "About Demo". Archived from the original on 2007-01-04.
  15. "SweoIG/TaskForces/CommunityProjects/LinkingOpenData". W2C.
  16. list of data sources
  17. Semantic Publishing Tools
  18. Attwood, T. K.; Kell, D. B.; McDermott, P.; Marsh, J.; Pettifer, S. R.; Thorne, D. (2010). "Utopia documents: Linking scholarly literature with research data". Bioinformatics. 26 (18): i568–i574. doi:10.1093/bioinformatics/btq383. PMC   2935404 . PMID   20823323.

Further reading