Web annotation

Web annotation can refer to online annotations of web resources, such as web pages or parts of them, or to a set of W3C standards developed for this purpose. The term can also refer to the creation of annotations on the World Wide Web, and it has been used in this sense for the annotation tool INCEpTION, [1] formerly WebAnno. [2] This is a general feature of several tools for annotation in natural language processing or in the philologies.

Annotation of web resources

With a web annotation system, a user can add, modify or remove information from a Web resource without modifying the resource itself. The annotations can be thought of as a layer on top of the existing resource, and this annotation layer is usually visible to other users who share the same annotation system. In such cases, the web annotation tool is a type of social software tool. For Web-based text annotation systems, see Text annotation.
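
The layering idea can be illustrated with a minimal sketch. The `AnnotationLayer` class, its method names and the example URL are hypothetical and exist only for illustration; real systems typically keep this layer on a shared server rather than in memory.

```python
from collections import defaultdict


class AnnotationLayer:
    """Hypothetical in-memory store that keeps notes apart from the pages they describe."""

    def __init__(self):
        # Maps a resource URL to the list of notes attached to it.
        self._notes = defaultdict(list)

    def add(self, url, note):
        self._notes[url].append(note)

    def remove(self, url, note):
        self._notes[url].remove(note)

    def notes_for(self, url):
        # Return a copy so callers cannot alter the layer by accident.
        return list(self._notes.get(url, []))


page = "<p>Original page content</p>"  # stands in for the web resource
layer = AnnotationLayer()
layer.add("https://example.org/page", "Check the date in this paragraph.")
```

Adding or removing a note changes only the layer; the `page` string is never touched.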

Web annotation can be used for the following purposes:

Annotations can be considered an additional layer with respect to comments. Comments are published by the same publisher who hosts the original document. Annotations are added on top of that, but may eventually become comments which, in turn, may be integrated in a further version of the document itself. [3]

Web Annotation standard

In the Web Annotation standard,

[a]n annotation is considered to be a set of connected resources, typically including a body and target, and conveys that the body is related to the target. The exact nature of this relationship changes according to the intention of the annotation, but the body is most frequently somehow "about" the target. (...) The (...) model supports additional functionality, enabling content to be embedded within the annotation, selecting arbitrary segments of resources, choosing the appropriate representation of a resource and providing styling hints to help clients render the annotation appropriately.

Robert Sanderson, Paolo Ciccarese, Benjamin Young (eds.), Web Annotation Data Model, W3C Recommendation 23 February 2017, https://www.w3.org/TR/annotation-model/

The basic data structures of Web Annotation (Fig. 1) are

Fig. 1. Basic view on the Web Annotation data model

The body can be a literal value or structured content (a URI). The target can be identified by a URI (e.g., with a fragment identifier) and/or a selector that defines a domain-, resource- or application-specific access protocol, e.g., offset-based, XPath-based, etc.
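
A minimal sketch of such an annotation, written as a Python dictionary in the JSON-LD layout of the Web Annotation Data Model, may make the body/target/selector structure concrete. The annotation `id`, the target URI and the helper names `make_annotation` and `select_text` are assumptions chosen for illustration; the offset-based selector is one of several selector types.

```python
import json


def make_annotation(body_text, source_uri, start, end):
    """Build a minimal annotation with a literal body and an offset-based selector."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "https://example.org/anno/1",  # hypothetical identifier
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": body_text},
        "target": {
            "source": source_uri,
            "selector": {"type": "TextPositionSelector", "start": start, "end": end},
        },
    }


def select_text(annotation, document_text):
    """Resolve the offset-based selector against the plain text of the target."""
    selector = annotation["target"]["selector"]
    # start is inclusive and end is exclusive, which matches Python slicing.
    return document_text[selector["start"]:selector["end"]]


text = "Web annotation adds a layer on top of a resource."
anno = make_annotation("Key term", "https://example.org/page.txt", 0, 14)
print(json.dumps(anno, indent=2))
print(select_text(anno, text))  # prints "Web annotation"
```

The body here is a literal value; a body that is structured content would instead be given as the URI of that content.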

Web Annotation was standardized on February 23, 2017 with the release of three official Recommendations by the W3C Web Annotation Working Group: [4] [5]

These recommendations were accompanied by additional working group notes that describe their application:

The Web Annotation data model is also provided in machine-readable form as the Web Annotation ontology. [11] This ontology defines the Web Annotation namespace (http://www.w3.org/ns/oa#), which is conventionally abbreviated as oa. The abbreviation stands for Open Annotation, a W3C Community Group whose specifications formed the basis for the Web Annotation standard. [12]
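
How the oa abbreviation is used can be shown with a short sketch that expands prefixed names into full IRIs. It assumes the namespace IRI http://www.w3.org/ns/oa# as published in the Web Annotation Vocabulary; the `expand` helper and the `PREFIXES` table are hypothetical names.

```python
# Namespace IRI of the Web Annotation ontology (assumed as published by the W3C).
OA = "http://www.w3.org/ns/oa#"
PREFIXES = {"oa": OA}


def expand(prefixed_name):
    """Turn a prefixed name such as 'oa:hasBody' into a full IRI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local_name


print(expand("oa:hasBody"))    # prints "http://www.w3.org/ns/oa#hasBody"
print(expand("oa:hasTarget"))  # prints "http://www.w3.org/ns/oa#hasTarget"
```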

Web Annotation supersedes other standardization initiatives for annotations on the web within the W3C.

Web Annotation can be used in conjunction with (or as an alternative to) fragment identifiers that describe how to address elements within a web document by means of URIs. These include

Other, non-standardized fragment identifiers are also in use, e.g., within the NLP Interchange Format. [19]
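
As one illustration of addressing part of a document through a fragment identifier, the sketch below resolves a `char=` fragment for plain text in the style of RFC 5147 (cited in the references). It is a simplified reading of that scheme: the `line=` scheme is not handled and integrity checks such as `;length=` are ignored. The function name and example URI are assumptions.

```python
from urllib.parse import urldefrag


def resolve_char_fragment(uri, text):
    """Return the part of `text` addressed by a `char=` fragment (simplified)."""
    fragment = urldefrag(uri).fragment
    scheme, _, value = fragment.partition("=")
    if scheme != "char":
        raise ValueError(f"unsupported fragment: {fragment!r}")
    # Drop optional integrity checks after ';' (ignored in this sketch).
    value = value.split(";", 1)[0]
    if "," not in value:
        # A bare position marks a point between characters, not a span.
        return ""
    start_str, _, end_str = value.partition(",")
    start = int(start_str) if start_str else 0
    end = int(end_str) if end_str else len(text)
    return text[start:end]


print(resolve_char_fragment("https://example.org/note.txt#char=4,14", "Web annotation"))
# prints "annotation"
```

A Web Annotation selector can express the same span without changing the URI, which is why the two mechanisms can be combined or used as alternatives.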

Independently of Web Annotation, more specialized data models for representing annotations on the web have been developed, e.g., the NLP Interchange Format (NIF) [20] for applications in language technology. In early 2020, the W3C Community Group "Linked Data for Language Technology" launched an initiative to harmonize these vocabularies and to develop a consolidated RDF vocabulary for linguistic annotations on the web. [21]

Web Annotation Systems

Comparison of web annotation systems

Many of these systems require software to be installed to enable some or all of the features below. This fact is only noted in footnotes if the software that is required is additional software provided by a third party.

Annotation systemPrivate notesPrivate group notesPublic notesNotificationHighlightingFormatted textArchivesViewing annotationsAPIOpen Sourceimplements/ exports/ imports Web Annotation standardNotes
A.nnotate YesYesNoYes [22] YesNoYesYesNo (proprietary)UnknownCan annotate PDF, ODF, .doc, .docx, images, as well as web pages (but only a limited number in the free version)
Annotorious [23] Yes, via Google Firebase plugin [24] YesYes (BSD) [25] Yes [26] JavaScript image annotation library and plugin to OpenSeadragon, also supports simple text annotations
Diigo YesYesYesYesYesNoYesYesNo (proprietary)UnknownPublic annotations are only allowed for established users. Group tag dictionary feature to encourage tagging consistently within a group.
Hypothes.is YesYesYesYesYesYesChrome, via.hypothes.isYesYes (MIT, BSD)No, [27] but listed on Web Annotation wikiIn February 2015, different features were announced, [28] such as private group annotation, semantic tagging, moderation, etc. Web Annotation integration was discussed in 2014. [29] [30]
Org-mode (with extensions)YesNoNo [31] No [32] NoYesNoYesUnknownEmacs-based; requires technical knowledge to set up; not as user-friendly as some other solutions; non-Latin characters allowed in notes but not in tags
Recogito [33] YesYesYesYesYesYesYes (Apache)YesAnnotation of named entities, linking with maps and knowledge graphs, developed by the Pelagios network, [34] popular in Digital Humanities

Discontinued web annotation systems

System | Notes | Date discontinued
Mosaic Browser | An early version of the Mosaic browser was tested with a collaborative annotation feature in 1993, [35] but it never progressed beyond the testing stage. | Never left the testing stage
CritLink | Perhaps the earliest web annotation system. Developed in 1997–98 by Ka-Ping Yee of the University of California. [36] CritLink worked as an HTML "mediator", so it required no additional software or browser extensions, but it had limited support for modern JavaScript-driven websites. |
Annotea | A W3C project that tried to establish a standard for web annotation. Annotea was conceived as part of the semantic web. According to the website, Annotea development stalled in October 2005. [37] |
ThirdVoice | A system launched in 1999. It was targeted by a campaign called Say No to TV, [38] led by independent web hosts who likened ThirdVoice to "Web graffiti." [39] It shut down in April 2001 because it could not generate enough advertising revenue to stay in business. [40] | April 2001
Delicious | Founded in 2003, it provided cloud bookmarks with optional descriptions (a form of annotation) of up to 1000 characters. A shutdown was rumoured in 2010, but the service was only shut down in 2017, when it was acquired by Pinboard, a competitor. | 2017
Wikalong | A Firefox plugin created in 2004 that provided a publicly editable MediaWiki page in the margin of any webpage. (It was later accessible in other browsers via a bookmarklet.) Common uses were note-taking and discussion about the website. On Google, the Wikalong margin provided a variety of useful tips and shortcuts for searching. The project was discontinued in 2009 when the storage wiki went offline. It had been suffering from link spam abuse. [41] [42] | 2009
Fleck | Launched in 2005 with much publicity as a sticky-notes application for the web. A patent, funding and marketing did not stop it from failing. Discontinued in 2010. [43] | 2010
stet | Stet was the GPLv3 comment system. [44] |
Crocodoc | Launched in 2007, it dabbled in web page annotation as part of its broader mission. It was originally developed in Adobe Flash. It was acquired by Box.com in 2013, [45] and the web annotation side of it was shut down two years later. [46] | 2015
Blerp | Launched in 2009. A multimedia, extensible tool for annotating web pages with widgets viewable by any other Blerp user. |
Google Sidewiki | Launched in 2009. A part of Google Toolbar that allowed users to write comments alongside any web page. It was discontinued in December 2011. | December 2011
SharedCopy | An AJAX-based web annotation tool that allowed users to mark up, highlight, draw, annotate, cache, sticky-note and finally share any website. |
Genius Web Annotator | Formerly known as News Genius, launched in 2016. [47] As of May 2022, the Chrome and WordPress extensions are discontinued, and the bookmarklet provided on the official website does not work. As of August 2022, it is not possible to log in and leave new annotations. |

References

  1. Inception (2022-07-30). "Welcome". Inception-project.github.io. Retrieved 2022-08-07.
  2. "Welcome".
  3. Schepers, Doug. "Web Annotation Architecture". W3C. Retrieved 29 July 2016.
  4. "Deliverables of W3C's Web Annotation Working Group".
  5. Whaley, Dan (February 24, 2017). "Annotation Is Now a Web Standard". Hypothes.is.
  6. "Web Annotation Data Model".
  7. "Web Annotation Vocabulary".
  8. "Web Annotation Protocol".
  9. "Embedding Web Annotations in HTML".
  10. "Selectors and States".
  11. "Web Annotation Vocabulary".
  12. "Open Annotation Community Group".
  13. Wilde, E.; Duerst, M. (2008). "URI Fragment Identifiers for the text/plain Media Type". RFC 5147. doi:10.17487/RFC5147.
  14. Hausenblas, Michael; Wilde, Erik; Tennison, Jeni (January 2014). "URI Fragment Identifiers for the text/csv Media Type". RFC 7111. doi:10.17487/RFC7111.
  15. Hardy, Matthew; Masinter, Larry M.; Markovic, Dejan; Johnson, Duff; Bailey, Martin (March 2017). "The application/pdf Media Type". RFC 8118. doi:10.17487/RFC8118.
  16. "Linking – SVG 1.1 (Second Edition)".
  17. "XPointer Framework".
  18. "Media Fragments URI 1.0 (Basic)".
  19. "Guidelines for Linked Data corpus creation using NIF".
  20. "Nlp2rdf.org". Site.nlp2rdf.org. Retrieved 2022-08-07.
  21. "Ld4lt/Linguistic-annotation". GitHub. 31 May 2021.
  22. See A.nnotate notifications
  23. "Annotorious | JavaScript image annotation library". annotorious.github.io. Retrieved 2023-10-18.
  24. "Getting Started: Storing Annotations". recogito.github.io. Retrieved 2023-10-18.
  25. "Annotorious | About". annotorious.github.io. Retrieved 2023-10-18.
  26. "Annotorious & Web Annotation". recogito.github.io. Retrieved 2023-10-18.
  27. "Hypothesis API documentation (v1)". h.readthedocs.io. Retrieved 2023-10-18.
  28. "Trello". hypothes.is.
  29. bigbluehat (2014-10-15). "Open Annotation RDFa using Hypothes.is JSON". Hypothesis. Retrieved 2023-10-18.
  30. "Implementations - Web Annotation Wiki". www.w3.org. Retrieved 2023-10-18.
  31. Technically, public annotations are possible via the "publish to HTML" feature of Org-mode, but no method for notifications or discovery of public annotations written by others is currently known.
  32. But local annotations can be exposed to a Firefox browser using Fireforg.
  33. "Semantic Annotation without the pointy brackets". recogito.pelagios.org. Retrieved 2023-10-18.
  34. "Welcome to Pelagios Network". pelagios.org. Retrieved 2023-10-18.
  35. Andreessen, Marc (1993-05-31). "group annotation server guinea pigs?". webhistory.org. Retrieved 2017-11-08.
  36. Yee, Ka-Ping (2002). "CritLink: Advanced Hyperlinks Enable Public Annotation on the Web". CiteSeerX 10.1.1.5.5050.
  37. "Annotea project".
  38. "Say NO To Third Voice! - WEBMASTERS UNITE!!". Archived from the original on April 5, 2001.
  39. "The Web's New Graffiti?". Archived from the original on November 21, 2001.
  40. Third Voice Trails Off, Wired News, April 4, 2001
  41. Wikalong Firefox Addon, Oct 1, 2006
  42. "SEO – SEO-Optimierung". www.wikalong.org.
  43. Farewell Fleck.com, "The Next Web", May 10, 2010
  44. "Comment system". Free Software Foundation. 11 May 2011. Retrieved 28 July 2016.
  45. "Box acquires Crocodoc to turn all those docs you upload into HTML5 masterpieces". VentureBeat. 9 May 2013. Retrieved 12 June 2016.
  46. "Box is Shutting Down Crocodoc Personal and Webnotes on November 1". VentureBeat. 5 August 2015. Retrieved 12 June 2016.
  47. "News Genius". Genius. Retrieved 2022-05-27.

Further reading

  1. Web Annotation Data Model
  2. Web Annotation Vocabulary
  3. Web Annotation Protocol