Microdata (HTML)

Microdata is a WHATWG HTML specification used to nest metadata within existing content on web pages. [1] Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users. Search engines benefit greatly from direct access to Microdata because it allows them to understand the information on web pages and provide more relevant results to users. [2] [3] Microdata uses a supporting vocabulary to describe an item and name-value pairs to assign values to its properties. [4] Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of using RDFa and microformats.

In 2013, because the W3C HTML Working Group failed to find someone to serve as an editor for the Microdata HTML specification, its development was terminated with a 'Note'. [5] [6] However, two new editors have since been selected, and five newer versions of the working draft have been published, [7] [8] [9] [10] the most recent being the Working Draft of 26 April 2018. [10]

Vocabularies

Microdata vocabularies do not provide the semantics, or meaning, of an item. [11] Web developers can design a custom vocabulary or use vocabularies available on the web. A collection of commonly used markup vocabularies is provided by Schema.org; its schemas include Person, Place, Event, Organization, Product, Review, Review-aggregate, Breadcrumb, Offer, and Offer-aggregate. The website schema.org was established by search engine operators such as Google, Microsoft, Yahoo!, and Yandex, which use microdata markup to improve search results. [12]:85

For some purposes, an ad-hoc vocabulary is adequate. For others, a vocabulary will need to be designed. Where possible, authors are encouraged to re-use existing vocabularies, as this makes content re-use easier. [1]
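
As a rough illustration of the difference, the first item below uses ad-hoc property names with no declared vocabulary, while the second declares one through itemtype. The property names and the https://vocab.example.org/cat URL are hypothetical, chosen only for this sketch.

```html
<!-- Ad-hoc vocabulary: no itemtype, so the property names mean only
     what the page author and the consumer have agreed between them. -->
<div itemscope>
  <p>My cat is called <span itemprop="name">Hedral</span>.</p>
  <p>Fur colour: <span itemprop="colour">black</span></p>
</div>

<!-- Declared vocabulary: itemtype identifies the vocabulary (a
     hypothetical URL here), so any consumer that knows it can
     interpret "name" and "colour" the same way. -->
<div itemscope itemtype="https://vocab.example.org/cat">
  <p>My cat is called <span itemprop="name">Hedral</span>.</p>
  <p>Fur colour: <span itemprop="colour">black</span></p>
</div>
```

Replacing the hypothetical URL with an established type, such as one from Schema.org, is what makes the markup reusable by third parties.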

Localization

In some cases, search engines covering specific regions may provide locale-specific extensions of microdata. For example, Yandex, a major search engine in Russia, supports microformats such as hCard (company contact information), hRecipe (food recipes), hReview (market reviews) and hProduct (product data), and provides its own format for definitions of terms and encyclopedic articles. This extension was made to solve transliteration problems between the Cyrillic and Latin alphabets. After additional parameters from the Schema.org vocabulary were implemented, [13] indexing of information in Russian-language web pages became more successful.

Global attributes
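
The WHATWG Microdata specification defines five global attributes: itemscope, itemtype, itemid, itemprop, and itemref. The sketch below uses each of them once. The vocabulary URL, the identifier, the property names, and the id value are illustrative assumptions and do not belong to any published vocabulary.

```html
<!-- itemscope creates a new item.
     itemtype  names the vocabulary that defines the item's properties.
     itemid    gives the item a global identifier (used together with
               itemtype, and only if the vocabulary supports one).
     itemref   lists ids of elements outside this subtree whose
               properties should also be attached to this item. -->
<div itemscope
     itemtype="https://vocab.example.org/book"
     itemid="urn:example:book:1234"
     itemref="book-author">
  <!-- itemprop adds a name-value pair to the item; here the value is
       the element's text content. -->
  <cite itemprop="title">An Example Book</cite>
</div>

<!-- Not a descendant of the item, but pulled in through itemref, so
     "author" becomes a property of the book item above. -->
<p id="book-author">Written by <span itemprop="author">Jane Roe</span>.</p>
```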

Example

The following HTML5 markup may be found on a typical “About” page containing information about a person:

<section>
  Hello, my name is John Doe, I am a graduate research assistant at the
  University of Dreams.
  My friends call me Johnny.
  You can visit my homepage at
  <a href="http://www.example.com/~JohnnyD">www.example.com/~JohnnyD</a>.
  I live at 1234 Peach Drive, Warner Robins, Georgia.
</section>

Here is the same markup with added Schema.org [14] [15] [16] Microdata:

<section itemscope itemtype="http://schema.org/Person">
  Hello, my name is
  <span itemprop="name">John Doe</span>,
  I am a
  <span itemprop="jobTitle">graduate research assistant</span>
  at the
  <span itemprop="affiliation">University of Dreams</span>.
  My friends call me
  <span itemprop="additionalName">Johnny</span>.
  You can visit my homepage at
  <a href="http://www.example.com/~JohnnyD" itemprop="url">www.example.com/~JohnnyD</a>.
  <section itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    I live at
    <span itemprop="streetAddress">1234 Peach Drive</span>,
    <span itemprop="addressLocality">Warner Robins</span>,
    <span itemprop="addressRegion">Georgia</span>.
  </section>
</section>

As the above example shows, Microdata items can be nested. In this case, an item of type http://schema.org/PostalAddress is nested inside an item of type http://schema.org/Person.

The following text shows how Google parses the Microdata from the above example code. Developers can test pages containing Microdata using Google's Rich Snippet Testing Tool. [17]

The same machine-readable terms can be used not only in HTML Microdata, but also in other annotations such as RDFa or JSON-LD in the markup, or in an external RDF file in a serialization such as RDF/XML, Notation3, or Turtle.
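
For comparison, the Person item from the example above might be expressed in JSON-LD and in RDFa Lite roughly as follows. This is a sketch that reuses the same Schema.org terms; it is not output from any particular tool.

```html
<!-- JSON-LD: the data lives in a script element, separate from the
     visible text of the page. -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "John Doe",
  "jobTitle": "graduate research assistant",
  "affiliation": "University of Dreams",
  "additionalName": "Johnny",
  "url": "http://www.example.com/~JohnnyD",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "1234 Peach Drive",
    "addressLocality": "Warner Robins",
    "addressRegion": "Georgia"
  }
}
</script>

<!-- RDFa Lite: vocab and typeof take the place of itemscope and
     itemtype, and property takes the place of itemprop. -->
<section vocab="http://schema.org/" typeof="Person">
  Hello, my name is <span property="name">John Doe</span>,
  I am a <span property="jobTitle">graduate research assistant</span>.
  <span property="address" typeof="PostalAddress">
    I live at
    <span property="streetAddress">1234 Peach Drive</span>,
    <span property="addressLocality">Warner Robins</span>,
    <span property="addressRegion">Georgia</span>.
  </span>
</section>
```

The JSON-LD form duplicates the values instead of annotating the visible text, whereas RDFa, like Microdata, marks up the content in place.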

Support

References

  1. "Microdata — HTML Draft Standard". Whatwg.org. Retrieved 2016-06-30.
  2. "MicroData - The Future of Search Engine Relevance and Optimization (SEO)". Lyquix.com. Retrieved 2016-06-30.
  3. Schema.org http://schema.org/
  4. "'Distributed,' 'Extensibility,' And Other Fancy Words". Diveintohtml5.info. Retrieved 2016-06-30.
  5. Cotton, Paul (2 Oct 2013). "WG Decision to publish HTML Microdata as a WG Note". public-html-admin@w3.org (Mailing list). Retrieved 2016-06-30.
  6. "HTML Microdata". W3.org. 23 June 2014. Retrieved 2016-06-30.
  7. "HTML Microdata W3C First Public Working Draft 04 May 2017". World Wide Web Consortium (W3C). Retrieved 2017-09-06.
  8. "HTML Microdata W3C Working Draft 26 June 2017". World Wide Web Consortium (W3C). Retrieved 2017-09-06.
  9. "HTML Microdata W3C Working Draft 09 October 2017". World Wide Web Consortium (W3C). 9 October 2017. Retrieved 16 March 2018.
  10. "HTML Microdata W3C Working Draft 10 October 2017". World Wide Web Consortium (W3C). 10 October 2017. Retrieved 16 March 2018.
  11. "HTML Standard". Web Hypertext Application Technology Working Group. Retrieved 30 December 2016.
  12. MacDonald, Matthew (2014). HTML5: The missing manual (2nd ed.). O'Reilly and Associates. ISBN 978-1-4493-6326-0.
  13. "Semantic markup deployment in Russia". Academia.edu. Retrieved 2016-06-30.
  14. "Documentation". Schema.org. Retrieved 2016-06-30.
  15. "Type Hierarchy". Schema.org. Retrieved 2016-06-30.
  16. "Schema.org Turtle RDFS Schema". Archived from the original on 2014-09-21. Retrieved 2013-05-29.
  17. "Rich snippets (microdata, microformats, RDFa)". Google Inc. 2016-05-17. Retrieved 2016-06-30.
  18. "Rich Snippet display clarification". 2016-06-22. Retrieved 2016-06-30.
  19. Google Webmasters Channel (2011-12-06). Types of Rich Snippets (Video). Archived from the original on 2021-12-15. Retrieved 2016-06-30.
  20. "Microdata DOM API - Web APIs | MDN". developer.mozilla.org. Retrieved 2021-07-05.
  21. Opera Software Documentation Team (2011-12-06). "Opera 11.60 for Windows changelog". Opera.com. Archived from the original on 2014-10-23. Retrieved 2016-06-30.
  22. "909633 - Remove HTML Microdata API". bugzilla.mozilla.org. Retrieved 2021-07-05.