Schema.org

Abbreviation: schema
Year started: 2011
Latest version: 15.0 (2022-10-25) [1]
Organization: Google, Yahoo!, Microsoft, Yandex
Base standards: URI, HTML5, RDF, Microdata, ISO 8601
Related standards: RDFa, Microformat, RDFS, OWL, N-Triples, Turtle, JSON, JSON-LD, CSV
Domain: Semantic Web
License: CC-BY-SA 3.0
Website: schema.org

Schema.org is a reference website that publishes documentation and guidelines for using structured data markup on web pages (called microdata). Its main objective is to standardize the HTML tags that webmasters use to create rich results (displayed as visual data or infographic tables on search engine results pages) about a certain topic of interest. [2] It is part of the Semantic Web project, which aims to make document markup more readable and meaningful to both humans and machines.


History

Schema.org is an initiative launched on June 2, 2011, by Bing, Google and Yahoo! [3] [4] [5] (operators of the world's largest search engines at that time) [6] to create and support a common set of schemas for structured data markup on web pages. In November 2011, Yandex (whose search engine is the largest in Russia) joined the initiative. [7] [8] They propose using the schema.org vocabulary along with the Microdata, RDFa, or JSON-LD formats [9] to mark up website content with metadata about itself. Such markup can be recognized by search engine spiders and other parsers, thus granting access to the meaning of the sites (see Semantic Web). The initiative also describes an extension mechanism for adding additional properties. [10] In 2012, the GoodRelations ontology was integrated into Schema.org. [11] Public discussion of the initiative largely takes place on the W3C public vocabularies mailing list. [12]

Much of the vocabulary on Schema.org was inspired by earlier formats, such as microformats, FOAF, and OpenCyc. [13] Microformats, with hCard as their most dominant representative, continued (as of 2015) to be published widely on the web, while the deployment of Schema.org increased strongly between 2012 and 2014. [14] In 2015, [15] Google began supporting the JSON-LD format, and as of September 2017 it recommended using JSON-LD for structured data whenever possible. [16] [17]

Despite the advantages of using Schema.org, adoption remained limited as of 2016. A 2016 survey of 300 US-based marketing agencies and B2C advertisers across industries showed only 17% uptake. [18]

Validators such as the soon-to-be-deprecated Google Structured Data Testing Tool, [19] the more recent [20] Google Rich Results Test Tool, [21] the Yandex Microformat validator, [22] and the Bing Markup Validator [23] can be used to test the validity of data marked up with the schemas and Microdata. More recently, Google Search Console (formerly Webmaster Tools) has provided a report section for unparsable structured data. If any Schema.org markup on a website is incorrect, it appears in this report. [24] Some schema types, such as Organization and Person, are commonly used to influence search results returned by Google's Knowledge Graph. [25]

Schema Types

There are a number of items that a web page can be marked up with using a Schema, with examples including:

Examples

Microdata

The following is an example [26] of how to mark up information about a movie and its director using the Schema.org schemas and microdata. To mark up the data, the attribute itemtype is used along with the URL of the schema, which states the kind of item being described. The attribute itemscope defines the scope of the itemtype, that is, the element whose content describes the item. The attribute itemprop names a property of the current item.

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <time itemprop="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
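
To illustrate how a parser might recover the meaning of such markup, the following is a minimal Python sketch that reads the itemscope, itemtype and itemprop attributes using only the standard library. It is not how any search engine actually processes microdata: the names MicrodataParser, extract_microdata and SAMPLE are hypothetical, the input is assumed to be well-formed, void elements such as meta are skipped, and a repeated property simply overwrites the earlier value.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; this sketch simply skips them
# (so <meta itemprop=...> values are NOT collected).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Hypothetical, simplified microdata reader (assumes well-formed HTML)."""

    def __init__(self):
        super().__init__()
        self.root_items = []  # items that are not a property of another item
        self._items = []      # stack of items whose scope is currently open
        self._frames = []     # one bookkeeping frame per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        prop = attr.get("itemprop")
        frame = {"item": None, "prop": None, "value": None, "text": []}
        if "itemscope" in attr:
            # A new item starts here; itemtype names its schema.
            item = {"type": attr.get("itemtype"), "properties": {}}
            if prop and self._items:
                self._items[-1]["properties"][prop] = item
            else:
                self.root_items.append(item)
            self._items.append(item)
            frame["item"] = item
        elif prop and self._items:
            # A plain property: the value comes from an attribute or the text.
            frame["prop"] = prop
            if tag == "a":
                frame["value"] = attr.get("href")
            elif tag == "time":
                frame["value"] = attr.get("datetime")
        self._frames.append(frame)

    def handle_data(self, data):
        # Text belongs to every open property element that encloses it.
        for frame in self._frames:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._frames:
            return
        frame = self._frames.pop()
        if frame["item"] is not None:
            self._items.pop()  # the item's scope ends with its element
        elif frame["prop"]:
            value = frame["value"]
            if value is None:
                value = "".join(frame["text"]).strip()
            self._items[-1]["properties"][frame["prop"]] = value


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.root_items


SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <time itemprop="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""

items = extract_microdata(SAMPLE)
movie = items[0]
```

With the sample above, movie holds the Movie item, and its director property is itself a nested Person item. The visible text "August 16, 1954" is ignored in favour of the machine-readable datetime attribute.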

RDFa 1.1 Lite

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <div property="director" typeof="Person">
    Director: <span property="name">James Cameron</span>
    (born <time property="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span property="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" property="trailer">Trailer</a>
</div>

JSON-LD

<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Movie",
  "name": "Avatar",
  "director": {
    "@type": "Person",
    "name": "James Cameron",
    "birthDate": "1954-08-16"
  },
  "genre": "Science fiction",
  "trailer": "../movies/avatar-theatrical-trailer.html"
}
</script>
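
Because JSON-LD keeps the metadata in a separate script element rather than in attributes scattered through the page, it can be produced and consumed with ordinary JSON tooling. The following Python sketch shows one possible round trip using only the standard library. The helper names to_script_tag, JsonLdParser and extract_jsonld are hypothetical, and the sketch makes no attempt at full JSON-LD processing (no context expansion, no validation against the Schema.org vocabulary).

```python
import json
from html.parser import HTMLParser

JSONLD_TYPE = "application/ld+json"


def to_script_tag(data):
    """Serialize a dict as a JSON-LD script element (hypothetical helper)."""
    body = json.dumps(data, indent=2)
    # "\/" is a valid JSON escape for "/"; escaping "</" keeps a string
    # value from closing the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="' + JSONLD_TYPE + '">' + body + "</script>"


class JsonLdParser(HTMLParser):
    """Collect the parsed content of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == JSONLD_TYPE:
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


movie = {
    "@context": "http://schema.org/",
    "@type": "Movie",
    "name": "Avatar",
    "director": {
        "@type": "Person",
        "name": "James Cameron",
        "birthDate": "1954-08-16",
    },
    "genre": "Science fiction",
    "trailer": "../movies/avatar-theatrical-trailer.html",
}

page = "<html><head>" + to_script_tag(movie) + "</head><body></body></html>"
blocks = extract_jsonld(page)
```

Here blocks contains a single dictionary equal to movie, which shows that the same Movie and Person description survives being embedded in a page and read back out.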

References

  1. "schema.org releases".
  2. "About schema.org initiative". W3C. Retrieved 28 June 2018.
  3. "Home - schema.org". schema.org. Retrieved 2019-04-01.
  4. "Introducing schema.org: Search engines come together for a richer web". Google blog. 2 June 2011.
  5. "Introducing Schema.org: Bing, Google and Yahoo Unite to Build the Web of Objects". Bing blog. 2 June 2011.
  6. "Top 5 Search Engines from Oct to Dec 10". StatCounter. Retrieved 17 January 2011.
  7. nate451. "Yandex joins Google, Yahoo! and Bing to collaborate on Schema.org - TechCrunch". Retrieved 6 July 2017.
  8. "Yandex now supports schema.org markup". blog.schema.org. Retrieved 6 July 2017.
  9. "Getting Started - schema.org". schema.org. Retrieved 6 July 2017.
  10. "Extending Schemas". schema.org. 2011-06-02. Retrieved 2 June 2011.
  11. "Good Relations and Schema.org". blog.schema.org. Retrieved 6 July 2017.
  12. "W3C web vocabularies mailing list". w3.org. 2013-07-22. Retrieved 22 July 2013.
  13. "FAQ". schema.org. Retrieved 2 June 2011.
  14. "Web Data Commons – RDFa, Microdata, and Microformat Data Sets -- Extracting Structured Data from the Common Web Crawl". 3.1. Extraction Results from the December 2014 Common Crawl Corpus. 2015-04-13. Retrieved 2015-04-13.
  15. "Easier website development with Web Components and JSON-LD". 2015-03-09.
  16. "Introduction to Structured Data". 2017-09-13.
  17. "How to add Schema Markup Data JSON-LD". YouTube. 2019-09-06.
  18. "Prioritize Search To Maximize ROI Of Marketing" (PDF). 2017-01-01.
  19. "Structured Data Testing Tool". www.google.com. Retrieved 25 August 2020.
  20. "The Rich Results Test is out of beta". webmasters.googleblog.com. Retrieved 25 August 2020.
  21. "Rich Result Tool". www.google.com. Retrieved 28 July 2020.
  22. "Микроразметка — Яндекс.Вебмастер". webmaster.yandex.ru. Retrieved 6 July 2017.
  23. "Bing - Markup Validator". www.bing.com. Retrieved 6 July 2017.
  24. "What is Schema Mark Up and How Can it Benefit Your Business". 2019-12-04. Archived from the original on 2021-08-18. Retrieved 2021-08-18.
  25. "Specify your social profiles to Google". Google Developers. Retrieved 2015-06-25.
  26. "Getting Started - schema.org". schema.org. Retrieved 6 July 2017.