Canonical link element

A canonical link element is an HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the "canonical" or "preferred" version of a web page. It is described in RFC 6596, which was published in April 2012. [1] [2]

Purpose

A major problem for search engines is determining the original source of documents that are available at multiple URLs. Content duplication can happen in many ways. [3]

Duplicate content issues occur when the same content is accessible from multiple URLs. [4] For example, http://www.example.com/page.html would be considered by search engines to be an entirely different page from http://www.example.com/page.html?parameter=1, even though both URLs may reference the same content. [5] [6]
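The mismatch can be illustrated with a plain string comparison. The sketch below uses Python's standard `urllib.parse`. The helper name `strip_query` is hypothetical, and discarding the whole query string is only an assumption that suits this example, since on many sites query parameters do change the content.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


a = "http://www.example.com/page.html"
b = "http://www.example.com/page.html?parameter=1"

# Compared as raw strings, the two URLs look like different resources.
urls_differ = a != b

# Once the query string is discarded, both collapse to the same address.
same_after_strip = strip_query(a) == strip_query(b)

print(urls_differ, same_after_strip)
```

A crawler cannot safely apply such a rule on its own, which is why an explicit signal from the site owner is useful.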

In February 2009, Google, Yahoo and Microsoft announced support for the canonical link element, which can be inserted into the <head> section of a web page, to allow webmasters to prevent these issues. [7] The canonical link element helps webmasters make clear to the search engines which page should be credited as the original.

How search engines handle rel=canonical

Search engines try to utilize canonical link definitions as an output filter for their search results. If multiple URLs contain the same content in the result set, the canonical link URL definitions will likely be incorporated to determine the original source of the content. "For example, when Google finds identical content instances, it decides to show one of them. Its choice of the resource to display in the search results will depend upon the search query." [8]
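As a rough illustration of what such an output filter could look like, the sketch below collapses a result list by declared canonical URL. It is not a description of any search engine's actual implementation. The function name `filter_duplicates`, the `canonical_of` mapping, and the example URLs are all assumptions made for this example, and the sketch always honors the declared canonical, which real engines are not obliged to do.

```python
from typing import Dict, List


def filter_duplicates(results: List[str], canonical_of: Dict[str, str]) -> List[str]:
    """Keep one entry per canonical URL, preserving the original result order."""
    seen = set()
    filtered = []
    for url in results:
        # A URL with no declared canonical stands for itself.
        preferred = canonical_of.get(url, url)
        if preferred not in seen:
            seen.add(preferred)
            filtered.append(preferred)
    return filtered


# Hypothetical result set in which two URLs serve the same content.
results = [
    "https://example.com/page.php?parameter=1",
    "https://example.com/page.php",
    "https://example.com/other.php",
]
canonical_of = {
    "https://example.com/page.php?parameter=1": "https://example.com/page.php",
}

print(filter_duplicates(results, canonical_of))
```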

According to Google, the canonical link element is not considered to be a directive, but rather a hint that the ranking algorithm will "honor strongly." [1] [9]

While the canonical link element has its benefits, Matt Cutts, then the head of Google's webspam team, said that the search engine prefers the use of 301 redirects. According to Cutts, redirects are preferred because Google's spiders can choose to ignore a canonical link element if they deem it more beneficial to do so. [10] [11]
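For comparison, a 301 redirect is issued by the server rather than declared in the page. The following is a minimal sketch of a WSGI application that answers a duplicate path with a permanent redirect. The `REDIRECTS` table, the paths, and the target URL are hypothetical values chosen for illustration.

```python
from typing import Callable, Iterable

# Hypothetical mapping from duplicate paths to the preferred URL.
REDIRECTS = {"/old-page.php": "https://example.com/page.php"}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application: 301 for known duplicates, 404 otherwise."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target is not None:
        # The Location header carries the preferred URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

Unlike a canonical link element, a redirect means that visitors never see the duplicate URL at all, so it only fits cases where the duplicate does not need to stay reachable.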

Implementation

Semantic tag

The canonical link element can either be placed in the HTML <head> or sent in the HTTP header of a document. For non-HTML documents, the HTTP header is the alternative way to set a canonical URL. [3] [12]

By the HTML5 standard, the <link rel="canonical" href="http://example.com/"> HTML element must be within the <head> section of the document. [13]

Some sites, such as Stack Overflow, [14] include on-page hyperlinks that point to a clean URL of the page itself. This has several usability benefits: the link target URL or title is easy to copy if the browser or a browser extension offers a "Copy link text" context menu option for hyperlinks; the original URL can be retrieved from a saved page if the browser did not store it in a comment inside the file; and the page can be duplicated into a new tab next to the current one if the browser lacks such a feature. [15] [14]

Examples

HTML

Below is an example of HTML code that uses rel=canonical inside the <head> element. The code could be used on a page such as https://example.com/page.php?parameter=1 to tell search engines that https://example.com/page.php is the preferred version of the webpage.

<!DOCTYPE html>
<html>
  <head>
    <link rel="canonical" href="https://www.example.com/page.php">
  </head>
  <body> ... </body>
</html>
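A consumer of such a page, for example a crawler or an audit script, has to read the element back out of the markup. The sketch below does this with Python's standard `html.parser`. The class name `CanonicalFinder` and the function `find_canonical` are hypothetical, and the sketch only accepts a link element that appears inside <head>, in line with the placement rule described above.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> seen inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attr = dict(attrs)
            # rel holds a space-separated list of tokens, compared case-insensitively.
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_tokens and attr.get("href"):
                self.canonical = attr["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the document head, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = """<!DOCTYPE html>
<html>
<head>
<link rel="canonical" href="https://www.example.com/page.php">
</head>
<body> ... </body>
</html>"""

print(find_canonical(page))
```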

HTTP

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://www.newthink.life/page.php>; rel="canonical"
Content-Length: 4223
...
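On the receiving side, the canonical URL has to be picked out of the Link header value. The following is a simplified sketch using regular expressions. The function name `canonical_from_link_header` is hypothetical, and the parser is deliberately incomplete: it assumes that quoted parameter values contain no commas or semicolons, which the full Link header syntax does allow.

```python
import re
from typing import Optional

# One link-value: <target> followed by its semicolon-separated parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, either quoted or as a bare token.
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the target of the first link whose rel includes "canonical"."""
    for target, params in LINK_VALUE.findall(value):
        match = REL_PARAM.search(params)
        if not match:
            continue
        # rel may list several space-separated relation types.
        rel_tokens = (match.group(1) or match.group(2) or "").lower().split()
        if "canonical" in rel_tokens:
            return target
    return None


header = '<https://www.newthink.life/page.php>; rel="canonical"'
print(canonical_from_link_header(header))
```

Because a single Link header may carry several comma-separated links, the function scans all of them instead of assuming that the canonical link comes first.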

References

  1. Kupke, Joachim (2009-02-12). "Specify your canonical". Google. Retrieved 2012-08-02.
  2. Cutts, Matt (2009-02-15). "Learn about the Canonical Link Element in 5 minutes". Retrieved 2012-08-02.
  3. "Link rel=canonical: How to do URL canonicalization right". Audisto GmbH. Retrieved 2015-10-06.
  4. "Duplicate content". Google. Retrieved 2012-08-02.
  5. Biswas, Kushal. "Canonical Issue and How to Use Canonical Tag – The Proper Way". RevenueI. Archived from the original on 14 June 2016. Retrieved 18 June 2015.
  6. Zadro, Dario (19 February 2015). "Rel=Canonical - A Beginners Guide to Canonical Tags - Where and When to Use Them". Zadro Web. Retrieved 18 June 2015.
  7. Fox, Vanessa (2009-02-12). "Google, Yahoo & Microsoft Unite On "Canonical Tag" To Reduce Duplicate Content Clutter". Search Engine Land. Retrieved 2012-08-02.
  8. "How Google And Other Search Engines Manage Canonical Links". http://seomediax.com/seo/how-google-and-other-search-engines-manage-canonical-links/
  9. "Consolidate duplicate URLs - Search Console Help". support.google.com.
  10. Cutts, Matt (2011-05-16). "A rel [equals] canonical corner case". Retrieved 2012-08-02.
  11. "קידום אתרים אורגני" [Organic website promotion].
  12. "Consolidate duplicate URLs - Search Console Help". support.google.com.
  13. "HTML link tag". www.w3schools.com. Retrieved 2019-01-07.
  14. "3 FireFox Addons to Easier Copy Links and Anchor Texts". Search Engine Journal. 7 January 2011.