Deep linking

In the context of the World Wide Web, deep linking is the use of a hyperlink that links to a specific, generally searchable or indexed, piece of web content on a website (e.g., "https://example.com/path/page"), rather than to the website's home page (e.g., "https://example.com"). The URL contains all the information needed to point to a particular item. Deep linking is different from mobile deep linking, which refers to linking directly to in-app content using a non-HTTP URI.
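The distinction between a home-page link and a deep link can be sketched in a few lines of Python. This is only an illustrative heuristic, not a standard definition: the function name `is_deep_link` is hypothetical, and the rule it applies (a non-root path, a query string, or a fragment makes a URL "deep") is an assumption made for the example.

```python
from urllib.parse import urlsplit


def is_deep_link(url: str) -> bool:
    """Return True if the URL points past the site's home page.

    Assumption for this sketch: a URL is "deep" when it has a non-root
    path, a query string, or a fragment.
    """
    parts = urlsplit(url)
    has_path = parts.path not in ("", "/")
    return has_path or bool(parts.query) or bool(parts.fragment)


print(is_deep_link("https://example.com"))            # False
print(is_deep_link("https://example.com/path/page"))  # True
```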

Deep linking and HTTP

The technology behind the World Wide Web, the Hypertext Transfer Protocol (HTTP), does not make any distinction between "deep" links and any other links: all links are functionally equal. This is intentional; one of the design purposes of the Web is to allow authors to link to any published document on another site. The possibility of so-called "deep" linking is therefore built into the Web technology of HTTP and URLs by default. A site can attempt to restrict deep links, but doing so requires extra effort. According to the World Wide Web Consortium Technical Architecture Group, "any attempt to forbid the practice of deep linking is based on a misunderstanding of the technology, and threatens to undermine the functioning of the Web as a whole". [1]
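As a rough illustration of the "extra effort" involved, the sketch below shows one approach a site might take: inspecting the HTTP Referer header and refusing inner pages when the visitor arrives from another host. The source does not name a specific mechanism, so the choice of Referer checking, the function name `allow_request`, and the host `example.com` are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

SITE_HOST = "example.com"  # hypothetical host name for this sketch


def allow_request(path: str, referer: Optional[str]) -> bool:
    """Decide whether to serve `path`, given the request's Referer header.

    HTTP itself treats every path the same; any restriction like this
    has to be added by the site on top of the protocol.
    """
    if path in ("", "/"):
        return True  # the home page is always served
    if not referer:
        return True  # bookmarks and typed URLs usually send no Referer
    # Serve inner pages only when the visitor came from this site.
    return urlsplit(referer).hostname == SITE_HOST
```

Because the Referer header is optional and under the client's control, a check of this kind can only discourage deep links; it cannot reliably prevent them.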

Usage

Some commercial websites object to other sites deep linking into their content, either because it bypasses advertising on their main pages, because it passes off their content as that of the linker, or because, like The Wall Street Journal, they charge users for permanently valid links. Deep linking has sometimes led to legal action, as in the 1997 case of Ticketmaster versus Microsoft, in which Microsoft deep-linked to Ticketmaster's site from its Sidewalk service. The case was settled when Microsoft and Ticketmaster arranged a licensing agreement. Ticketmaster later filed a similar case against Tickets.com, and the judge in that case ruled that such linking was legal as long as it was clear to whom the linked pages belonged. [2] The court also concluded that URLs themselves were not copyrightable, writing: "A URL is simply an address, open to the public, like the street address of a building, which, if known, can enable the user to reach the building. There is nothing sufficiently original to make the URL a copyrightable item, especially the way it is used. There appear to be no cases holding the URLs to be subject to copyright. On principle, they should not be."

Deep linking and web technologies

Websites built on technologies such as Adobe Flash and AJAX often do not support deep linking. This can cause usability problems for visitors to those sites. For example, visitors may be unable to bookmark individual pages or states of the site or to use the browser's forward and back buttons, and clicking the browser's refresh button may return them to the initial page.

However, this is not a fundamental limitation of these technologies. Well-known techniques and libraries, such as SWFAddress [3] and unFocus History Keeper, [4] now allow website creators using Flash or AJAX to provide deep links to pages within their sites. [5] [6] [7]
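One widely used approach to this problem is to store the application's current state in the URL fragment (the part after "#"), so that a bookmark or shared link can restore that state later. The Python sketch below shows only the encode and decode steps. It is an illustration of the general idea, not the API of SWFAddress or History Keeper, and the function names and state keys are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def state_to_url(base_url: str, state: dict) -> str:
    """Encode application state into the fragment of base_url."""
    parts = urlsplit(base_url)
    return urlunsplit(parts._replace(fragment=urlencode(state)))


def url_to_state(url: str) -> dict:
    """Recover application state from a URL's fragment."""
    return dict(parse_qsl(urlsplit(url).fragment))


url = state_to_url("https://example.com/app", {"view": "gallery", "item": "42"})
print(url)                # https://example.com/app#view=gallery&item=42
print(url_to_state(url))  # {'view': 'gallery', 'item': '42'}
```

On page load, a script would read the fragment, call the equivalent of `url_to_state`, and display the matching view instead of the initial page.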

Court rulings

Probably the earliest legal case arising out of deep linking was the 1996 Scottish case of The Shetland Times vs. The Shetland News, in which the Times accused the News of appropriating stories on the Times' website as its own. [8] [9]

At the beginning of 2006, in a case between the search engine Bixee.com and job site Naukri.com, the Delhi High Court in India prohibited Bixee.com from deep linking to Naukri.com. [10]

The most important and widely cited U.S. opinions on deep linking are the Ninth Circuit's rulings in Kelly v. Arriba Soft Corp. [11] and Perfect 10, Inc. v. Amazon.com, Inc. [12] In both cases, the court exonerated the use of deep linking. In the second of these cases, the court explained (speaking of defendant Google, whom Perfect 10 had also sued) why linking is not a copyright infringement under U.S. law:

Google does not…display a copy of full-size infringing photographic images for purposes of the Copyright Act when Google frames in-line linked images that appear on a user's computer screen. Because Google's computers do not store the photographic images, Google does not have a copy of the images for purposes of the Copyright Act. In other words, Google does not have any "material objects…in which a work is fixed…and from which the work can be perceived, reproduced, or otherwise communicated" and thus cannot communicate a copy. Instead of communicating a copy of the image, Google provides HTML instructions that direct a user's browser to a website publisher's computer that stores the full-size photographic image. Providing these HTML instructions is not equivalent to showing a copy. First, the HTML instructions are lines of text, not a photographic image. Second, HTML instructions do not themselves cause infringing images to appear on the user's computer screen. The HTML merely gives the address of the image to the user's browser. The browser then interacts with the computer that stores the infringing image. It is this interaction that causes an infringing image to appear on the user's computer screen. Google may facilitate the user's access to infringing images. However, such assistance raised only contributory liability issues and does not constitute direct infringement of the copyright owner's display rights. …While in-line linking and framing may cause some computer users to believe they are viewing a single Google webpage, the Copyright Act, unlike the Trademark Act, does not protect a copyright holder against acts that cause consumer confusion.

In December 2006, a Texas court ruled that linking by a motocross website to videos on a Texas-based motocross video production website did not constitute fair use. The court subsequently issued an injunction. [13] This case, SFX Motor Sports, Inc. v. Davis, was not published in official reports, but is available at 2006 WL 3616983.

In a February 2006 ruling, the Danish Maritime and Commercial Court (Copenhagen) found systematic crawling, indexing and deep linking by portal site ofir.dk of real estate site Home.dk not to conflict with Danish law or the database directive of the European Union. The Court stated that search engines are desirable for the functioning of the Internet, and that, when publishing information on the Internet, one must assume—and accept—that search engines deep-link to individual pages of one's website. [14]

Opt out

Website owners who do not want search engines to deep link, or who want them to index only specific pages, can request this using the Robots Exclusion Standard (a robots.txt file). People who favor deep linking often feel that content owners who do not provide a robots.txt file are implying by default that they do not object to deep linking, whether by search engines or by others. [citation needed] People against deep linking often claim that content owners may be unaware of the Robots Exclusion Standard or may not use robots.txt for other reasons. [citation needed] Sites other than search engines can also deep link to content on other sites, so some question the relevance of the Robots Exclusion Standard to controversies about deep linking. [15] The Robots Exclusion Standard does not programmatically enforce its directives, so it does not prevent search engines and others that do not follow polite conventions from deep linking. [16]
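A short sketch of how a well-behaved crawler might consult such a file, using the `urllib.robotparser` module from Python's standard library. The robots.txt content, the crawler name `ExampleBot`, and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /archive/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks each URL before fetching it.
blocked = parser.can_fetch("ExampleBot", "https://example.com/archive/2006/story.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/news/today.html")
print(blocked)  # False
print(allowed)  # True
```

The check runs entirely on the crawler's side, which is why the standard is advisory: a client that never calls `can_fetch` is not stopped from requesting the page.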

References

  1. Bray, Tim (Sep 11, 2003). "Deep Linking in the World Wide Web". W3.org. Retrieved May 30, 2007.
  2. Finley, Michelle (Mar 30, 2000). "Attention Editors: Deep Link Away". Wired News.
  3. "a swfaddress example: how to deep link your flash tutorial » SQUIBL Blog". Squibl.com. 2010-10-14. Archived from the original on 2014-05-25. Retrieved 2014-06-25.
  4. "History Keeper – Deep Linking in Flash & JavaScript". Unfocus.com. 10 April 2007.
  5. "Deep-linking to frames in Flash websites". Adobe.com.
  6. "Deep Linking for Flash and Ajax". Asual.com.
  7. "Deep Linking for AJAX". Blog.onthewings.net.
  8. "Shetland Internet squabble settled out of court". BBC. 11 November 1997.
  9. For a more extended discussion, see generally the Wikipedia article Copyright aspects of hyperlinking and framing.
  10. "High Court Critical On Deeplinking". EFYtimes.com. Dec 29, 2005. Archived from the original on 2007-09-27. Retrieved May 30, 2007.
  11. 336 F.3d 811 (9th Cir. 2003).
  12. 487 F.3d 701 (9th Cir. 2007).
  13. Declan McCullagh. "Judge: Can't link to Webcast if copyright owner objects". News.com. Retrieved May 30, 2007.
  14. "Udskrift af SØ- & Handelsrettens Dombog" (PDF) (in Danish). Bvhd.dk. February 24, 2006. Archived from the original (PDF) on October 12, 2007. Retrieved May 30, 2007.
  15. "Robots.txt meant for search engines don't work well for web archives | Internet Archive Blogs". Retrieved 2019-05-20.
  16. "Deep Linking Basics: Explaining Key Concepts". AppsFlyer. Retrieved 2019-05-20.