HTTP 404

In computer network communications, the HTTP 404, 404 not found, 404, 404 error, page not found, or file not found error message is a Hypertext Transfer Protocol (HTTP) standard response code indicating that the browser was able to communicate with a given server, but the server could not find what was requested. The error may also be used when a server does not wish to disclose whether it has the requested information. [1]

A website's hosting server will typically generate a "404 Not Found" web page when a user attempts to follow a broken or dead link; hence the 404 error is one of the most recognizable errors encountered on the World Wide Web.

English Wikipedia's 404 page

Overview

When communicating via HTTP, a server is required to respond to a request, such as a web browser request for a web page, with a numeric response code and a message that, depending on the status code, is optional, mandatory, or disallowed. In code 404, the first digit indicates a client error, such as a mistyped Uniform Resource Locator (URL). The following two digits indicate the specific error encountered. HTTP's use of three-digit codes is similar to the use of such codes in earlier protocols such as FTP and NNTP. At the HTTP level, a 404 response code is followed by a human-readable "reason phrase". The HTTP specification suggests the phrase "Not Found" [1], and many web servers by default issue an HTML page that includes both the 404 code and the "Not Found" phrase.
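The structure described above is visible in the status line of an HTTP/1.1 response. The following Python sketch splits such a line into its version, code, and reason phrase, and names the class implied by the first digit. The function name `parse_status_line` and the class labels are illustrative choices, not part of any library; note also that HTTP/2 does not carry a reason phrase at all.

```python
# Minimal sketch: split an HTTP/1.1 status line into its parts and name
# the status class given by the first digit of the three-digit code.
STATUS_CLASSES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}


def parse_status_line(line: str) -> tuple[str, int, str, str]:
    """Return (http_version, status_code, reason_phrase, status_class)."""
    # The reason phrase may contain spaces or be empty, so split at most twice.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {line!r}")
    version, code_text = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if len(code_text) != 3 or not (code_text.isascii() and code_text.isdigit()):
        raise ValueError(f"status code must be three digits: {code_text!r}")
    status_class = STATUS_CLASSES.get(code_text[0], "unknown")
    return version, int(code_text), reason, status_class


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
# ('HTTP/1.1', 404, 'Not Found', 'client error')
```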

A 404 error is often returned when pages have been moved or deleted. In the first case, it is better to employ URL mapping or URL redirection by returning a 301 Moved Permanently response, which can be configured in most server configuration files, or through URL rewriting; in the second case, a 410 Gone should be returned. Because these two options require special server configuration, most websites do not make use of them.
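The choice between 301, 410, and 404 can be sketched as a small lookup in Python. The tables `MOVED`, `GONE`, and `EXISTING` and the function `choose_status` are hypothetical stand-ins for whatever a real site uses to know which pages moved or were removed; only the status codes themselves come from HTTP.

```python
# Minimal sketch: pick 200, 301, 410 or 404 for a requested path.
# All three tables below are hypothetical example data.
from http import HTTPStatus

MOVED = {"/old-guide.html": "/guides/intro.html"}  # old path -> new path
GONE = {"/2009-promo.html"}                        # deliberately deleted
EXISTING = {"/", "/guides/intro.html"}             # pages that exist


def choose_status(path: str) -> tuple[int, dict[str, str]]:
    """Return the status code and any extra headers for a requested path."""
    if path in EXISTING:
        return HTTPStatus.OK, {}
    if path in MOVED:
        # Permanent move: tell clients and crawlers where the page went.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}
    if path in GONE:
        # Deliberately removed: 410 signals that the page will not return.
        return HTTPStatus.GONE, {}
    # Unknown path: nothing more specific can be said, so answer 404.
    return HTTPStatus.NOT_FOUND, {}
```

A server would send the returned headers along with the status; the `Location` header is what makes a 301 useful to browsers and crawlers.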

404 errors should not be confused with DNS errors, which appear when the given URL refers to a server name that does not exist. A 404 error indicates that the server itself was found, but that the server was not able to retrieve the requested page.
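From a client's point of view the difference shows up in which error is raised. In this Python sketch using only the standard library, a 404 arrives as an `urllib.error.HTTPError`, because the server answered; a DNS failure surfaces as a `urllib.error.URLError` whose `reason` is a `socket.gaierror`, because no server was ever contacted. The function `diagnose_url` and its result strings are hypothetical.

```python
# Minimal sketch: tell a DNS failure apart from an HTTP 404.
import socket
import urllib.error
import urllib.request

# Ignore any proxy settings from the environment so that errors come from
# the origin server (or the local resolver) rather than from a proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def diagnose_url(url: str, timeout: float = 5.0) -> str:
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return f"reachable: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first.
        # The server was found and answered, but with an error status.
        return f"server found, HTTP {err.code}"
    except urllib.error.URLError as err:
        # No HTTP response at all; check whether name resolution failed.
        if isinstance(err.reason, socket.gaierror):
            return "DNS error: host name could not be resolved"
        return f"connection error: {err.reason}"
```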

History

The term "404 Not Found" was coined by Tim Berners-Lee, the inventor of the World Wide Web, who explained in a 1998 interview that he wanted to make the error message "slightly apologetic". [2] He also said that he had considered using "400 Bad Request" instead, but decided that it was too vague and technical. [2]

The first documented case of a 404 error appearing on a web page was in 1993, when a user tried to access a page about the Mosaic web browser on the NCSA website. The page had been moved to a different location, but the link had not been updated. [3] The user reported the error to the NCSA team, who fixed the link and added a humorous message to their 404 page: "We're sorry, but the document you requested is not here. Maybe you should try someplace else." [2]

Since then, 404 errors have become one of the most common and recognizable errors on the Web. Many websites have customized their 404 pages with creative designs, messages, or features to entertain or assist their visitors. For example, Google's 404 page features a broken robot and a link to its homepage, [4] while GitHub's 404 page shows a random image of a parallax star field and a link to its status page. [5] Some websites have also used their 404 pages to showcase their brand personality, humor, or social causes. For instance, Lego's 404 page shows the Lego minifigure Emmet along with a humorous message, [6] Amazon's displays an image of a dog, [7] Peugeot's shows a picture of its 404 model, [8] and RTÉ's shows an image of Bosco. [9] [10]

Soft 404 errors

Some websites report a "not found" error by returning a standard web page with a "200 OK" response code, falsely reporting that the page loaded properly; this is known as a soft 404. The term "soft 404" was introduced in 2004 by Ziv Bar-Yossef et al. [11]

Soft 404s are problematic for automated methods of discovering whether a link is broken. Some search engines, such as Yahoo and Google, use automated processes to detect soft 404s. [12] Soft 404s can result from configuration errors in certain HTTP server software. With Apache, for example, an ErrorDocument 404 directive (specified in a .htaccess file) that points to a full URL (e.g. http://example.com/error.html) rather than a local path (/error.html) causes the server to redirect the client to the error page, which is then served with a success code instead of 404. [13] Soft 404s can also be produced on purpose to force some browsers, such as Internet Explorer, to display a customized 404 error message rather than replacing what is served with a browser-specific "friendly" error message. In Internet Explorer, this replacement is triggered when a 404 is served and the received HTML is shorter than a certain length, and it can be manually disabled by the user.
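A simple soft-404 check, sketched here in Python under the assumption that a long random path does not exist on the site, requests such a path and looks at the status code that comes back. This illustrates the general idea only; it is not the method of any particular search engine, and `probe_soft_404` is a hypothetical name.

```python
# Minimal sketch of a soft-404 probe. Assumption: a long random path such
# as /probe-<32 hex digits> does not exist on the site being tested.
import secrets
import urllib.error
import urllib.parse
import urllib.request

_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_soft_404(base_url: str, timeout: float = 5.0) -> str:
    """Classify how a site answers a request for a page that should not exist."""
    bogus_path = "/probe-" + secrets.token_hex(16)
    url = urllib.parse.urljoin(base_url, bogus_path)
    try:
        # urllib follows redirects, so a redirect that ends in 200 OK is
        # also reported as a soft 404 here. Connection failures propagate.
        with _OPENER.open(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    if status == 200:
        return "soft 404: missing page reported as 200 OK"
    if status in (404, 410):
        return f"hard {status}"
    return f"other status {status}"
```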

There are also "soft 3XX" errors where content is returned with a status 200 but comes from a redirected page, such as when missing pages are redirected to the domain root/home page.
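A redirect of missing pages to the home page can be detected by following the redirect and checking where the request ends up. This Python sketch assumes that a long random path does not exist on the site; `redirects_missing_to_root` is a hypothetical helper, and a site that redirects missing pages somewhere other than the root would need a different comparison.

```python
# Minimal sketch: detect a "soft 3XX", i.e. a missing page that is
# redirected to the site root and then served with 200 OK.
import secrets
import urllib.error
import urllib.parse
import urllib.request

_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def redirects_missing_to_root(base_url: str, timeout: float = 5.0) -> bool:
    # Assumption: this random path does not exist on the site.
    url = urllib.parse.urljoin(base_url, "/missing-" + secrets.token_hex(16))
    try:
        # urllib follows redirects automatically; geturl() reports where
        # the request finally ended up.
        with _OPENER.open(url, timeout=timeout) as response:
            final = urllib.parse.urlsplit(response.geturl())
            return response.status == 200 and final.path in ("", "/")
    except urllib.error.HTTPError:
        # An error status (e.g. a real 404) means no soft redirect happened.
        return False
```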

Proxy servers

Some proxy servers generate a 404 error when a 5xx (server error) code would be more correct. If the proxy server is unable to satisfy a request for a page because of a problem with the remote host (such as hostname resolution failures or refused TCP connections), it should report a 5xx status such as 502 Bad Gateway, but might deliver a 404 instead. This can confuse programs that expect and act on specific responses, as they can no longer easily distinguish between an absent web server and a missing web page on a web server that is present.
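The intended behavior can be summarized as a mapping from upstream outcomes to the status a proxy reports. The Python sketch below uses a hypothetical helper, `proxy_status`, and is not the logic of any particular proxy: a genuine 404 from the origin is passed through, a timeout becomes 504 Gateway Timeout, and other failures to reach the origin (including DNS errors and refused connections) become 502 Bad Gateway.

```python
# Minimal sketch: choose the status a forwarding proxy reports to its client.
import socket
from http import HTTPStatus
from typing import Optional


def proxy_status(upstream_status: Optional[int], error: Optional[BaseException]) -> int:
    if error is None and upstream_status is not None:
        # The origin server answered: relay its status, including a real 404.
        return upstream_status
    if isinstance(error, (socket.timeout, TimeoutError)):
        # The origin did not answer in time.
        return HTTPStatus.GATEWAY_TIMEOUT  # 504
    # DNS failures (socket.gaierror), refused connections and other problems
    # reaching the origin say nothing about whether the page exists.
    return HTTPStatus.BAD_GATEWAY  # 502
```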

Intentional 404s

In July 2004, the UK telecom provider BT Group deployed the Cleanfeed content blocking system, which returns a 404 error to any request for content identified as potentially illegal by the Internet Watch Foundation. [14] Other ISPs return an HTTP 403 "Forbidden" error in the same circumstances. [15] The practice of employing fake 404 errors as a means to conceal censorship has also been reported in Thailand [16] and Tunisia. [17] In Tunisia, where censorship was severe before the 2011 revolution, people became aware of the nature of the fake 404 errors and created an imaginary character named "Ammar 404" who represents "the invisible censor". [18]

Microsoft Internet Information Services 404 substatus error codes

Microsoft's web server software, Internet Information Services (IIS), returns a set of substatus codes with its 404 responses. The substatus codes take the form of decimal numbers appended to the 404 status code. The substatus codes are not officially recognized by IANA and are not returned by non-Microsoft servers.
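Because substatus values are written after a dot, it is tempting to treat a code such as 404.1 as a decimal number, but that would make 404.1 and 404.10 indistinguishable. This Python sketch, with the hypothetical function `split_substatus`, splits on the dot instead; treating a code without a substatus as substatus 0 is an assumption of the sketch.

```python
# Minimal sketch: split an IIS-style "404.x" code into (status, substatus).
# Splitting on the dot keeps "404.1" and "404.10" distinct, which parsing
# the whole code as a decimal number would not.
def split_substatus(code: str) -> tuple[int, int]:
    status_text, _, sub_text = code.partition(".")
    status = int(status_text)
    # Assumption: a bare "404" is treated as substatus 0.
    substatus = int(sub_text) if sub_text else 0
    return status, substatus
```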

Substatus codes

Microsoft's IIS 7.0, IIS 7.5, and IIS 8.0 servers define the following HTTP substatus codes to indicate a more specific cause of a 404 error:

Custom error pages

The Wikimedia 404 message

Web servers can typically be configured to display a customized 404 error page, including a more natural description, the parent site's branding, and sometimes a site map, a search form, or a 404-page widget. The protocol-level reason phrase, which is hidden from the user, is rarely customized. Internet Explorer, however, will not display custom pages unless they are larger than 512 bytes, opting instead to display a "friendly" error page. [19] Google Chrome included similar functionality, replacing the 404 page with alternative suggestions generated by Google algorithms if the page is under 512 bytes in size. [20] Another problem arises when a site does not provide a favicon but does have a separate custom 404 page: the browser's request for the missing favicon returns the full 404 page, generating extra traffic and longer loading times on every page view. [21] [22]
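Both problems can be addressed when the 404 response is generated. The Python sketch below assumes a hand-written handler and uses the hypothetical function `not_found_response`: it pads the custom page past 512 bytes with an HTML comment so that size-based "friendly" replacement is not triggered, and answers a missing `/favicon.ico` with an empty 404 rather than the full page. Providing a real favicon avoids the second problem altogether.

```python
# Minimal sketch: build a 404 response (status, headers, body) that is
# larger than 512 bytes, with a cheap empty 404 for /favicon.ico misses.
from http import HTTPStatus

MIN_BODY_BYTES = 513  # strictly larger than 512 bytes

PAGE_TEMPLATE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Page not found</title></head>
<body><h1>Sorry, we couldn't find that page</h1>
<p><a href="/">Return to the home page</a></p>
</body></html>
"""


def not_found_response(path: str) -> tuple[int, dict[str, str], bytes]:
    if path == "/favicon.ico":
        # Keep repeated favicon misses cheap: status only, no HTML page.
        return HTTPStatus.NOT_FOUND, {"Content-Length": "0"}, b""
    body = PAGE_TEMPLATE.encode("utf-8")
    if len(body) < MIN_BODY_BYTES:
        # Pad with an HTML comment; "<!-- " + filler + " -->" adds 9 bytes
        # of markup, so the filler brings the body to exactly 513 bytes.
        filler = MIN_BODY_BYTES - len(body) - len("<!--  -->")
        body += ("<!-- " + "x" * max(filler, 0) + " -->").encode("ascii")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return HTTPStatus.NOT_FOUND, headers, body
```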

Many organizations use 404 error pages as an opportunity to inject humor into what may otherwise be a serious website. For example, Metro UK shows a polar bear on a skateboard, and the web development agency Left Logic has a simple drawing program. [23] During the 2015 UK general election campaign the main political parties all used their 404 pages to either take aim at political opponents or show relevant policies to potential supporters. [24] In Europe, the NotFound project, created by multiple European organizations including Missing Children Europe and Child Focus, encourages site operators to add a snippet of code to serve customized 404 error pages [25] which provide data about missing children. [26]

While many websites send additional information in a 404 error message, such as a link to the website's homepage or a search box, some also try to find the web page the user actually wanted. Extensions are available for some content management systems (CMSs) to do this. [27]
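One simple way to suggest the intended page is fuzzy string matching against a list of known URLs. The Python sketch below uses `difflib.get_close_matches` from the standard library; `KNOWN_PATHS` and `suggest_paths` are hypothetical, and this is not how any particular CMS extension is implemented.

```python
# Minimal sketch of a "did you mean" helper for a 404 page.
import difflib

# Hypothetical list of valid paths; a real site might load it from a
# sitemap or its CMS.
KNOWN_PATHS = [
    "/about",
    "/contact",
    "/products/widgets",
    "/products/gadgets",
    "/blog/2013/tracking-404-errors",
]


def suggest_paths(requested: str, limit: int = 3) -> list[str]:
    """Return up to `limit` known paths that look similar to `requested`."""
    # Normalize case and a trailing slash before comparing.
    normalized = requested.lower().rstrip("/") or "/"
    return difflib.get_close_matches(normalized, KNOWN_PATHS, n=limit, cutoff=0.6)
```

For example, the top suggestion for a request to `/prodcts/widgets` would be `/products/widgets`, which the 404 page could then offer as a link.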

Tracking 404 errors

A number of tools exist that crawl through a website to find pages that return 404 status codes. These tools are helpful for finding broken links within a particular website. Their limitation is that they only find links within that one website and ignore 404s resulting from links on other websites; according to one analysis, such tools miss about 83% of the 404s on websites. [28] One way around this is to find 404 errors by analyzing external links. [29]
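A minimal crawler of this kind can be written with the Python standard library. The sketch below, with the hypothetical function `find_broken_links`, starts from one URL, follows links on the same host only, and records which page linked to each URL that returned 404. By design it shares the limitation described above: links from other websites are never seen.

```python
# Minimal sketch: crawl one site breadth-first and report internal links
# whose targets return 404. Links to other hosts are ignored.
import urllib.error
import urllib.parse
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Optional

# Ignore proxy settings from the environment for this sketch.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class _LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def find_broken_links(start_url: str, max_pages: int = 100) -> list[tuple[Optional[str], str]]:
    """Return (linking_page, broken_url) pairs for same-host links that 404."""
    host = urllib.parse.urlsplit(start_url).netloc
    queue = deque([(None, start_url)])
    seen = {start_url}
    broken = []
    fetched = 0
    while queue and fetched < max_pages:
        referrer, url = queue.popleft()
        fetched += 1
        try:
            with _OPENER.open(url, timeout=5) as response:
                if "html" not in response.headers.get("Content-Type", ""):
                    continue  # only HTML pages can contain further links
                html = response.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            if err.code == 404:
                broken.append((referrer, url))
            continue
        except urllib.error.URLError:
            continue  # unreachable; not a 404, so not reported here
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            absolute, _fragment = urllib.parse.urldefrag(urllib.parse.urljoin(url, href))
            if urllib.parse.urlsplit(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append((url, absolute))
    return broken
```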

404 errors can also be discovered using Google Search Console, Google Analytics, or crawling software.

Another common method is tracking traffic to 404 pages using log file analysis. [30] This can show which missing pages users actually reached on the site. Another method of tracking traffic to 404 pages is using JavaScript-based traffic tracking tools. [31]
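Log file analysis can be as simple as counting 404 responses per path. The Python sketch below assumes an access log in the Common Log Format (the Combined format, which appends referrer and user agent, also matches); the regular expression and the function `count_404s` are illustrative rather than a complete log parser.

```python
# Minimal sketch: count 404 responses per requested path in an access log
# written in the Common Log Format or the Combined Log Format.
import re
from collections import Counter
from typing import Iterable

# host ident user [time] "METHOD PATH PROTOCOL" status bytes ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def count_404s(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not match the format are skipped silently.
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts
```

Passing an open log file (an iterable of lines) to `count_404s` and calling `.most_common(10)` on the result lists the ten most frequently requested missing paths.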

References

  1. Fielding, R.; Reschke, J. (June 2014). "RFC 7231, HTTP/1.1 Semantics and Content, Section 6.5.4 404 Not Found". ietf.org. doi:10.17487/RFC7231. S2CID 14399078. Retrieved 13 December 2018.
  2. "What is a 404 error and what should I do if I get one? » Internet » Windows » Tech Ease". Retrieved 19 May 2023.
  3. "404 page design: best practices and awesome examples". justinmind.com. Retrieved 19 May 2023.
  4. "Google 404 Error Page". Google.
  5. "Github 404 Error Page". Github.
  6. "LEGO 404 Error Page". Lego.
  7. "Amazon's 404 error page". Amazon.
  8. "Peugeot's 404 error page". Peugeot.
  9. Neylon, Michele (7 August 2011). "RTE's 404 Keeps Bosco Alive". Michele Neylon :: Pensieri. Retrieved 21 December 2022.
  10. "404 Page not found". RTÉ.ie. Retrieved 21 December 2022.
  11. Ziv Bar-Yossef; Andrei Z. Broder; Ravi Kumar; Andrew Tompkins (2004). "Sic transit gloria telae". Proceedings of the 13th international conference on World Wide Web. pp. 328–337. doi:10.1145/988672.988716. ISBN 978-1581138443. S2CID 587547.
  12. "Why is your crawler asking for strange URLs that have never existed on my site?". Yahoo Ysearch Help page. Archived from the original on 15 July 2014. Retrieved 4 September 2013.
  13. "Farewell to soft 404s". Google Official Blog. Retrieved 20 September 2008.
  14. "LINX Public Affairs » Cleanfeed: the facts". Publicaffairs.linx.net. 10 September 2004. Archived from the original on 13 May 2011. Retrieved 6 March 2011.
  15. "DEMON – Error 403". Retrieved 14 June 2012.
  16. Sambandaraksa, Don (18 February 2009). "The old fake '404 Not Found' routine - Dead link". Bangkok Post. Retrieved 12 September 2010.
  17. Noman, Helmi (12 September 2008). "Tunisian journalist sues government agency for blocking Facebook, claims damage for the use of 404 error message instead of 403". Open Net Initiative. Retrieved 21 November 2010.
  18. "Anti-censorship movement in Tunisia: creativity, courage and hope!". Global Voices Advocacy. 27 May 2010. Retrieved 28 August 2010.
  19. "Friendly HTTP Error Pages". msdn.com. 18 August 2010. Archived from the original on 2 December 2010. Retrieved 14 June 2012.
  20. "Issue 1695: Chrome needs option to turn off "Friendly 404" displays". bugs.chromium.org. Retrieved 25 December 2021.
  21. Heng, Christopher (7 September 2008). "What is Favicon.ico and How to Create a Favicon Icon for Your Website". The Site Wizard. Retrieved 23 February 2011.
  22. "The Dastardly "favicon.ico not found" Error". Internet Folks. 3 August 1999.
  23. "From skateboarding bears to missing children: The power of the 404 Not Found error page". Metro. 6 June 2011. Retrieved 16 April 2013.
  24. "The political Page 404 war". BBC Newsbeat. 27 April 2015. Retrieved 18 May 2018.
  25. "Notfound.org". notfound. Archived from the original on 2 September 2014.
  26. "Missing children messages go on 404 error pages". BBC News. 27 September 2012. Retrieved 20 September 2014.
  27. Swenson, Sahala (19 August 2008). "Make your 404 pages more useful". Official Google Webmaster Central Blog. Google, Inc. Retrieved 28 August 2009.
  28. "Sources Leading To 404s". SpringTrax. Retrieved 11 February 2013.
  29. Cushing, Anne (2 April 2013). "A Data-Centric Approach To Identifying 404 Pages Worth Saving". Search Engine Land. Retrieved 7 June 2013.
  30. "Tracking and Preventing 404 Errors". 404errorpages.com. Retrieved 7 June 2013.
  31. "Understand 404 Errors". SpringTrax.com. Retrieved 7 June 2013.