URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. Similarly, domain redirection or domain forwarding is when all pages in a URL domain are redirected to a different domain, as when wikipedia.com and wikipedia.net are automatically redirected to wikipedia.org.
URL redirection is done for various reasons:
There are several reasons to use URL redirection:
A website may potentially be accessible over both a secure HTTPS URI scheme and plain HTTP (an insecure URI beginning with "http://").
If a user types in a URI or clicks on a link that refers to the insecure variant, the browser will automatically redirect to the secure version in case the website is contained in the HSTS preload list shipped with the application or if the user had already visited the origin in the past.
Otherwise the website will be contacted over HTTP. A website operator may decide to serve such requests by redirecting the browser to the HTTPS variant instead and hopefully also priming HSTS for future accesses.
A user might mistype a URL. Organizations often register these misspelled domains and redirect them to the intended location. This technique is often used to "reserve" other top-level domains (TLD) with the same name, or make it easier for a ".edu" or ".net" site to accommodate users who type ".com".
Web pages may be redirected to a new domain for three reasons:
With URL redirects, incoming links to an outdated URL can be sent to the correct location. These links might be from other sites that have not realized that there is a change or from bookmarks/favorites that users have saved in their browsers. The same applies to search engines. They often have the older/outdated domain names and links in their database and will send search users to these old URLs. By using a "moved permanently" redirect to the new URL, visitors will still end up at the correct page. Also, in the next search engine pass, the search engine should detect and use the newer URL.
The access logs of most web servers keep detailed information about where visitors came from and how they browsed the hosted site. They do not, however, log which links visitors left by. This is because the visitor's browser has no need to communicate with the original server when the visitor clicks on an outgoing link. This information can be captured in several ways. One way involves URL redirection. Instead of sending the visitor straight to the other site, links on the site can direct to a URL on the original website's domain that automatically redirects to the real target. This technique bears the downside of the delay caused by the additional request to the original website's server. As this added request will leave a trace in the server log, revealing exactly which link was followed, it can also be a privacy issue. [1] The same technique is also used by some corporate websites to implement a statement that the subsequent content is at another site, and therefore not necessarily affiliated with the corporation. In such scenarios, displaying the warning causes an additional delay.
Web applications often include lengthy descriptive attributes in their URLs which represent data hierarchies, command structures, transaction paths and session information. This practice results in a URL that is aesthetically unpleasant and difficult to remember, and which may not fit within the size limitations of microblogging sites. URL shortening services provide a solution to this problem by redirecting a user to a longer URL from a shorter one. [1]
Sometimes the URL of a page changes even though the content stays the same. Therefore, URL redirection can help users who have bookmarks. This is routinely done on Wikipedia whenever a page is renamed.
Post/Redirect/Get (PRG) is a web development design pattern that prevents some duplicate form submissions if the user clicks the refresh button after submitting the form, creating a more intuitive interface for user agents (users).
Redirects can be effectively used for targeting purposes like geotargeting. Device targeting has become increasingly important with the rise of mobile clients. There are two approaches to serve mobile users: Make the website responsive or redirect to a mobile website version. If a mobile website version is offered, users with mobile clients will be automatically forwarded to the corresponding mobile content. For device targeting, client-side redirects or non-cacheable server-side redirects are used. Geotargeting is the approach to offer localized content and automatically forward the user to a localized version of the requested URL. This is helpful for websites that target audience in more than one location and/or language. Usually server-side redirects are used for Geotargeting but client-side redirects might be an option as well, depending on requirements. [2]
Redirects have been used to manipulate search engines with unethical intentions, e.g., URL hijacking. The goal of misleading redirects is to drive search traffic to landing pages, which do not have enough ranking power on their own or which are only remotely or not at all related to the search target. The approach requires a rank for a range of search terms with a number of URLs that would utilize sneaky redirects to forward the searcher to the target page. This method had a revival with the uprise of mobile devices and device targeting. URL hijacking is an off-domain redirect technique [3] that exploited the nature of the search engine's handling for temporary redirects. If a temporary redirect is encountered, search engines have to decide whether they assign the ranking value to the URL that initializes the redirect or to the redirect target URL. The URL that initiates the redirect may be kept to show up in search results, as the redirect indicates a temporary nature. Under certain circumstances it was possible to exploit this behavior by applying temporary redirects to well-ranking URLs, leading to a replacement of the original URL in search results by the URL that initialized the redirect, therefore "stealing" the ranking. This method was usually combined with sneaky redirects to re-target the user stream from the search results to a target page. Search engines have developed efficient technologies to detect these kinds of manipulative approaches. Major search engines usually apply harsh ranking penalties on sites that get caught applying techniques like these. [4]
URL redirection is sometimes used as a part of phishing attacks that confuse visitors about which web site they are visiting. [5] Because modern browsers always show the real URL in the address bar, the threat is lessened. However, redirects can also take you to sites that will otherwise attempt to attack in other ways. For example, a redirect might take a user to a site that would attempt to trick them into downloading antivirus software and installing a Trojan of some sort instead.
referrer
informationWhen a link is clicked, the browser sends along in the HTTP request a field called referer which indicates the source of the link. This field is populated with the URL of the current web page, and will end up in the logs of the server serving the external link. Since sensitive pages may have sensitive URLs (for example, https://company.com/plans-for-the-next-release-of-our-product
), it is not desirable for the referrer
URL to leave the organization. A redirection page that performs referrer hiding could be embedded in all external URLs, transforming for example https://externalsite.com/page
into https://redirect.company.com/https://externalsite.com/page
. This technique also eliminates other potentially sensitive information from the referrer URL, such as the session ID, and can reduce the chance of phishing by indicating to the end user that they passed a clear gateway to another site.
Several different kinds of response to the browser will result in a redirection. These vary in whether they affect HTTP headers or HTML content. The techniques used typically depend on the role of the person implementing it and their access to different parts of the system. For example, a web author with no control over the headers might use a Refresh meta tag whereas a web server administrator redirecting all pages on a site is more likely to use server configuration.
The simplest technique is to ask the visitor to follow a link to the new page, usually using an HTML anchor like:
Please follow <ahref="https://www.example.com/">this link</a>.
This method is often used as a fall-back — if the browser does not support the automatic redirect, the visitor can still reach the target document by following the link.
In the HTTP protocol used by the World Wide Web, a redirect is a response with a status code beginning with 3 that causes a browser to display a different page. If a client encounters a redirect, it needs to make a number of decisions how to handle the redirect. Different status codes are used by clients to understand the purpose of the redirect, how to handle caching and which request method to use for the subsequent request.
HTTP/1.1 defines several status codes for redirection (RFC 7231):
Status codes 304 not modified and 305 use proxy are not redirects.
HTTP Status Code | HTTP Version | Temporary / Permanent | Cacheable | Request Method Subsequent Request |
---|---|---|---|---|
301 | HTTP/1.0 | Permanent | Yes | GET / POST may change |
302 | HTTP/1.0 | Temporary | not by default | GET / POST may change |
303 | HTTP/1.1 | Temporary | never | always GET |
307 | HTTP/1.1 | Temporary | not by default | may not change |
308 | HTTP/1.1 | Permanent | by default | may not change |
All of these status codes require the URL of the redirect target to be given in the Location: header of the HTTP response. The 300 multiple choices will usually list all choices in the body of the message and show the default choice in the Location: header.
A HTTP response with the 301 "moved permanently" redirect looks like this:
HTTP/1.1301Moved PermanentlyLocation:https://www.example.org/Content-Type:text/htmlContent-Length:174<html><head><title>Moved</title></head><body> =Moved= <p>This page has moved to <ahref="https://www.example.org/">https://www.example.org/</a>.</p></body></html>
Web authors producing HTML content can't usually create redirects using HTTP headers as these are generated automatically by the web server program when serving an HTML file. The same is usually true even for programmers writing CGI scripts, though some servers allow scripts to add custom headers (e.g. by enabling "non-parsed-headers"). Many web servers will generate a 3xx status code if a script outputs a "Location:" header line. For example, in PHP, one can use the "header" function:
header('HTTP/1.1 301 Moved Permanently');header('Location: https://www.example.com/');exit();
More headers may be required to prevent caching. [7] The programmer must ensure that the headers are output before the body. This may not fit easily with the natural flow of control through the code. To help with this, some frameworks for server-side content generation can buffer the body data. In the ASP scripting language, this can also be accomplished using response.buffer=true
and response.redirect "https://www.example.com/"
HTTP/1.1 allows for either a relative URI reference or an absolute URI reference. [8] If the URI reference is relative the client computes the required absolute URI reference according to the rules defined in RFC 3986. [9]
The Apache HTTP Server mod_alias extension can be used to redirect certain requests. Typical configuration directives look like:
Redirectpermanent/oldpage.htmlhttps://www.example.com/newpage.html Redirect301/oldpage.htmlhttps://www.example.com/newpage.html
For more flexible URL rewriting and redirection, Apache mod_rewrite can be used. E.g., to redirect a requests to a canonical domain name:
RewriteEngineonRewriteCond%{HTTP_HOST}^([^.:]+\.)*oldsite\.example\.com\.?(:[0-9]*)?$[NC] RewriteRule^(.*)$https://newsite.example.net/$1[R=301,L]
Such configuration can be applied to one or all sites on the server through the server configuration files or to a single content directory through a .htaccess
file.
Nginx has an integrated http rewrite module, [10] which can be used to perform advanced URL processing and even web-page generation (with the return
directive). An example of such advanced use of the rewrite module is mdoc.su, which implements a deterministic URL shortening service entirely with the help of nginx configuration language alone. [11] [12]
For example, if a request for /DragonFlyBSD/HAMMER.5
were to come along, it would first be redirected internally to /d/HAMMER.5
with the first rewrite directive below (only affecting the internal state, without any HTTP replies issued to the client just yet), and then with the second rewrite directive, an HTTP response with a 302 Found status code would be issued to the client to actually redirect to the external cgi script of web-man: [13]
location/DragonFly{rewrite^/DragonFly(BSD)?([,/].*)?$/d$2last;}location/d{set$db"https://leaf.dragonflybsd.org/cgi/web-man?command=";set$ds"§ion=";rewrite^/./([^/]+)\.([1-9])$$db$1$ds$2redirect;}
Netscape introduced the meta refresh feature which refreshes a page after a certain amount of time. This can specify a new URL to replace one page with another. This is supported by most web browsers. [14] [15] A timeout of zero seconds effects an immediate redirect. This is treated like a 301 permanent redirect by Google, allowing transfer of PageRank to the target page. [16]
This is an example of a simple HTML document that uses this technique:
<html><head><metahttp-equiv="Refresh"content="0; url=https://www.example.com/"/></head><body><p>Please follow <ahref="https://www.example.com/">this link</a>.</p></body></html>
This technique can be used by web authors because the meta tag is contained inside the document itself. The meta tag must be placed in the "head" section of the HTML file. The number "0" in this example may be replaced by another number to achieve a delay of that many seconds. The anchor in the "body" section is for users whose browsers do not support this feature.
The same effect can be achieved with an HTTP refresh
header:
HTTP/1.1200OKRefresh:0; url=https://www.example.com/Content-Type:text/htmlContent-Length:78 Please follow <ahref="https://www.example.com/">this link</a>.
This response is easier to generate by CGI programs because one does not need to change the default status code.
Here is a simple CGI program that effects this redirect:
# !/usr/bin/perlprint"Refresh: 0; url=https://www.example.com/\r\n";print"Content-Type: text/html\r\n";print"\r\n";print"Please follow <a href=\"https://www.example.com/\">this link</a>!"
Note: Usually, the HTTP server adds the status line and the Content-Length header automatically.
The W3C discourage the use of meta refresh, since it does not communicate any information about either the original or new resource, to the browser (or search engine). The W3C's Web Content Accessibility Guidelines (7.4) [17] discourage the creation of auto-refreshing pages, since most web browsers do not allow the user to disable or control the refresh rate. Some articles that they have written on the issue include W3C Web Content Accessibility Guidelines (1.0): Ensure user control of time-sensitive content changes, Use standard redirects: don't break the back button! [18] and Core Techniques for Web Content Accessibility Guidelines 1.0 section 7. [19]
JavaScript can cause a redirect by setting the window.location
attribute, e.g.:
window.location='https://www.example.com/'
Normally JavaScript pushes the redirector site's URL to the browser's history. It can cause redirect loops when users hit the back button. With the following command you can prevent this type of behaviour. [20]
window.location.replace('https://www.example.com/')
However, HTTP headers or the refresh meta tag may be preferred for security reasons and because JavaScript will not be executed by some browsers and many web crawlers.
A slightly different effect can be achieved by creating an inline frame:
<iframeheight="100%"width="100%"src="https://www.example.com/"> Please follow <ahref="https://www.example.com/">link</a>. </iframe>
One main difference to the above redirect methods is that for a frame redirect, the browser displays the URL of the frame document and not the URL of the target page in the URL bar. This cloaking technique may be used so that the reader sees a more memorable URL or to fraudulently conceal a phishing site as part of website spoofing. [21]
Before HTML5, [22] the same effect could be done with an HTML frame that contains the target page:
<framesetrows="100%"><framesrc="https://www.example.com/"><noframes><body>Please follow <ahref="https://www.example.com/">link</a>.</body></noframes></frameset>
One redirect may lead to another in a redirect chain. If a redirect leads to another redirect, this may also be known as a double redirect. [23] For example, the URL "https://wikipedia.com" (with "*.com" as domain) is first redirected to https://www.wikipedia.org/ (with domain name in .org), where you can navigate to the language-specific site. This is unavoidable if the different links in the chain are served by different servers though it should be minimised by rewriting the URL as much as possible on the server before returning it to the browser as a redirect.
Sometimes a mistake can cause a page to end up redirecting back to itself, possibly via other pages, leading to an infinite sequence of redirects. Browsers should stop redirecting after a certain number of hops and display an error message.
The HTTP/1.1 Standard states: [24]
A client SHOULD detect and intervene in cyclical redirections (i.e., "infinite" redirection loops).
Note: An earlier version of this specification recommended a maximum of five redirections ([RFC 2068], Section 10.3). Content developers need to be aware that some clients might implement such a fixed limitation.
There exist services that can perform URL redirection on demand, with no need for technical work or access to the web server your site is hosted on.
A redirect service is an information management system, which provides an internet link that redirects users to the desired content. The typical benefit to the user is the use of a memorable domain name, and a reduction in the length of the URL or web address. A redirecting link can also be used as a permanent address for content that frequently changes hosts, similarly to the Domain Name System. Hyperlinks involving URL redirection services are frequently used in spam messages directed at blogs and wikis. Thus, one way to reduce spam is to reject all edits and comments containing hyperlinks to known URL redirection services; however, this will also remove legitimate edits and comments and may not be an effective method to reduce spam. Recently, URL redirection services have taken to using AJAX as an efficient, user friendly method for creating shortened URLs. A major drawback of some URL redirection services is the use of delay pages, or frame based advertising, to generate revenue.
The first redirect services took advantage of top-level domains (TLD) such as ".to" (Tonga), ".at" (Austria) and ".is" (Iceland). Their goal was to make memorable URLs. The first mainstream redirect service was V3.com that boasted 4 million users at its peak in 2000. V3.com success was attributed to having a wide variety of short memorable domains including "r.im", "go.to", "i.am", "come.to" and "start.at". V3.com was acquired by FortuneCity.com, a large free web hosting company, in early 1999. [25] As the sales price of top level domains started falling from $50.00 per year to less than $10.00, use of redirection services declined. With the launch of TinyURL in 2002 a new kind of redirecting service was born, namely URL shortening. Their goal was to make long URLs short, to be able to post them on internet forums. Since 2006, with the 140 character limit on the extremely popular Twitter service, these short URL services have been heavily used.
Redirection services can hide the referrer by placing an intermediate page between the page the link is on and its destination. Although these are conceptually similar to other URL redirection services, they serve a different purpose, and they rarely attempt to shorten or obfuscate the destination URL (as their only intended side-effect is to hide referrer information and provide a clear gateway between other websites.) This type of redirection is often used to prevent potentially-malicious links from gaining information using the referrer, for example a session ID in the query string. Many large community websites use link redirection on external links to lessen the chance of an exploit that could be used to steal account information, as well as make it clear when a user is leaving a service, to lessen the chance of effective phishing .
Here is a simplistic example of such a service, written in PHP.
<?php$url=htmlspecialchars($_GET['url']);header('Refresh: 0; url=https://'.$url);?><!-- Fallback using meta refresh. --><html><head><title>Redirecting...</title><metahttp-equiv="refresh"content="0;url=https://<?=$url;?>"></head><body> Attempting to redirect to <ahref="https://<?=$url;?>">https://<?=$url;?></a>. </body></html>
The above example does not check who called it (e.g. by referrer, although that could be spoofed). Also, it does not check the URL provided. This means that a malicious person could link to the redirection page using a URL parameter of his/her own selection, from any page, which uses the web server's resources.
URL redirection can be abused by attackers to perform phishing attacks. If a redirect target is not sufficiently validated by a web application, an attacker can make a web application redirect to an arbitrary website. This vulnerability is known as an open-redirect vulnerability. [26] [27] In certain cases when an open redirect occurs as part of an authentication flow, the vulnerability is known as a covert redirect. [28] [29] When a covert redirect occurs, the attacker website can steal authentication information from the victim website. [26] Open redirect vulnerabilities are fairly common on the web. In June 2022, TechRadar found over 25 active examples of open redirect vulnerabilities on the web, including sites like Google and Instagram. [30] Open redirects have their own CWE identifier, CWE-601. [31]
URL redirection also provides a mechanism to perform cross-site leak attacks. By timing how long a website took to return a particular page or by differentiating one destination page from another, an attacker can gain significant information about another website's state. In 2021, Knittel et al. discovered a vulnerability in the Chrome's Performance API implementation which allowed them to reliably detect cross-origin redirects. [32]
HTTP is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser.
Meta elements are tags used in HTML and XHTML documents to provide structured metadata about a Web page. They are part of a web page's head
section. Multiple Meta elements with different attributes can be used on the same page. Meta elements can be used to specify page description, keywords and any other metadata not provided through the other head
elements and attributes.
The World Wide Web is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists. It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).
A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.
In computing, a hyperlink, or simply a link, is a digital reference to data that the user can follow or be guided to by clicking or tapping. A hyperlink points to a whole document or to a specific element within a document. Hypertext is text with hyperlinks. The text that is linked from is known as anchor text. A software system that is used for viewing and creating hypertext is a hypertext system, and to create a hyperlink is to hyperlink. A user following hyperlinks is said to navigate or browse the hypertext.
In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource. It improves privacy, security, and possibly performance in the process.
In web applications, a rewrite engine is a software component that performs rewriting on URLs, modifying their appearance. This modification is called URL rewriting. It is a way of implementing URL mapping or routing within a web application. The engine is typically a component of a web server or web application framework. Rewritten URLs are used to provide shorter and more relevant-looking links to web pages. The technique adds a layer of abstraction between the files used to generate a web page and the URL that is presented to the outside world.
Inline linking is the use of a linked object, often an image, on one site by a web page belonging to a second site. One site is said to have an inline link to the other site where the object is located.
A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.
In the context of a web browser, a frame is a part of a web page or browser window which displays content independent of its container, with the ability to load content independently. The HTML or media elements in a frame may come from a web site distinct from the site providing the enclosing content. This practice, known as framing, is today often regarded as a violation of same-origin policy.
Meta refresh is a method of instructing a web browser to automatically refresh the current web page or frame after a given time interval, using an HTML meta
element with the http-equiv
parameter set to "refresh
" and a content
parameter giving the time interval in seconds. It is also possible to instruct the browser to fetch a different URL when the page is refreshed, by including the alternative URL in the content
parameter. By setting the refresh time interval to zero, meta refresh can be used as a method of URL redirection.
In computing, the same-origin policy (SOP) is a concept in the web-app application security model. Under the policy, a web browser permits scripts contained in a first web page to access data in a second web page, but only if both web pages have the same origin. An origin is defined as a combination of URI scheme, host name, and port number. This policy prevents a malicious script on one page from obtaining access to sensitive data on another web page through that page's DOM.
In HTTP, "Referer" is an optional HTTP header field that identifies the address of the web page from which the resource has been requested. By checking the referrer, the server providing the new web page can see where the request originated.
On the World Wide Web, HTTP 301 is the HTTP response status code for 301 Moved Permanently. It is used for permanent redirecting, meaning that links or records returning this response should be updated. The new URL should be provided in the Location field, included with the response. The 301 redirect is considered a best practice for upgrading users from HTTP to HTTPS.
The HTTP response status code 302 Found is a common way of performing URL redirection. The HTTP/1.0 specification initially defined this code, and gave it the description phrase "Moved Temporarily" rather than "Found".
A single-page application (SPA) is a web application or website that interacts with the user by dynamically rewriting the current web page with new data from the web server, instead of the default method of loading entire new pages. The goal is faster transitions that make the website feel more like a native app.
The HTTP Location header field is returned in responses from an HTTP server under two circumstances:
A canonical link element is an HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the "canonical" or "preferred" version of a web page. It is described in RFC 6596, which went live in April 2012.
Cross-site request forgery, also known as one-click attack or session riding and abbreviated as CSRF or XSRF, is a type of malicious exploit of a website or web application where unauthorized commands are submitted from a user that the web application trusts. There are many ways in which a malicious website can transmit such commands; specially-crafted image tags, hidden forms, and JavaScript fetch or XMLHttpRequests, for example, can all work without the user's interaction or even knowledge. Unlike cross-site scripting (XSS), which exploits the trust a user has for a particular site, CSRF exploits the trust that a site has in a user's browser. In a CSRF attack, an innocent end user is tricked by an attacker into submitting a web request that they did not intend. This may cause actions to be performed on the website that can include inadvertent client or server data leakage, change of session state, or manipulation of an end user's account.
Content Security Policy (CSP) is a computer security standard introduced to prevent cross-site scripting (XSS), clickjacking and other code injection attacks resulting from execution of malicious content in the trusted web page context. It is a Candidate Recommendation of the W3C working group on Web Application Security, widely supported by modern web browsers. CSP provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website—covered types are JavaScript, CSS, HTML frames, web workers, fonts, images, embeddable objects such as Java applets, ActiveX, audio and video files, and other HTML5 features.
{{cite web}}
: CS1 maint: bot: original URL status unknown (link)