A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.
A web server can handle a Hypertext Transfer Protocol (HTTP) request either by reading a file from its file system based on the URL path or by handling the request using logic that is specific to the type of resource. In cases where special logic is invoked, the query string will be available to that logic for use in its processing, along with the path component of the URL.
A typical URL containing a query string is as follows:
https://example.com/over/there?name=ferret
When a server receives a request for such a page, it may run a program, passing the query string, which in this case is name=ferret, unchanged to the program. The question mark is used as a separator, and is not part of the query string. [1] [2]
Web frameworks may provide methods for parsing multiple parameters in the query string, separated by some delimiter. [3] In the example URL below, multiple query parameters are separated by the ampersand, "&":
https://example.com/path/to/page?name=ferret&color=purple
The exact structure of the query string is not standardized. Methods used to parse the query string may differ between websites.
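As a minimal sketch of such parsing, Python's standard urllib.parse module can split the example URL above and decode its parameters. The variable names are illustrative, and splitting on the ampersand is this library's default behavior rather than a universal rule.

```python
from urllib.parse import urlsplit, parse_qsl

url = "https://example.com/path/to/page?name=ferret&color=purple"

# urlsplit breaks the URL into components; .query excludes the "?" separator.
query = urlsplit(url).query

# parse_qsl splits on "&" and "=" and returns (field, value) pairs in order.
params = parse_qsl(query)

print(query)   # name=ferret&color=purple
print(params)  # [('name', 'ferret'), ('color', 'purple')]
```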
A link in a web page may have a URL that contains a query string. HTML defines three ways a user agent can generate the query string:
the <form>...</form> element
the ismap attribute on the <img> element with an <img ismap> construction
the <isindex> element
One of the original uses was to contain the content of an HTML form, also known as a web form. In particular, when a form containing the fields field1, field2, field3 is submitted, the content of the fields is encoded as a query string as follows:
field1=value1&field2=value2&field3=value3...
Within each pair, the field name and value are separated by an equals sign, "=".
The pairs are separated by the ampersand, "&" (semicolons ";" are not recommended by the W3C anymore, see below).
While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field (e.g. field1=value1&field1=value2&field2=value3). [4] [5]
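One way a library can expose repeated fields is as a list of values per field name. The sketch below uses Python's standard parse_qs as an example of that approach; other frameworks may represent repeated fields differently.

```python
from urllib.parse import parse_qs

query = "field1=value1&field1=value2&field2=value3"

# parse_qs groups values by field name, so a repeated field yields a longer list.
fields = parse_qs(query)

print(fields)  # {'field1': ['value1', 'value2'], 'field2': ['value3']}
```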
For each field of the form, the query string contains a pair field=value. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.
This convention is a W3C recommendation. [3] In its 1999 recommendations, the W3C recommended that all web servers support semicolon separators in addition to ampersand separators [6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity-escape ampersands. Since 2014, the W3C has recommended using only the ampersand as the query separator. [7]
The form content is only encoded in the URL's query string when the form submission method is GET. The same encoding is used by default when the submission method is POST, but the result is submitted as the HTTP request body rather than being included in a modified URL. [8]
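The difference between the two submission methods can be sketched with Python's standard library by building (but not sending) two request objects. The URL https://example.com/form and the field names are placeholders chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"field1": "value1", "field2": "value2"}
encoded = urlencode(fields)  # field1=value1&field2=value2

# GET: the encoded fields become the query string of the URL.
get_request = Request("https://example.com/form?" + encoded, method="GET")

# POST: the same encoding travels in the request body; the URL is unchanged.
post_request = Request(
    "https://example.com/form",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_request.full_url)  # https://example.com/form?field1=value1&field2=value2
print(post_request.data)     # b'field1=value1&field2=value2'
```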
Before forms were added to HTML, browsers rendered the <isindex> element as a single-line text-input control. The text entered into this control was sent to the server as a query string addition to a GET request for the base URL or another URL specified by the action attribute. [9] This was intended to allow web servers to use the provided text as query criteria so they could return a list of matching pages. [10]
When the text input into the indexed search control is submitted, it is encoded as a query string as follows:
argument1+argument2+argument3...
The arguments are separated by the plus sign, '+'.
Though the <isindex> element is deprecated and most browsers no longer support or render it, there are still some vestiges of indexed search in existence. For example, this is the source of the special handling of the plus sign, '+', within browser URL percent encoding (which today, with the deprecation of indexed search, is all but redundant with %20). Also, some web servers supporting CGI (e.g., Apache) will process the query string into command-line arguments if it does not contain an equals sign, '=' (as per section 4.4 of CGI 1.1). Some CGI scripts still depend on and use this historic behavior for URLs embedded in HTML.
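The indexed-search convention can be approximated as follows. This is a rough sketch of the idea only, not the behavior of any particular server: the function name isindex_arguments is hypothetical, and real servers may apply further checks before passing arguments to a program.

```python
from urllib.parse import unquote

def isindex_arguments(query_string):
    """Split an indexed-search query string into decoded words.

    Returns an empty list when the string is empty or contains an unencoded
    "=", in which case it is treated as an ordinary field=value query.
    """
    if not query_string or "=" in query_string:
        return []
    # Words are separated by "+"; each word is percent-decoded on its own.
    return [unquote(word) for word in query_string.split("+")]

print(isindex_arguments("argument1+argument2+argument3"))
# ['argument1', 'argument2', 'argument3']
print(isindex_arguments("name=ferret"))  # []
```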
Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document. In HTML forms, the character = is used to separate a name from a value. The URI generic syntax uses URL encoding to deal with this problem, while HTML forms make some additional substitutions rather than applying percent encoding for all such characters. SPACE is encoded as '+' or "%20". [11]
HTML 5 specifies the following transformation for submitting HTML forms with the "GET" method to a web server. The following is a brief summary of the algorithm:
SPACE is encoded as '+' or '%20'
Letters (A–Z and a–z), numbers (0–9) and the characters '~', '-', '.' and '_' are left as-is
'+' is encoded as %2B
All other characters are encoded as a %HH hexadecimal representation, with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
The octet corresponding to the tilde ("~") is permitted in query strings by RFC 3986 but required to be percent-encoded in HTML forms to "%7E".
The encoding of SPACE as '+' and the selection of "as-is" characters distinguish this encoding from RFC 3986.
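The two treatments of SPACE can be compared with Python's standard quote and quote_plus functions. These are used here only as an approximation: they illustrate the '+' versus '%20' distinction and are not an implementation of the HTML form algorithm. The sample text is arbitrary.

```python
from urllib.parse import quote, quote_plus

text = "a b+c"

# Generic percent-encoding: SPACE becomes %20 and "+" becomes %2B.
print(quote(text, safe=""))  # a%20b%2Bc

# Form-style encoding: SPACE becomes "+", so a literal "+" must still be %2B.
print(quote_plus(text))      # a+b%2Bc
```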
If a form is embedded in an HTML page as follows:
<form action="/cgi-bin/test.cgi" method="get"><input type="text" name="first" /><input type="text" name="second" /><input type="submit" /></form>
and the user inserts the strings "this is a field" and "was it clear (already)?" in the two text fields and presses the submit button, the program test.cgi (the program specified by the action attribute of the form element in the above example) will receive the following query string: first=this+is+a+field&second=was+it+clear+%28already%29%3F.
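The query string in this example can be reproduced with Python's standard urlencode function, shown here as one possible client-side encoder rather than as what a browser runs internally. Decoding it with parse_qs recovers the original field values.

```python
from urllib.parse import urlencode, parse_qs

form_fields = {
    "first": "this is a field",
    "second": "was it clear (already)?",
}

# urlencode joins field=value pairs with "&" and encodes SPACE as "+".
query_string = urlencode(form_fields)
print(query_string)
# first=this+is+a+field&second=was+it+clear+%28already%29%3F

# Decoding reverses the substitutions.
decoded = parse_qs(query_string)
print(decoded["second"])  # ['was it clear (already)?']
```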
If the form is processed on the server by a CGI script, the script typically receives the query string as an environment variable named QUERY_STRING.
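A CGI program written in Python might read that variable along the following lines. This is a sketch only: the helper name read_query_fields is hypothetical, and the environ parameter exists so the function can be exercised without a web server.

```python
import os
from urllib.parse import parse_qs

def read_query_fields(environ=os.environ):
    """Return the decoded fields from the QUERY_STRING environment variable."""
    # The variable may be absent when the program is not run under CGI.
    query_string = environ.get("QUERY_STRING", "")
    # keep_blank_values=True preserves fields submitted with an empty value.
    return parse_qs(query_string, keep_blank_values=True)

# Simulating the environment a server would set up for the form example:
fake_environ = {"QUERY_STRING": "first=this+is+a+field&second="}
print(read_query_fields(fake_environ))
# {'first': ['this is a field'], 'second': ['']}
```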
A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL including it is stored in the server log files.
These facts allow query strings to be used to track users in a manner similar to that provided by HTTP cookies. For this to work, every time the user downloads a page, a unique identifier must be chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked with the previous one.
For example, when a web page containing the following is requested:
<a href="foo.html">see my page!</a><a href="bar.html">mine is better</a>
a unique string, such as e0a72cb2a2c7, is chosen, and the page is modified as follows:
<a href="foo.html?e0a72cb2a2c7">see my page!</a><a href="bar.html?e0a72cb2a2c7">mine is better</a>
The addition of the query string does not change the way the page is shown to the user. When the user follows, for example, the first link, the browser requests the page foo.html?e0a72cb2a2c7 from the server, which ignores what follows ? and sends the page foo.html as expected, adding the query string to its links as well.
This way, any subsequent page request from this user will carry the same query string e0a72cb2a2c7, making it possible to establish that all these pages have been viewed by the same user. Query strings are often used in association with web beacons.
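The link rewriting described above could be sketched as follows. The function name tag_links is hypothetical, and the regular expression is a deliberate simplification: it only handles double-quoted href attributes and ignores fragments, whereas a real implementation would more likely use an HTML parser.

```python
import re
import secrets

def tag_links(html, token):
    """Append ?token to every double-quoted href that has no query string."""
    def add_token(match):
        prefix, url, suffix = match.groups()
        if "?" in url:
            # Leave links that already carry a query string untouched.
            return match.group(0)
        return f"{prefix}{url}?{token}{suffix}"

    return re.sub(r'(href=")([^"]*)(")', add_token, html)

# A fresh identifier per visitor; 6 random bytes give 12 hexadecimal characters.
token = secrets.token_hex(6)

page = '<a href="foo.html">see my page!</a><a href="bar.html">mine is better</a>'
print(tag_links(page, "e0a72cb2a2c7"))
# <a href="foo.html?e0a72cb2a2c7">see my page!</a><a href="bar.html?e0a72cb2a2c7">mine is better</a>
```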
The main differences between query strings used for tracking and HTTP cookies are that:
According to the HTTP specification:
Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets. [13]
If the URL is too long, the web server fails with the 414 Request-URI Too Long HTTP status code.
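A server-side length check might look like the sketch below. The 8000-octet figure comes from the recommendation quoted above; treating it as the enforced limit is an assumption made for illustration, since actual servers choose their own limits. The function name check_request_line is hypothetical, and the request target is assumed to be already percent-encoded ASCII.

```python
MAX_REQUEST_LINE = 8000  # octets; assumed limit, taken from the quoted minimum

def check_request_line(method, target, version="HTTP/1.1", limit=MAX_REQUEST_LINE):
    """Return 414 if the request line exceeds the limit, otherwise None."""
    # The request line is: method SP request-target SP HTTP-version CRLF.
    request_line = f"{method} {target} {version}\r\n".encode("ascii")
    if len(request_line) > limit:
        return 414
    return None

print(check_request_line("GET", "/over/there?name=ferret"))  # None
print(check_request_line("GET", "/?q=" + "a" * 8000))        # 414
```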
The common workaround for these problems is to use POST instead of GET and store the parameters in the request body. The length limits on request bodies are typically much higher than those on URL length. For example, the limit on POST size, by default, is 2 MB on IIS 4.0 and 128 KB on IIS 5.0. The limit is configurable on Apache2 using the LimitRequestBody directive, which specifies the number of bytes from 0 (meaning unlimited) to 2147483647 (2 GB) that are allowed in a request body. [14]