Double encoding

Last updated October 04, 2022

Double encoding is the act of encoding data twice in a row using the same encoding scheme. It is usually used as an attack technique to bypass authorization schemes or security filters that intercept user input. In double encoding attacks against security filters, characters of the payload that are treated as illegal by those filters are replaced with their double-encoded form.

Description

In double encoding, data is encoded twice in a row using the same encoding scheme, that is, double-encoded form of data X is Encode(Encode(X)) where Encode is an encoding function.^[1]

Double encoding is usually used as an attack technique to bypass authorization schemes or security filters that intercept user input.^[2] In double encoding attacks against security filters, characters of the payload that are treated as illegal by those filters are replaced with their double-encoded form.^[3] Security filters might treat data X and its encoded form as illegal.^[4] However, it is still possible for Encode(Encode(X)), which is the double-encoded form of data X, to not to be treated as illegal by security filters and hence pass through them, but later on, the target system might use the double-decoded form of Encode(Encode(X)), which is X, something that the filters would have been treated as illegal.^[5]

Double URI-encoding

Double URI-encoding, also referred to as double percent-encoding, is a special type of double encoding in which data is URI-encoded twice in a row.^[6] In other words, double-URI-encoded form of data X is URI-encode(URI-encode(X)).^[7] For example for calculating double-URI-encoded form of <, first < is URI-encoded as %3C which then in turn is URI-encoded as %253C, that is, double-URI-encode(<) = URI-encode(URI-encode(<)) = URI-encode(%3C) = %253C.^[8] As another example, for calculating double-URI-encoded form of ../, first ../ is URI-encoded as %2E%2E%2F which then in turn is URI-encoded as %252E%252E%252F, that is, double-URI-encode(../) = URI-encode(URI-encode(../)) = URI-encode(%2E%2E%2F) = %252E%252E%252F.^[9]

Double URI-encoding is usually used as an attack technique against web applications and web browsers to bypass authorization schemes and security filters that intercept user input.^[10]^[11] For example because . and its URI-encoded form %2E are used in some directory traversal attacks, they are usually treated as illegal by security filters.^[12] However, it is still possible for %252E, which is the double-URI-encoded form of ., to not to be treated as illegal by security filters and hence pass through them, but later on, when the target system is building the path related to the directory traversal attack it might use the double-URI-decoded form of %252E, which is ., something that the filters would have been treated as illegal.^[13]

Double URI-encoding attacks have been used to bypass authorization schemes and security filters against code injection, directory traversal, XSS and SQL injection.^[14]

Prevention

Decoding some user input twice using the same decoding scheme, once before a security measure and once afterwards, may allow double encoding attacks to bypass that security measure.^[15] Thus, to prevent double encoding attacks, all decoding operations on user input should occur before authorization schemes and security filters that intercept user input.^[16]

Examples

PHP

In PHP programming language, data items in $_GET and $_REQUEST are sufficiently URI-decoded and thus programmers should avoid calling the urldecode function on them.^[17] Calling the urldecode function on data that has been read from $_GET or $_REQUEST causes the data to be URI-decoded once more than it should and hence may open possibility for double URI-encoding attacks.

Directory traversal

In the following PHP program, the value of $_GET["file"] is used to build the path of the file to be sent to the user. This opens the possibility for directory traversal attacks that incorporate their payload into the HTTP GET parameter file. As a security filter against directory traversal attacks, this program searches the value it reads from $_GET["file"] for directory traversal sequences and exits if it finds one. However, after this filter, the program URI-decodes the data that it has read from $_GET["file"], which makes it vulnerable to double URI-encoding attacks.

<?php/* Note that $_GET is already URI-decoded */$path=$_GET["file"];/* Security filter *//* Exit if user input contains directory traversal sequence */if(strstr($path,"../")orstrstr($path,"..\\")){exit("Directory traversal attempt detected.");}/* URI-decode user input once again */$path=urldecode($path);/* Build file path to be sent using user input */echohtmlentities(file_get_contents("uploads/".$path));

This filter prevents payloads such as ../../../../etc/passwd and its URI-encoded form %2E%2E%2F%2E%2E%2F%2E%2E%2F%2E%2E%2Fetc%2Fpasswd. However, %252E%252E%252F%252E%252E%252F%252E%252E%252F%252E%252E%252Fetc%252Fpasswd, which is the double-URI-encoded form of ../../../../etc/passwd, will bypass this filter. When double-URI-encoded payload %252E%252E%252F%252E%252E%252F%252E%252E%252F%252E%252E%252Fetc%252Fpasswd is used, the value of $_GET["file"] will be %2E%2E%2F%2E%2E%2F%2E%2E%2F%2E%2E%2Fetc%2Fpasswd which doesn't contain any directory traversal sequence and thus passes through the filter and will be given to the urldecode function which returns ../../../../etc/passwd, resulting in a successful attack.

XSS

In the following PHP program, the value of $_GET["name"] is used to build a message to be shown to the user. This opens the possibility for XSS attacks that incorporate their payload into the HTTP GET parameter name. As a security filter against XSS attacks, this program sanitizes the value it reads from $_GET["name"] via the htmlentities function. However, after this filter, the program URI-decodes the data that it has read from $_GET["name"], which makes it vulnerable to double URI-encoding attacks.

<?php/* Note that $_GET is already URI-decoded */$name=$_GET["name"];/* Security filter *//* Sanitize user input via htmlentity */$name=htmlentities($name);/* URI-decode user input once again */$name=urldecode($name);/* Build message to be shown using user input */echo"Hello ".$name;

This filter prevents payloads such as <script>alert(1)</script> and its URI-encoded form %3Cscript%3Ealert%281%29%3C%2Fscript%3E. However, %253Cscript%253Ealert%25281%2529%253C%252Fscript%253E, which is the double-URI-encoded form of <script>alert(1)</script>, will bypass this filter. When double-URI-encoded payload %253Cscript%253Ealert%25281%2529%253C%252Fscript%253E is used, the value of $_GET["name"] will be %3Cscript%3Ealert%281%29%3C%2Fscript%3E which doesn't contain any illegal character and thus passes through the htmlentities function without any change and will be given to the urldecode function which returns <script>alert(1)</script>, resulting in a successful attack.

Sources

CAPEC (2022). "CAPEC-120: Double Encoding". capec.mitre.org. 3.7. Retrieved 23 July 2022.
CWE (2022). "CWE-174: Double Decoding of the Same Data". cwe.mitre.org. 4.8. Retrieved 23 July 2022.
Imperva (2022). "Double URL Encoding". docs.imperva.com. Retrieved 23 July 2022.
OWASP (2022). "Double Encoding". owasp.org. Retrieved 23 July 2022.
PHP (2022). "urldecode". php.net. Retrieved 23 July 2022.
PortSwigger (2022). "Obfuscating attacks using encodings". portswigger.net. Obfuscation via double URL encoding. Retrieved 23 July 2022.
Prasad, Prakhar (2016). "Double encoding" . Mastering Modern Web Penetration Testing. Packt Publishing. pp. 11–14. ISBN 978-1785284588.

Related Research Articles

While Hypertext Markup Language (HTML) has been in use since 1991, HTML 4.0 from December 1997 was the first standardized version where international characters were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit ASCII, two goals are worth considering: the information's integrity, and universal browser display.

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

<span class="mw-page-title-main">Shellcode</span> Small piece of code used as a payload to exploit a software vulnerability

In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. However, attempts at replacing the term have not gained wide acceptance. Shellcode is commonly written in machine code.

<span class="mw-page-title-main">Cross-site scripting</span> Computer security vulnerability

Cross-site scripting (XSS) is a type of security vulnerability that can be found in some web applications. XSS attacks enable attackers to inject client-side scripts into web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same-origin policy. Cross-site scripting carried out on websites accounted for roughly 84% of all security vulnerabilities documented by Symantec up until 2007. XSS effects vary in range from petty nuisance to significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site's owner network.

<span class="mw-page-title-main">SQL injection</span> Computer hacking technique

SQL injection is a code injection technique used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution. SQL injection must exploit a security vulnerability in an application's software, for example, when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and unexpectedly executed. SQL injection is mostly known as an attack vector for websites but can be used to attack any type of SQL database.

In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

<span class="mw-page-title-main">Code injection</span> Computer bug exploit caused by invalid data

Code injection is the exploitation of a computer bug that is caused by processing invalid data. The injection is used by an attacker to introduce code into a vulnerable computer program and change the course of execution. The result of successful code injection can be disastrous, for example, by allowing computer viruses or computer worms to propagate.

Percent-encoding, also known as URL encoding, is a method to encode arbitrary data in a Uniform Resource Identifier (URI) using only the limited US-ASCII characters legal within a URI. Although it is known as URL encoding, it is also used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.

A directory traversal attack exploits insufficient security validation or sanitization of user-supplied file names, such that characters representing "traverse to parent directory" are passed through to the operating system's file system API. An affected application can be exploited to gain unauthorized access to the file system.

HTTP response splitting is a form of web application vulnerability, resulting from the failure of the application or its environment to properly sanitize input values. It can be used to perform cross-site scripting attacks, cross-user defacement, web cache poisoning, and similar exploits.

<span class="mw-page-title-main">Application security</span> Measures taken to improve the security of an application

Application security includes all tasks that introduce a secure software development life cycle to development teams. Its final goal is to improve security practices and, through that, to find, fix and preferably prevent security issues within applications. It encompasses the whole application life cycle from requirements analysis, design, implementation, verification as well as maintenance.

Improper input validation or unchecked user input is a type of vulnerability in computer software that may be used for security exploits. This vulnerability is caused when "[t]he product does not validate or incorrectly validates input that can affect the control flow or data flow of a program."

HTTP header injection is a general class of web application security vulnerability which occurs when Hypertext Transfer Protocol (HTTP) headers are dynamically generated based on user input. Header injection in HTTP responses can allow for HTTP response splitting, Session fixation via the Set-Cookie header, cross-site scripting (XSS), and malicious redirect attacks via the location header. HTTP header injection is a relatively new area for web-based attacks, and has primarily been pioneered by Amit Klein in his work on request/response smuggling/splitting.

A file inclusion vulnerability is a type of web vulnerability that is most commonly found to affect web applications that rely on a scripting run time. This issue is caused when an application builds a path to executable code using an attacker-controlled variable in a way that allows the attacker to control which file is executed at run time. A file include vulnerability is distinct from a generic directory traversal attack, in that directory traversal is a way of gaining unauthorized file system access, and a file inclusion vulnerability subverts how an application loads code for execution. Successful exploitation of a file inclusion vulnerability will result in remote code execution on the web server that runs the affected web application. An attacker can use remote code execution to create a web shell on the web server, which can be used for website defacement.

JSONP, or JSON-P, is a historical JavaScript technique for requesting data by loading a <script> element, which is an element intended to load ordinary JavaScript. It was proposed by Bob Ippolito in 2005. JSONP enables sharing of data bypassing same-origin policy, which disallows running JavaScript code to read media DOM elements or XMLHttpRequest data fetched from outside the page's originating site. The originating site is indicated by a combination of URI scheme, host name, and port number.

Cross-site request forgery, also known as one-click attack or session riding and abbreviated as CSRF or XSRF, is a type of malicious exploit of a website where unauthorized commands are submitted from a user that the web application trusts. There are many ways in which a malicious website can transmit such commands; specially-crafted image tags, hidden forms, and JavaScript XMLHttpRequests, for example, can all work without the user's interaction or even knowledge. Unlike cross-site scripting (XSS), which exploits the trust a user has for a particular site, CSRF exploits the trust that a site has in a user's browser. In a CSRF attack, an innocent end user is tricked by an attacker into submitting a web request that they did not intend. This may cause actions to be performed on the website that can include inadvertent client or server data leakage, change of session state, or manipulation of an end user's account.

<span class="mw-page-title-main">Web application firewall</span> HTTP specific network security system

A web application firewall (WAF) is a specific form of application firewall that filters, monitors, and blocks HTTP traffic to and from a web service. By inspecting HTTP traffic, it can prevent attacks exploiting a web application's known vulnerabilities, such as SQL injection, cross-site scripting (XSS), file inclusion, and improper system configuration.

XML External Entity attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. This attack may lead to the disclosure of confidential data, denial of service (DoS), server side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts.

In computer security, LDAP injection is a code injection technique used to exploit web applications which could reveal sensitive user information or modify information represented in the LDAP data stores. LDAP injection exploits a security vulnerability in an application by manipulating input parameters passed to internal search, add or modify functions. When an application fails to properly sanitize user input, it is possible for an attacker to modify an LDAP statement.

HTTP Parameter Pollution (HPP) is a web application vulnerability exploited by injecting encoded query string delimiters in already existing parameters. The vulnerability occurs if user input is not correctly encoded for output by a web application. This vulnerability allows the injection of parameters into web application-created URLs. It was first brought forth to the public in 2009 by Stefano di Paola and Luca Carettoni, in the conference OWASP EU09 Poland. The impact of such vulnerability varies, and it can range from "simple annoyance" to complete disruption of the intended behavior of a web application. Overriding HTTP parameters to alter a web application's behavior, bypassing input and access validation checkpoints, as well as other indirect vulnerabilities, are possible consequences of a HPP attack.

References

↑ CAPEC 2022 , Description. "The adversary utilizes a repeating of the encoding process for a set of characters (that is, character encoding a character encoding of a character) to obfuscate the payload of a particular request."
↑ CAPEC 2022 , Description,Execution Flow. "This[double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks [...] For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc."
↑ CAPEC 2022 , Description,Execution Flow. "This[double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks. [...] Try double-encoding for parts of the input in order to try to get past the filters."
↑ OWASP 2022 , Description. "By using double encoding it’s possible to bypass security filters that only decode user input once."
↑ OWASP 2022 , Description. "By using double encoding it’s possible to bypass security filters that only decode user input once. The second decoding process is executed by the backend platform or modules that properly handle encoded data, but don’t have the corresponding security checks in place."
↑ Prasad 2016 , p. 11. "Double percent encoding is the same as percent encoding with a twist that each character is encoded twice instead of once."
↑ Prasad 2016 , p. 11. "Double percent encoding is the same as percent encoding with a twist that each character is encoded twice instead of once."
↑ Prasad 2016 , p. 11. "So if I had to encode < using double encoding, I'll first encode it into its percent-encoded format, which is %3c and then again percent encode the % character. The result of this will be %253c."
↑ OWASP 2022 , Description. "For example, ../ (dot-dot-slash) characters represent %2E%2E%2F in hexadecimal representation. When the % symbol is encoded again, its representation in hexadecimal code is %25. The result from the double encoding process ../ (dot-dot-slash) would be %252E%252E%252F"
↑ Prasad 2016 , p. 11. "This technique[double percent encoding] comes in pretty handy when attempting to evade filters which attempt to blacklist certain encoded characters"
↑ CAPEC 2022 , Execution Flow. "For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc."
↑ CAPEC 2022 , Description. "For example, a dot (.), often used in path traversal attacks and therefore often blocked by filters, could be URL encoded as %2E. However, many filters recognize this encoding and would still block the request."
↑ CAPEC 2022 , Description. "In a double encoding, the % in the above URL encoding would be encoded again as %25, resulting in %252E which some filters might not catch, but which could still be interpreted as a dot (.) by interpreters on the target."
↑ CWE 2022 , Observed Examples.
↑ CWE 2022 , Description. "The software decodes the same input twice, which can limit the effectiveness of any protection mechanism that occurs in between the decoding operations."
↑ CWE 2022 , Potential Mitigations. "Inputs should be decoded and canonicalized to the application's current internal representation before being validated (CWE-180)."
↑ PHP 2022 , Notes. "Warning: The superglobals $_GET and $_REQUEST are already decoded. Using urldecode() on an element in $_GET or $_REQUEST could have unexpected and dangerous results."

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] CAPEC 2022 , Description. "The adversary utilizes a repeating of the encoding process for a set of characters (that is, character encoding a character encoding of a character) to obfuscate the payload of a particular request."

[2] CAPEC 2022 , Description,Execution Flow. "This[double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks [...] For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc."

[3] CAPEC 2022 , Description,Execution Flow. "This[double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks. [...] Try double-encoding for parts of the input in order to try to get past the filters."

[4] OWASP 2022 , Description. "By using double encoding it’s possible to bypass security filters that only decode user input once."

[5] OWASP 2022 , Description. "By using double encoding it’s possible to bypass security filters that only decode user input once. The second decoding process is executed by the backend platform or modules that properly handle encoded data, but don’t have the corresponding security checks in place."

[6] Prasad 2016 , p. 11. "Double percent encoding is the same as percent encoding with a twist that each character is encoded twice instead of once."

[7] Prasad 2016 , p. 11. "Double percent encoding is the same as percent encoding with a twist that each character is encoded twice instead of once."

[8] Prasad 2016 , p. 11. "So if I had to encode < using double encoding, I'll first encode it into its percent-encoded format, which is %3c and then again percent encode the % character. The result of this will be %253c."

[9] OWASP 2022 , Description. "For example, ../ (dot-dot-slash) characters represent %2E%2E%2F in hexadecimal representation. When the % symbol is encoded again, its representation in hexadecimal code is %25. The result from the double encoding process ../ (dot-dot-slash) would be %252E%252E%252F"

[10] Prasad 2016 , p. 11. "This technique[double percent encoding] comes in pretty handy when attempting to evade filters which attempt to blacklist certain encoded characters"

[11] CAPEC 2022 , Execution Flow. "For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc."

[12] CAPEC 2022 , Description. "For example, a dot (.), often used in path traversal attacks and therefore often blocked by filters, could be URL encoded as %2E. However, many filters recognize this encoding and would still block the request."

[13] CAPEC 2022 , Description. "In a double encoding, the % in the above URL encoding would be encoded again as %25, resulting in %252E which some filters might not catch, but which could still be interpreted as a dot (.) by interpreters on the target."

[14] CWE 2022 , Observed Examples.

[15] CWE 2022 , Description. "The software decodes the same input twice, which can limit the effectiveness of any protection mechanism that occurs in between the decoding operations."

[16] CWE 2022 , Potential Mitigations. "Inputs should be decoded and canonicalized to the application's current internal representation before being validated (CWE-180)."

[17] PHP 2022 , Notes. "Warning: The superglobals $_GET and $_REQUEST are already decoded. Using urldecode() on an element in $_GET or $_REQUEST could have unexpected and dangerous results."

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]