Directory traversal attack

Last updated

Adirectory traversal (or path traversal) attack exploits insufficient security validation or sanitization of user-supplied file names, such that characters representing "traverse to parent directory" are passed through to the operating system's file system API. An affected application can be exploited to gain unauthorized access to the file system.

Contents

Examples

In PHP

A typical example of a vulnerable application in PHP code is:

<?php$template='red.php';if(isset($_COOKIE['TEMPLATE'])){$template=$_COOKIE['TEMPLATE'];}include"/home/users/phpguru/templates/".$template;

An attack against this system could be to send the following HTTP request:

GET/vulnerable.phpHTTP/1.0Cookie:TEMPLATE=../../../../../../../../../etc/passwd

The server would then generate a response such as:

HTTP/1.0200OKContent-Type:text/htmlServer:Apache  root:fi3sED95ibqR6:0:1:System Operator:/:/bin/ksh  daemon:*:1:1::/tmp:  phpguru:f8fk3j1OIf31.:182:100:Developer:/home/users/phpguru/:/bin/csh 

The repeated ../ characters after /home/users/phpguru/templates/ have caused include() to traverse to the root directory, and then include the Unix password file /etc/passwd .

Unix /etc/passwd is a common file used to demonstrate directory traversal, as it is often used by crackers to try cracking the passwords. However, in more recent Unix systems, the /etc/passwd file does not contain the hashed passwords, and they are instead located in the /etc/shadow file, which cannot be read by unprivileged users on the machine. Even in that case, though, reading /etc/passwd does still show a list of user accounts.

Zip Slip vulnerability

Another example is the Zip Slip vulnerability that affects several archive file formats like ZIP. [1]

Variations

Directory traversal in its simplest form uses the ../ pattern. Some common variations are listed below:

Microsoft Windows

Microsoft Windows and DOS directory traversal uses the ..\ or ../ patterns. [2]

Each partition has a separate root directory (labeled C:\ where C could be any partition), and there is no common root directory above that. This means that for most directory vulnerabilities on Windows, attacks are limited to a single partition.

Directory traversal has been the cause of numerous Microsoft vulnerabilities. [3] [4]

Percent encoding in URIs

Some web applications attempt to prevent directory traversal by scanning the path of a request URI for patterns such as ../. This check is sometimes mistakenly performed before percent-decoding, causing URIs containing patterns like %2e%2e/ to be accepted despite being decoded into ../ before actual use. [5]

Double encoding

Percent decoding may accidentally be performed multiple times; once before validation, but again afterwards, making the application vulnerable to Double percent-encoding attacks [6] in which illegal characters are replaced by their double-percent-encoded form in order to bypass security countermeasures. [7] For example, in a double percent-encoding attack, ../ may be replaced by its double-percent-encoded form %252E%252E%252F. [8] This kind of vulnerability notably affected versions 5.0 and earlier of Microsoft's IIS web server software. [9]

UTF-8

A badly implemented UTF-8 decoder may accept characters encoded using more bytes than necessary, leading to alternative character representations, such as %2e and %c0%ae both representing .. This is specifically forbidden by the UTF-8 standard, [10] but has still led to directory traversal vulnerabilities in software such as the IIS web server. [11]

Archives

Some archive formats like zip allow for directory traversal attacks: files in the archive can be written such that they overwrite files on the filesystem by backtracking. Code that extracts archive files can be written to check that the paths of the files in the archive do not engage in path traversal.

Prevention

A possible algorithm for preventing directory traversal would be to:

  1. Process URI requests that do not result in a file request, e.g., executing a hook into user code, before continuing below.
  2. When a URI request for a file/directory is to be made, build a full path to the file/directory if it exists, and normalize all characters (e.g., %20 converted to spaces).
  3. It is assumed that a 'Document Root' fully qualified, normalized, path is known, and this string has a length N. Assume that no files outside this directory can be served.
  4. Ensure that the first N characters of the fully qualified path to the requested file is exactly the same as the 'Document Root'.
  5. If so, allow the file to be returned.
  6. If not, return an error, since the request is clearly out of bounds from what the web-server should be allowed to serve.

Using a hard-coded predefined file extension to suffix the path does not necessarily limit the scope of the attack to files of that file extension.

<?phpinclude($_GET['file'].'.html');

The user can use the NULL character (indicating the end of the string) in order to bypass everything after the $_GET. (This is PHP-specific.)

See also

Related Research Articles

<span class="mw-page-title-main">Cygwin</span> Unix subsystem for Windows machines

Cygwin is a Unix-like environment and command-line interface for Microsoft Windows.

<span class="mw-page-title-main">UTF-16</span> Variable-width encoding of Unicode, using one or two 16-bit code units

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as code points are encoded with one or two 16-bit code units. UTF-16 arose from an earlier obsolete fixed-width 16-bit encoding now known as UCS-2 (for 2-byte Universal Character Set), once it became clear that more than 216 (65,536) code points were needed, including most emoji and important CJK characters such as for personal and place names.

In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.

Unix security refers to the means of securing a Unix or Unix-like operating system. A secure environment is achieved not only by the design concepts of these operating systems, but also through vigilant user and administrative practices.

<span class="mw-page-title-main">Filename</span> Text string used to uniquely identify a computer file

A filename or file name is a name used to uniquely identify a computer file in a file system. Different file systems impose different restrictions on filename lengths.

A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory. The delimiting character is most commonly the slash ("/"), the backslash character ("\"), or colon (":"), though some operating systems may use a different delimiter. Paths are used extensively in computer science to represent the directory/file relationships common in modern operating systems and are essential in the construction of Uniform Resource Locators (URLs). Resources can be represented by either absolute or relative paths.

passwd Tool to change passwords on Unix-like OSes

passwd is a command on Unix, Plan 9, Inferno, and most Unix-like operating systems used to change a user's password. The password entered by the user is run through a key derivation function to create a hashed version of the new password, which is saved. Only the hashed version is stored; the entered password is not saved for security reasons.

A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.

In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

The Unix and Linux access rights flags setuid and setgid allow users to run an executable with the file system permissions of the executable's owner or group respectively and to change behaviour in directories. They are often used to allow users on a computer system to run programs with temporarily elevated privileges to perform a specific task. While the assumed user id or group id privileges provided are not always elevated, at a minimum they are specific.

M3U is a computer file format for a multimedia playlist. One common use of the M3U file format is creating a single-entry playlist file pointing to a stream on the Internet. The created file provides easy access to that stream and is often used in downloads from a website, for emailing, and for listening to Internet radio.

The data URI scheme is a uniform resource identifier (URI) scheme that provides a way to include data in-line in Web pages as if they were external resources. It is a form of file literal or here document. This technique allows normally separate elements such as images and style sheets to be fetched in a single Hypertext Transfer Protocol (HTTP) request, which may be more efficient than multiple HTTP requests, and used by several browser extensions to package images as well as other multimedia content in a single HTML file for page saving. As of 2022, data URIs are fully supported by most major browsers, and partially supported in Internet Explorer.

URL encoding, officially known as percent-encoding, is a method to encode arbitrary data in a uniform resource identifier (URI) using only the US-ASCII characters legal within a URI. Although it is known as URL encoding, it is also used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.

The Computer Oracle and Password System (COPS) was the first vulnerability scanner for Unix operating systems to achieve widespread use. It was created by Dan Farmer while he was a student at Purdue University. Gene Spafford helped Farmer start the project in 1989.

<span class="mw-page-title-main">URI normalization</span> Process by which URIs are standardized

URI normalization is the process by which URIs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URI into a normalized URI so it is possible to determine if two syntactically different URIs may be equivalent.

A file inclusion vulnerability is a type of web vulnerability that is most commonly found to affect web applications that rely on a scripting run time. This issue is caused when an application builds a path to executable code using an attacker-controlled variable in a way that allows the attacker to control which file is executed at run time. A file include vulnerability is distinct from a generic directory traversal attack, in that directory traversal is a way of gaining unauthorized file system access, and a file inclusion vulnerability subverts how an application loads code for execution. Successful exploitation of a file inclusion vulnerability will result in remote code execution on the web server that runs the affected web application. An attacker can use remote code execution to create a web shell on the web server, which can be used for website defacement.

In programming, a file uniform resource identifier (URI) scheme is a specific format of URI, used to specifically identify a file on a host computer. While URIs can be used to identify anything, there is specific syntax associated with identifying files.

XML External Entity attack, or simply XXE attack, is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. This attack may lead to the disclosure of confidential data, DoS attacks, server-side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts.

Double encoding is the act of encoding data twice in a row using the same encoding scheme. It is usually used as an attack technique to bypass authorization schemes or security filters that intercept user input. In double encoding attacks against security filters, characters of the payload that are treated as illegal by those filters are replaced with their double-encoded form.

A Uniform Resource Locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (HTTP/HTTPS) but are also used for file transfer (FTP), email (mailto), database access (JDBC), and many other applications.

References

  1. "Zip Slip Vulnerability". Snyk. The vulnerability is exploited using a specially crafted archive that holds directory traversal filenames (e.g. ../../evil.sh). The Zip Slip vulnerability can affect numerous archive formats, including tar, jar, war, cpio, apk, rar and 7z.
  2. "Naming Files, Paths, and Namespaces". Microsoft. File I/O functions in the Windows API convert '/' to '\' as part of converting the name to an NT-style name
  3. Burnett, Mark (December 20, 2004). "Security Holes That Run Deep". SecurityFocus. Archived from the original on February 2, 2021. Retrieved March 22, 2016.
  4. "Microsoft: Security Vulnerabilities (Directory Traversal)". CVE Details.
  5. "Path Traversal". OWASP.
  6. "CWE-174: Double Decoding of the Same Data". cwe.mitre.org. Retrieved 24 July 2022. The software decodes the same input twice, which can limit the effectiveness of any protection mechanism that occurs in between the decoding operations.
  7. "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. This[double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks. [...] Try double-encoding for parts of the input in order to try to get past the filters.
  8. "Double Encoding". owasp.org. Retrieved 23 July 2022. For example, ../ (dot-dot-slash) characters represent %2E%2E%2F in hexadecimal representation. When the % symbol is encoded again, its representation in hexadecimal code is %25. The result from the double encoding process ../ (dot-dot-slash) would be %252E%252E%252F
  9. "CVE-2001-0333". Common Vulnerabilities and Exposures.
  10. Yergeau, F. (2003). "RFC 2279 - UTF-8, a transformation format of ISO 10646". IETF. doi: 10.17487/RFC3629 .{{cite journal}}: Cite journal requires |journal= (help)
  11. "CVE-2002-1744". Common Vulnerabilities and Exposures.

Resources