Shellcode

Last updated

In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. [1] However, attempts at replacing the term have not gained wide acceptance. Shellcode is commonly written in machine code.

Contents

When creating shellcode, it is generally desirable to make it both small and executable, which allows it to be used in as wide a variety of situations as possible. [2] In assembly code, the same function can be performed in a multitude of ways and there is some variety in the lengths of opcodes that can be used for this purpose; good shellcode writers can put these small opcodes to use to create more compact shellcode. [3] Some have reached the smallest possible size while maintaining stability. [4]

Types of shellcode

Shellcode can either be local or remote, depending on whether it gives an attacker control over the machine it runs on (local) or over another machine through a network (remote).

Local

Local shellcode is used by an attacker who has limited access to a machine but can exploit a vulnerability, for example a buffer overflow, in a higher-privileged process on that machine. If successfully executed, the shellcode will provide the attacker access to the machine with the same higher privileges as the targeted process.

Remote

Remote shellcode is used when an attacker wants to target a vulnerable process running on another machine on a local network, intranet, or a remote network. If successfully executed, the shellcode can provide the attacker access to the target machine across the network. Remote shellcodes normally use standard TCP/IP socket connections to allow the attacker access to the shell on the target machine. Such shellcode can be categorized based on how this connection is set up: if the shellcode establishes the connection it is called a "reverse shell", or a connect-back shellcode because the shellcode connects back to the attacker's machine. On the other hand, if the attacker establishes the connection, the shellcode is called a bindshell because the shellcode binds to a certain port on the victim's machine. There's a peculiar shellcode named bindshell random port that skips the binding part and listens on a random port made available by the operating system. Because of that, the bindshell random port became the smallest stable bindshell shellcode for x86_64 available to this date. A third, much less common type, is socket-reuse shellcode. This type of shellcode is sometimes used when an exploit establishes a connection to the vulnerable process that is not closed before the shellcode is run. The shellcode can then re-use this connection to communicate with the attacker. Socket re-using shellcode is more elaborate, since the shellcode needs to find out which connection to re-use and the machine may have many connections open. [5]

A firewall can be used to detect outgoing connections made by connect-back shellcode as well as incoming connections made by bindshells. They can, therefore, offer some protection against an attacker, even if the system is vulnerable, by preventing the attacker from connecting to the shell created by the shellcode. One reason why socket re-using shellcode is sometimes used is that it does not create new connections and, therefore, is harder to detect and block.

Download and execute

Download and execute is a type of remote shellcode that downloads and executes some form of malware on the target system. This type of shellcode does not spawn a shell, but rather instructs the machine to download a certain executable file off the network, save it to disk and execute it. Nowadays, it is commonly used in drive-by download attacks, where a victim visits a malicious webpage that in turn attempts to run such a download and execute shellcode in order to install software on the victim's machine. A variation of this type of shellcode downloads and loads a library. [6] [7] Advantages of this technique are that the code can be smaller, that it does not require the shellcode to spawn a new process on the target system, and that the shellcode does not need code to clean up the targeted process as this can be done by the library loaded into the process.

Staged

When the amount of data that an attacker can inject into the target process is too limited to execute useful shellcode directly, it may be possible to execute it in stages. First, a small piece of shellcode (stage 1) is executed. This code then downloads a larger piece of shellcode (stage 2) into the process's memory and executes it.

Egg-hunt

This is another form of staged shellcode, which is used if an attacker can inject a larger shellcode into the process but cannot determine where in the process it will end up. Small egg-hunt shellcode is injected into the process at a predictable location and executed. This code then searches the process's address space for the larger shellcode (the egg) and executes it. [8]

Omelette

This type of shellcode is similar to egg-hunt shellcode, but looks for multiple small blocks of data (eggs) and recombines them into one larger block (the omelette) that is subsequently executed. This is used when an attacker can only inject a number of small blocks of data into the process. [9]

Shellcode execution strategy

An exploit will commonly inject a shellcode into the target process before or at the same time as it exploits a vulnerability to gain control over the program counter. The program counter is adjusted to point to the shellcode, after which it gets executed and performs its task. Injecting the shellcode is often done by storing the shellcode in data sent over the network to the vulnerable process, by supplying it in a file that is read by the vulnerable process or through the command line or environment in the case of local exploits.

Shellcode encoding

Because most processes filter or restrict the data that can be injected, shellcode often needs to be written to allow for these restrictions. This includes making the code small, null-free or alphanumeric. Various solutions have been found to get around such restrictions, including:

Since intrusion detection can detect signatures of simple shellcodes being sent over the network, it is often encoded, made self-decrypting or polymorphic to avoid detection.

Percent encoding

Exploits that target browsers commonly encode shellcode in a JavaScript string using percent-encoding, escape sequence encoding "\uXXXX" or entity encoding. [10] Some exploits also obfuscate the encoded shellcode string further to prevent detection by IDS.

For example, on the IA-32 architecture, here's how two NOP (no-operation) instructions would look, first unencoded:

90             NOP 90             NOP
Encoded double-NOPs:
percent-encodingunescape("%u9090")
unicode literal"\u9090"
HTML/XML entity"邐" or "邐"

This instruction is used in NOP slides.

Null-free shellcode

Most shellcodes are written without the use of null bytes because they are intended to be injected into a target process through null-terminated strings. When a null-terminated string is copied, it will be copied up to and including the first null but subsequent bytes of the shellcode will not be processed. When shellcode that contains nulls is injected in this way, only part of the shellcode would be injected, making it incapable of running successfully.

To produce null-free shellcode from shellcode that contains null bytes, one can substitute machine instructions that contain zeroes with instructions that have the same effect but are free of nulls. For example, on the IA-32 architecture one could replace this instruction:

B8 01000000    MOV EAX,1          // Set the register EAX to 0x00000001

which contains zeroes as part of the literal (1 expands to 0x00000001) with these instructions:

33C0           XOR EAX,EAX        // Set the register EAX to 0x00000000 40             INC EAX            // Increase EAX to 0x00000001

which have the same effect but take fewer bytes to encode and are free of nulls.

Alphanumeric and printable shellcode

An alphanumeric shellcode is a shellcode that consists of or assembles itself on execution into entirely alphanumeric ASCII or Unicode characters such as 0–9, A–Z and a–z. [11] [12] This type of encoding was created by hackers to hide working machine code inside what appears to be text. This can be useful to avoid detection of the code and to allow the code to pass through filters that scrub non-alphanumeric characters from strings (in part, such filters were a response to non-alphanumeric shellcode exploits). A similar type of encoding is called printable code and uses all printable characters (0–9, A–Z, a–z, !@#%^&*() etc.). An similarly restricted variant is ECHOable code not containing any characters which are not accepted by the ECHO command. It has been shown that it is possible to create shellcode that looks like normal text in English. [13] Writing alphanumeric or printable code requires good understanding of the instruction set architecture of the machine(s) on which the code is to be executed. It has been demonstrated that it is possible to write alphanumeric code that is executable on more than one machine, [14] thereby constituting multi-architecture executable code.

In certain circumstances, a target process will filter any byte from the injected shellcode that is not a printable or alphanumeric character. Under such circumstances, the range of instructions that can be used to write a shellcode becomes very limited. A solution to this problem was published by Rix in Phrack 57 [11] in which he showed it was possible to turn any code into alphanumeric code. A technique often used is to create self-modifying code, because this allows the code to modify its own bytes to include bytes outside of the normally allowed range, thereby expanding the range of instructions it can use. Using this trick, a self-modifying decoder can be created that initially uses only bytes in the allowed range. The main code of the shellcode is encoded, also only using bytes in the allowed range. When the output shellcode is run, the decoder can modify its own code to be able to use any instruction it requires to function properly and then continues to decode the original shellcode. After decoding the shellcode the decoder transfers control to it, so it can be executed as normal. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal text in English. [13]

Unicode proof shellcode

Modern programs use Unicode strings to allow internationalization of text. Often, these programs will convert incoming ASCII strings to Unicode before processing them. Unicode strings encoded in UTF-16 use two bytes to encode each character (or four bytes for some special characters). When an ASCII (Latin-1 in general) string is transformed into UTF-16, a zero byte is inserted after each byte in the original string. Obscou proved in Phrack 61 [12] that it is possible to write shellcode that can run successfully after this transformation. Programs that can automatically encode any shellcode into alphanumeric UTF-16-proof shellcode exist, based on the same principle of a small self-modifying decoder that decodes the original shellcode.

Platforms

Most shellcode is written in machine code because of the low level at which the vulnerability being exploited gives an attacker access to the process. Shellcode is therefore often created to target one specific combination of processor, operating system and service pack, called a platform. For some exploits, due to the constraints put on the shellcode by the target process, a very specific shellcode must be created. However, it is not impossible for one shellcode to work for multiple exploits, service packs, operating systems and even processors. [15] [16] [17] Such versatility is commonly achieved by creating multiple versions of the shellcode that target the various platforms and creating a header that branches to the correct version for the platform the code is running on. When executed, the code behaves differently for different platforms and executes the right part of the shellcode for the platform it is running on.

Shellcode analysis

Shellcode cannot be executed directly. In order to analyze what a shellcode attempts to do it must be loaded into another process. One common analysis technique is to write a small C program which holds the shellcode as a byte buffer, and then use a function pointer or use inline assembler to transfer execution to it. Another technique is to use an online tool, such as shellcode_2_exe, to embed the shellcode into a pre-made executable husk which can then be analyzed in a standard debugger. Specialized shellcode analysis tools also exist, such as the iDefense sclog project which was originally released in 2005 as part of the Malcode Analyst Pack. Sclog is designed to load external shellcode files and execute them within an API logging framework. Emulation based shellcode analysis tools also exist such as the sctest application which is part of the cross platform libemu package. Another emulation based shellcode analysis tool, built around the libemu library, is scdbg which includes a basic debug shell and integrated reporting features.

See also

Related Research Articles

<span class="mw-page-title-main">Buffer overflow</span> Anomaly in computer security and programming

In programming and information security, a buffer overflow or buffer overrun is an anomaly whereby a program writes data to a buffer beyond the buffer's allocated memory, overwriting adjacent memory locations.

UTF-8 is a variable-length character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit.

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.

The null character is a control character with the value zero. It is present in many character sets, including those defined by the Baudot and ITA2 codes, ISO/IEC 646, the C0 control code, the Universal Coded Character Set, and EBCDIC. It is available in nearly all mainstream programming languages. It is often abbreviated as NUL. In 8-bit codes, it is known as a null byte.

7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 23.01.

Uncontrolled format string is a type of code injection vulnerability discovered around 1989 that can be used in security exploits. Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code. The problem stems from the use of unchecked user input as the format string parameter in certain C functions that perform formatting, such as printf . A malicious user may use the %s and %x format tokens, among others, to print data from the call stack or possibly other locations in memory. One may also write arbitrary data to arbitrary locations using the %n format token, which commands printf and similar functions to write the number of bytes formatted to an address stored on the stack.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

A "return-to-libc" attack is a computer security attack usually starting with a buffer overflow in which a subroutine return address on a call stack is replaced by an address of a subroutine that is already present in the process executable memory, bypassing the no-execute bit feature and ridding the attacker of the need to inject their own code. The first example of this attack in the wild was contributed by Alexander Peslyak on the Bugtraq mailing list in 1997.

The Pentium F00F bug is a design flaw in the majority of Intel Pentium, Pentium MMX, and Pentium OverDrive processors. Discovered in 1997, it can result in the processor ceasing to function until the computer is physically rebooted. The bug has been circumvented through operating system updates.

In the x86 architecture, the CPUID instruction is a processor supplementary instruction allowing software to discover details of the processor. It was introduced by Intel in 1993 with the launch of the Pentium and SL-enhanced 486 processors.

A file inclusion vulnerability is a type of web vulnerability that is most commonly found to affect web applications that rely on a scripting run time. This issue is caused when an application builds a path to executable code using an attacker-controlled variable in a way that allows the attacker to control which file is executed at run time. A file include vulnerability is distinct from a generic directory traversal attack, in that directory traversal is a way of gaining unauthorized file system access, and a file inclusion vulnerability subverts how an application loads code for execution. Successful exploitation of a file inclusion vulnerability will result in remote code execution on the web server that runs the affected web application. An attacker can use remote code execution to create a web shell on the web server, which can be used for website defacement.

Intrusion detection system evasion techniques are modifications made to attacks in order to prevent detection by an intrusion detection system (IDS). Almost all published evasion techniques modify network attacks. The 1998 paper Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection popularized IDS evasion, and discussed both evasion techniques and areas where the correct interpretation was ambiguous depending on the targeted computer system. The 'fragroute' and 'fragrouter' programs implement evasion techniques discussed in the paper. Many web vulnerability scanners, such as 'Nikto', 'whisker' and 'Sandcat', also incorporate IDS evasion techniques.

In software, a stack buffer overflow or stack buffer overrun occurs when a program writes to a memory address on the program's call stack outside of the intended data structure, which is usually a fixed-length buffer. Stack buffer overflow bugs are caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer. This almost always results in corruption of adjacent data on the stack, and in cases where the overflow was triggered by mistake, will often cause the program to crash or operate incorrectly. Stack buffer overflow is a type of the more general programming malfunction known as buffer overflow. Overfilling a buffer on the stack is more likely to derail program execution than overfilling a buffer on the heap because the stack contains the return addresses for all active function calls.

In computer security, heap spraying is a technique used in exploits to facilitate arbitrary code execution. The part of the source code of an exploit that implements this technique is called a heap spray. In general, code that sprays the heap attempts to put a certain sequence of bytes at a predetermined location in the memory of a target process by having it allocate (large) blocks on the process's heap and fill the bytes in these blocks with the right values.

Return-oriented programming (ROP) is a computer security exploit technique that allows an attacker to execute code in the presence of security defenses such as executable space protection and code signing.

JIT spraying is a class of computer security exploit that circumvents the protection of address space layout randomization and data execution prevention by exploiting the behavior of just-in-time compilation. It has been used to exploit the PDF format and Adobe Flash.

Blind return oriented programming (BROP) is an exploit technique which can successfully create an exploit even if the attacker does not possess the target binary. BROP attacks shown by Bittau et al. have defeated address space layout randomization (ASLR) and stack canaries on 64-bit systems.

Sigreturn-oriented programming (SROP) is a computer security exploit technique that allows an attacker to execute code in presence of security measures such as non-executable memory and code signing. It was presented for the first time at the 35th IEEE Symposium on Security and Privacy in 2014 where it won the best student paper award. This technique employs the same basic assumptions behind the return-oriented programming (ROP) technique: an attacker controlling the call stack, for example through a stack buffer overflow, is able to influence the control flow of the program through simple instruction sequences called gadgets. The attack works by pushing a forged sigcontext structure on the call stack, overwriting the original return address with the location of a gadget that allows the attacker to call the sigreturn system call. Often just a single gadget is needed to successfully put this attack into effect. This gadget may reside at a fixed location, making this attack simple and effective, with a setup generally simpler and more portable than the one needed by the plain return-oriented programming technique.

Intel microcode is microcode that runs inside x86 processors made by Intel. Since the P6 microarchitecture introduced in the mid-1990s, the microcode programs can be patched by the operating system or BIOS firmware to work around bugs found in the CPU after release. Intel had originally designed microcode updates for processor debugging under its design for testing (DFT) initiative.

The Zealot Campaign is a cryptocurrency mining malware collected from a series of stolen National Security Agency (NSA) exploits, released by the Shadow Brokers group on both Windows and Linux machines to mine cryptocurrency, specifically Monero. Discovered in December 2017, these exploits appeared in the Zealot suite include EternalBlue, EternalSynergy, and Apache Struts Jakarta Multipart Parser attack exploit, or CVE-2017-5638. The other notable exploit within the Zealot vulnerabilities includes vulnerability CVE-2017-9822, known as DotNetNuke (DNN) which exploits a content management system so that the user can install a Monero miner software. An estimated USD $8,500 of Monero having been mined on a single targeted computer. The campaign was discovered and studied extensively by F5 Networks in December 2017.

References

  1. Foster, James C.; Price, Mike (2005-04-12). Sockets, Shellcode, Porting, & Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals. Elsevier Science & Technology Books. ISBN   1-59749-005-9.
  2. Anley, Chris; Koziol, Jack (2007). The shellcoder's handbook: discovering and exploiting security holes (2 ed.). Indianapolis, Indiana, UA: Wiley. ISBN   978-0-470-19882-7. OCLC   173682537.
  3. Foster, James C. (2005). Buffer overflow attacks: detect, exploit, prevent. Rockland, MA, USA: Syngress. ISBN   1-59749-022-9. OCLC   57566682.
  4. "Tiny Execve sh - Assembly Language - Linux/x86". GitHub. Retrieved 2021-02-01.
  5. BHA (2013-06-06). "Shellcode/Socket-reuse" . Retrieved 2013-06-07.
  6. SkyLined (2010-01-11). "Download and LoadLibrary shellcode released". Archived from the original on 2010-01-23. Retrieved 2010-01-19.
  7. "Download and LoadLibrary shellcode for x86 Windows". 2010-01-11. Retrieved 2010-01-19.
  8. Skape (2004-03-09). "Safely Searching Process Virtual Address Space" (PDF). nologin. Retrieved 2009-03-19.
  9. SkyLined (2009-03-16). "w32 SEH omelet shellcode". Skypher.com. Archived from the original on 2009-03-23. Retrieved 2009-03-19.
  10. "JavaScript large number of unescape patterns detected". Archived from the original on 2015-04-03.
  11. 1 2 rix (2001-08-11). "Writing ia32 alphanumeric shellcodes". Phrack. Phrack Inc. 0x0b (57). #0x0f of 0x12. Archived from the original on 2022-03-08. Retrieved 2022-05-26.
  12. 1 2 obscou (2003-08-13). "Building IA32 'Unicode-Proof' Shellcodes". Phrack. Phrack Inc. 11 (61). #0x0b of 0x0f. Archived from the original on 2022-05-26. Retrieved 2008-02-29.
  13. 1 2 Mason, Joshua; Small, Sam; Monrose, Fabian; MacManus, Greg (November 2009). English Shellcode (PDF). Proceedings of the 16th ACM conference on Computer and Communications Security. New York, NY, USA. pp. 524–533. Archived (PDF) from the original on 2022-05-26. Retrieved 2010-01-10. (10 pages)
  14. "Multi-architecture (x86) and 64-bit alphanumeric shellcode explained". Blackhat Academy. Archived from the original on 2012-06-21.
  15. eugene (2001-08-11). "Architecture Spanning Shellcode". Phrack. Phrack Inc. #0x0e of 0x12. Archived from the original on 2021-11-09. Retrieved 2008-02-29.
  16. nemo (2005-11-13). "OSX - Multi arch shellcode". Full disclosure . Archived from the original on 2022-05-26. Retrieved 2022-05-26.
  17. Cha, Sang Kil; Pak, Brian; Brumley, David; Lipton, Richard Jay (2010-10-08) [2010-10-04]. Platform-Independent Programs (PDF). Proceedings of the 17th ACM conference on Computer and Communications Security (CCS'10). Chicago, Illinois, USA: Carnegie Mellon University, Pittsburgh, Pennsylvania, USA / Georgia Institute of Technology, Atlanta, Georgia, USA. pp. 547–558. doi:10.1145/1866307.1866369. ISBN   978-1-4503-0244-9. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-26. (12 pages) (See also: )