Metamorphic code

Last updated

Metamorphic code is code that when run outputs a logically equivalent version of its own code under some interpretation. This is similar to a quine, except that a quine's source code is exactly equivalent to its own output. Metamorphic code also usually outputs machine code and not its own source code.

Contents

Overview

Metamorphic code is used by computer viruses to avoid the pattern recognition of anti-virus software. Metamorphic viruses often translate their own binary code into a temporary representation, editing the temporary representation of themselves and then translate the edited form back to machine code again. [1] This procedure is done with the virus itself, and thus also the metamorphic engine itself undergoes changes, which means that no part of the virus stays the same. This differs from polymorphic code, where the polymorphic engine can not rewrite its own code.

Metamorphic code is used by some viruses when they are about to infect new files, and the result is that the next generation will never look like current generation. The mutated code will do exactly the same thing (under the interpretation used), but the child's binary representation will typically be completely different from the parent's. Mutation can be achieved using techniques like inserting NOP instructions (brute force), changing what registers to use, changing flow control with jumps, changing machine instructions to equivalent ones or reordering independent instructions.

Metamorphism does not protect a virus against heuristic analysis.[ citation needed ]

Metamorphic code can also mean that a virus is capable of infecting executables from two or more different operating systems (such as Windows and Linux) or even different computer architectures. Often, the virus does this by carrying several viruses within itself. The beginning of the virus is then coded so that it translates to correct machine-code for all of the platforms that it is supposed to execute in. [2] This is used primarily in remote exploit injection code where the target platform is unknown.

Metamorphic viruses

See also

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

<span class="mw-page-title-main">Machine code</span> Set of instructions executed by a computer

In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Although decimal computers were once common, the contemporary marketplace is dominated by binary computers; for those computers, machine code is "the binary representation of a computer program which is actually read and interpreted by the computer. A program in machine code consists of a sequence of machine instructions ."

A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. Disassembly, the output of a disassembler, is often formatted for human-readability rather than suitability for input to an assembler, making it principally a reverse-engineering tool. Common uses of disassemblers include analyzing high-level programing language compilers output and their optimizations, recovering source code of a program whose original source was lost, malware analysis, modifying software, and software cracking.

<span class="mw-page-title-main">Interpreter (computing)</span> Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter Virtual Machine.

In computer science, self-modifying code is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance. The term is usually only applied to code where the self-modification is intentional, not in situations where code accidentally modifies itself due to an error such as a buffer overflow.

In computing, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact - that is, the code changes itself every time it runs, but the function of the code stays the same. For example, the simple math expressions 3+1 and 6-2 both achieve the same result, yet run with different machine code in a CPU. This technique is sometimes used by computer viruses, shellcodes and computer worms to hide their presence.

In computing, binary translation is a form of binary recompilation where sequences of instructions are translated from a source instruction set to the target instruction set. In some cases such as instruction set simulation, the target instruction set may be the same as the source instruction set, providing testing and debugging features such as instruction trace, conditional breakpoints and hot spot detection.

A backdoor is a typically covert method of bypassing normal authentication or encryption in a computer, product, embedded device, or its embodiment. Backdoors are most often used for securing remote access to a computer, or obtaining access to plaintext in cryptosystems. From there it may be used to gain access to privileged information like passwords, corrupt or delete data on hard drives, or transfer information within autoschediastic networks.

<span class="mw-page-title-main">Memory address</span> Reference to a specific memory location

In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Such numerical semantic bases itself upon features of CPU, as well upon use of the memory like an array endorsed by various programming languages.

<span class="mw-page-title-main">Binary file</span> Non-human-readable computer file encoded in binary form

A binary file is ANY computer file, this includes files that are a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.

Dataflow architecture is a dataflow-based computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures have no program counter, in concept: the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions, so that the order of instruction execution may be hard to predict.

Win32/Simile is a metamorphic computer virus written in assembly language for Microsoft Windows. The virus was released in its most recent version in early March 2002. It was written by the virus writer "Mental Driller". Some of his previous viruses, such as Win95/Drill, have proved very challenging to detect.

Zmist is a metamorphic computer virus created by the Russian virus writer known as Z0mbie. It was the first virus to use a technique known as "code integration". In the words of Ferrie and Ször:

This virus supports a unique new technique: code integration. The Mistfall engine contained in it is capable of decompiling Portable Executable files to [their] smallest elements, requiring 32 MB of memory. Zmist will insert itself into the code: it moves code blocks out of the way, inserts itself, regenerates code and data references, including relocation information, and rebuilds the executable.

CTX is a computer virus created in Spain in 1999. CTX was initially discovered as part of the Cholera worm, with which the author intentionally infected with CTX. Although the Cholera worm had the capability to send itself via email, the CTX worm quickly surpassed it in prevalence. Cholera is now considered obsolete, while CTX remains in the field, albeit with only rare discoveries.

The Little Man Computer (LMC) is an instructional model of a computer, created by Dr. Stuart Madnick in 1965. The LMC is generally used to teach students, because it models a simple von Neumann architecture computer—which has all of the basic features of a modern computer. It can be programmed in machine code or assembly code.

Binary-code compatibility is a property of a computer system, meaning that it can run the same executable code, typically machine code for a general-purpose computer Central processing unit (CPU), that another computer system can run. Source-code compatibility, on the other hand, means that recompilation or interpretation is necessary before the program can be run on the compatible system.

A decompiler is a computer program that translates an executable file to high-level source code. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level language. While disassemblers translate an executable into assembly language, decompilers go a step further and translate the code into a higher level language such as C or Java, requiring more sophisticated techniques. Decompilers are usually unable to perfectly reconstruct the original source code, thus will frequently produce obfuscated code. Nonetheless, they remain an important tool in the reverse engineering of computer software.

<span class="mw-page-title-main">Computer virus</span> Computer program that modifies other programs to replicate itself and spread

A computer virus is a type of malware that, when executed, replicates itself by modifying other computer programs and inserting its own code into those programs. If this replication succeeds, the affected areas are then said to be "infected" with a computer virus, a metaphor derived from biological viruses.

A macro in computer science is a rule or pattern that specifies how a certain input sequence should be mapped to a replacement input sequence according to a defined procedure.

References

  1. "Metamorphism in practice or "How I made MetaPHOR and what I've learnt"". VX Heavens. February 2002. Archived from the original on June 2, 2007.
  2. "Architecture Spanning Shellcode". Phrack Magazine. Vol. 11, no. 57. August 11, 2001. Archived from the original on December 4, 2023.
  3. Peter Ferrie "Crimea River", VB, 2008