Malware research

Last updated

The notion of a self-reproducing computer program can be traced back to initial theories about the operation of complex automata. [1] John von Neumann showed that in theory a program could reproduce itself. This constituted a plausibility result in computability theory. Fred Cohen experimented with computer viruses and confirmed Neumann's postulate and investigated other properties of malware such as detectability and self-obfuscation using rudimentary encryption. His 1988 Doctoral dissertation was on the subject of computer viruses. [2]

John von Neumann mathematician and physicist

John von Neumann was a Hungarian-American mathematician, physicist, computer scientist, and polymath. Von Neumann was generally regarded as the foremost mathematician of his time and said to be "the last representative of the great mathematicians"; a genius who was comfortable integrating both pure and applied sciences.

Frederick B. Cohen is an American computer scientist and best known as the inventor of computer virus defense techniques. He gave the definition of "computer virus". Cohen is best known for his pioneering work on computer viruses, the invention of high integrity operating system mechanisms now in widespread use, and automation of protection management functions.

Cohen's faculty advisor, Leonard Adleman, presented a rigorous proof that, in the general case, algorithmic determination of the presence of a virus is undecidable. [3] This problem must not be mistaken for that of determination within a broad class of programs that a virus is not present. This problem differs in that it does not require the ability to recognize all viruses.

Leonard Adleman American theoretical computer scientist and professor of computer science and molecular biology at the University of Southern California

Leonard Adleman is an American computer scientist. He is one of the creators of the RSA encryption algorithm, for which he received the 2002 Turing Award, often called the Nobel prize of Computer science. He is also known for the creation of the field of DNA computing.

In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is an example: it can be proven that there is no algorithm that correctly determines whether arbitrary programs eventually halt when run.

Adleman's proof is perhaps the deepest result in malware computability theory to date and it relies on Cantor's diagonal argument as well as the halting problem. Ironically, it was later shown by Young and Yung that Adleman's work in cryptography is ideal in constructing a virus that is highly resistant to reverse-engineering by presenting the notion of a cryptovirus. [4] A cryptovirus is a virus that contains and uses a public key and randomly generated symmetric cipher initialization vector (IV) and session key (SK).

Cantors diagonal argument theorem

In set theory, Cantor's diagonal argument, also called the diagonalisation argument, the diagonal slash argument or the diagonal method, was published in 1891 by Georg Cantor as a mathematical proof that there are infinite sets which cannot be put into one-to-one correspondence with the infinite set of natural numbers. Such sets are now known as uncountable sets, and the size of infinite sets is now treated by the theory of cardinal numbers which Cantor began.

In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running or continue to run forever.

Cryptography practice and study of techniques for secure communication in the presence of third parties

Cryptography or cryptology is the practice and study of techniques for secure communication in the presence of third parties called adversaries. More generally, cryptography is about constructing and analyzing protocols that prevent third parties or the public from reading private messages; various aspects in information security such as data confidentiality, data integrity, authentication, and non-repudiation are central to modern cryptography. Modern cryptography exists at the intersection of the disciplines of mathematics, computer science, electrical engineering, communication science, and physics. Applications of cryptography include electronic commerce, chip-based payment cards, digital currencies, computer passwords, and military communications.

In the cryptoviral extortion attack, the virus hybrid encrypts plaintext data on the victim's machine using the randomly generated IV and SK. The IV+SK are then encrypted using the virus writer's public key. In theory the victim must negotiate with the virus writer to get the IV+SK back in order to decrypt the ciphertext (assuming there are no backups). Analysis of the virus reveals the public key, not the IV and SK needed for decryption, or the private key needed to recover the IV and SK. This result was the first to show that computational complexity theory can be used to devise malware that is robust against reverse-engineering.

In cryptography, plaintext or cleartext is unencrypted information, as opposed to information encrypted for storage or transmission. Plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. Cleartext usually refers to data that is transmitted or stored clear') ('in the.

Ciphertext encrypted information

In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher to decrypt it. Decryption, the inverse of encryption, is the process of turning ciphertext into readable plaintext. Ciphertext is not to be confused with codetext because the latter is a result of a code, not a cipher.

Computational complexity theory focuses on classifying computational problems according to their inherent difficulty, and relating these classes to each other. A computational problem is a task solved by a computer. A computation problem is solvable by mechanical application of mathematical steps, such as an algorithm.

A growing area of computer virus research is to mathematically model the infection behavior of worms using models such as Lotka–Volterra equations, which has been applied in the study of biological virus. Various virus propagation scenarios have been studied by researchers such as propagation of computer virus, fighting virus with virus like predator codes, [5] [6] effectiveness of patching etc.

Behavioral malware detection has been researched more recently. Most approaches to behavioral detection are based on analysis of system call dependencies. The executed binary code is traced using strace or more precise taint analysis to compute data-flow dependencies among system calls. The result is a directed graph such that nodes are system calls, and edges represent dependencies. For example, if a result returned by system call (either directly as a result or indirectly through output parameters) is later used as a parameter of system call . The origins of the idea to use system calls to analyze software can be found in the work of Forrest et al. [7] Christodorescu et al. [8] point out that malware authors cannot easily reorder system calls without changing the semantics of the program, which makes system call dependency graphs suitable for malware detection. They compute a difference between malware and goodware system call dependency graphs and use the resulting graphs for detection, achieving high detection rates. Kolbitsch et al. [9] pre-compute symbolic expressions and evaluate them on the syscall parameters observed at runtime.

System call

In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

strace diagnostic, debugging and instructional userspace utility for Linux

strace is a diagnostic, debugging and instructional userspace utility for Linux. It is used to monitor and tamper with interactions between processes and the Linux kernel, which include system calls, signal deliveries, and changes of process state. The operation of strace is made possible by the kernel feature known as ptrace.

Taint checking is a feature in some computer programming languages, such as Perl and Ruby, designed to increase security by preventing malicious users from executing commands on a host computer. Taint checks highlight specific security risks primarily associated with web sites which are attacked using techniques such as SQL injection or buffer overflow attack approaches.

They detect dependencies by observing whether the result obtained by evaluation matches the parameter values observed at runtime. Malware is detected by comparing the dependency graphs of the training and test sets. Fredrikson et al. [10] describe an approach that uncovers distinguishing features in malware system call dependency graphs. They extract significant behaviors using concept analysis and leap mining. [11] Babic et al. [12] recently proposed a novel approach for both malware detection and classification based on grammar inference of tree automata. Their approach infers an automaton from dependency graphs, and they show how such an automaton could be used for detection and classification of malware.

Research in combining static and dynamic malware analysis techniques is also currently being conducted in an effort to minimize the shortcomings of both. Studies by researchers such as Islam et al. [13] are working to integrate static and dynamic techniques in order to better analyze and classify malware and malware variants.

Related Research Articles

Computer worm standalone malware computer program that replicates itself in order to spread to other computers

A computer worm is a standalone malware computer program that replicates itself in order to spread to other computers. Often, it uses a computer network to spread itself, relying on security failures on the target computer to access it. Worms almost always cause at least some harm to the network, even if only by consuming bandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.

Malware is any software intentionally designed to cause damage to a computer, server, client, or computer network. Malware does the damage after it is implanted or introduced in some way into a target's computer and can take the form of executable code, scripts, active content, and other software. The code is described as computer viruses, worms, Trojan horses, ransomware, spyware, adware, and scareware, among other terms. Malware has a malicious intent, acting against the interest of the computer user—and so does not include software that causes unintentional harm due to some deficiency, which is typically described as a software bug.

An intrusion detection system (IDS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any malicious activity or violation is typically reported either to an administrator or collected centrally using a security information and event management (SIEM) system. A SIEM system combines outputs from multiple sources, and uses alarm filtering techniques to distinguish malicious activity from false alarms.

In the mathematical field of graph theory the Hamiltonian path problem and the Hamiltonian cycle problem are problems of determining whether a Hamiltonian path or a Hamiltonian cycle exists in a given graph. Both problems are NP-complete.

A rootkit is a collection of computer software, typically malicious, designed to enable access to a computer or an area of its software that is not otherwise allowed and often masks its existence or the existence of other software. The term rootkit is a concatenation of "root" and the word "kit". The term "rootkit" has negative connotations through its association with malware.

Antivirus software computer software to defend against malicious computer viruses

Antivirus software, or anti-virus software, also known as anti-malware, is a computer program used to prevent, detect, and remove malware.

Theoretical computer science subfield of computer science and of mathematics

Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on more mathematical topics of computing and includes the theory of computation.

Sheila Adele Greibach is a researcher in formal languages in computing, automata, compiler theory, and computer science. She is an Emeritus Professor of Computer Science at the University of California, Los Angeles, and has worked with Seymour Ginsburg and Michael A. Harrison in context-sensitive parsing using the stack automaton model.

In computer science, a binary decision diagram (BDD) or branching program is a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression. Other data structures used to represent Boolean functions include negation normal form (NNF), Zhegalkin polynomials, and propositional directed acyclic graphs (PDAG).

Dataflow is a term used in computing which has various meanings depending on application and the context in which the term is used. In the context of software architecture, data flow relates to stream processing or reactive programming.

Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about have largely been smoothed away in the scale-space level at scale .

Mobile malware is malicious software that targets mobile phones or wireless-enabled Personal digital assistants (PDA), by causing the collapse of the system and loss or leakage of confidential information. As wireless phones and PDA networks have become more and more common and have grown in complexity, it has become increasingly difficult to ensure their safety and security against electronic attacks in the form of viruses or other malware.

Software visualization or software visualisation refers to the visualization of information of and related to software systems—either the architecture of its source code or metrics of their runtime behavior- and their development process by means of static, interactive or animated 2-D or 3-D visual representations of their structure, execution, behavior, and evolution.

As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems, such as image smoothing, the stereo correspondence problem, image segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph. Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph, the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization.

A computer virus is a type of malicious software that, when executed, replicates itself by modifying other computer programs and inserting its own code. When this replication succeeds, the affected areas are then said to be "infected" with a computer virus.

In graph theory, the cycle rank of a directed graph is a digraph connectivity measure proposed first by Eggan and Büchi. Intuitively, this concept measures how close a digraph is to a directed acyclic graph (DAG), in the sense that a DAG has cycle rank zero, while a complete digraph of order n with a self-loop at each vertex has cycle rank n. The cycle rank of a directed graph is closely related to the tree-depth of an undirected graph and to the star height of a regular language. It has also found use in sparse matrix computations and logic (Rossman 2008).

Mobile security, or more specifically mobile device security, has become increasingly important in mobile computing. Of particular concern is the security of personal and business information now stored on smartphones.

References

  1. John von Neumann, "Theory of Self-Reproducing Automata", Part 1: Transcripts of lectures given at the University of Illinois, December 1949, Editor: A. W. Burks, University of Illinois, USA, 1966.
  2. Fred Cohen, "Computer Viruses", PhD Thesis, University of Southern California, ASP Press, 1988.
  3. L. M. Adleman, "An Abstract Theory of Computer Viruses", Advances in Cryptology---Crypto '88, LNCS 403, pp. 354-374, 1988.
  4. A. Young, M. Yung, "Cryptovirology: Extortion-Based Security Threats and Countermeasures," IEEE Symposium on Security & Privacy, pp. 129-141, 1996.
  5. H. Toyoizumi, A. Kara. Predators: Good Will Mobile Codes Combat against Computer Viruses. Proc. of the 2002 New Security Paradigms Workshop, 2002
  6. Zakiya M. Tamimi, Javed I. Khan, Model-Based Analysis of Two Fighting Worms, IEEE/IIU Proc. of ICCCE '06, Kuala Lumpur, Malaysia, May 2006, Vol-I, p. 157-163.
  7. S. Forrest, S. A. Hofmeyr, A. Somayaji, T. A. Longstaff, Thomas A.: A Sense of Self for Unix Processes, Proc. of the 1996 IEEE Symp. on Security and Privacy, 1996, p. 120-129.
  8. M. Christodorescu, S. Jha, C. Kruegel: Mining specifications of malicious behavior, Proc. of the 6th joint meeting of the European software engineering conf. and the ACM SIGSOFT symp. on The foundations of software engineering, 2007, p. 5-14
  9. C. Kolbitsch, P. Milani, C. Kruegel, E. Kirda, X. Zhou, and X. Wang: Effective and Efficient Malware Detection at the End Host, The 18th USENIX Security Symposium, 2009.
  10. M. Fredrikson, S. Jha, M. Christodorescu, R. Sailer, and X. Yan: Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors, Proc. of the 2010 IEEE Symposium on Security and Privacy, 2010, p. 45-60.
  11. X. Yan, H. Cheng, J. Han, and P. S. Yu: Mining significant graph patterns by leap search in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD’08). New York, NY, USA: ACM Press, 2008, pp. 433-444
  12. D. Babic, D. Reynaud, and D. Song: Malware Analysis with Tree Automata Inference, in Proceedings of the 23rd Int. Conference on Computer Aided Verification, 2011, Springer.
  13. R. Islam, R. Tian, L. M. Batten, and S. Versteeg: Classification of malware based on integrated staic and dynamic features, Journal of Network Computer Applications, 2013, p. 646-656.