Csmith

Last updated
Csmith
Original author(s) Xuejun Yang, Yang Chen, Eric Eide, John Regehr
Initial release2011;10 years ago (2011)
Stable release
2.3.0 / June 21, 2017;3 years ago (2017-06-21)
Repository github.com/csmith-project/csmith/
Written in C++, Perl
Type Compiler fuzzer
License BSD license
Website embed.cs.utah.edu/csmith/

Csmith is a test case generation tool. It can generate random C programs that statically and dynamically conform to the C99 standard. It is used for stress-testing compilers, static analyzers, and other tools that process C code. It is a free, open source, permissively licensed C compiler fuzzer developed by researchers at the University of Utah. It was previously called Randprog. [1]

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language to create an executable program.

C (programming language) general-purpose programming language

C is a general-purpose, procedural computer programming language supporting structured programming, lexical variable scope, and recursion, with a static type system. By design, C provides constructs that map efficiently to typical machine instructions. It has found lasting use in applications previously coded in assembly language. Such applications include operating systems and various application software for computer architectures that range from supercomputers to PLCs and embedded systems.

Static program analysis is the analysis of computer software that is performed without actually executing programs, in contrast with dynamic analysis, which is analysis performed on programs while they are executing. In most cases the analysis is performed on some version of the source code, and in the other cases, some form of the object code.

OCaml is a general-purpose, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. CIL instructions are executed by a CLI-compatible runtime environment such as the Common Language Runtime. Languages which target the CLI compile to CIL. CIL is object-oriented, stack-based bytecode. Runtimes typically just-in-time compile CIL instructions into native code.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs, that can be combined together to accomplish a task, much as one might use multiple hands to fix a physical object. The most basic tools are a source code editor and a compiler or interpreter, which are used ubiquitously and continuously. Other tools are used more or less depending on the language, development methodology, and individual engineer,often used for a discrete task, like a debugger or profiler. Tools may be discrete programs, executed separately – often from the command line – or may be parts of a single large program, called an integrated development environment (IDE). In many cases, particularly for simpler use, simple ad hoc techniques are used instead of a tool, such as print debugging instead of using a debugger, manual timing instead of a profiler, or tracking bugs in a text file or spreadsheet instead of a bug tracking system.

LLVM Compiler backend for multiple programming languages

The LLVM compiler infrastructure project is a set of compiler and toolchain technologies, which can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.

SystemVerilog hardware description and hardware verification language

SystemVerilog, standardized as IEEE 1800, is a hardware description and hardware verification language used to model, design, simulate, test and implement electronic systems. SystemVerilog is based on Verilog and some extensions, and since 2008 Verilog is now part of the same IEEE standard. It is commonly used in the semiconductor and electronic design industry as an evolution of Verilog.

The Java Modeling Language (JML) is a specification language for Java programs, using Hoare style pre- and postconditions and invariants, that follows the design by contract paradigm. Specifications are written as Java annotation comments to the source files, which hence can be compiled with any Java compiler.

Call graph

A call graph is a control flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge (f, g) indicates that procedure f calls procedure g. Thus, a cycle in the graph indicates recursive procedure calls.

Cobra is a discontinued general-purpose, object-oriented programming language. Cobra is designed by Charles Esterbrook, and runs on the Microsoft .NET and Mono platforms. It is strongly influenced by Python, C#, Eiffel, Objective-C, and other programming languages. It supports both static and dynamic typing. It has support for unit tests and contracts. It has lambda expressions, closures, list comprehensions, and generators.

Profile-guided optimization, also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to improve program runtime performance.

GrammaTech is a software-development tools vendor based in Ithaca, New York. The company was founded in 1988 as a technology spin-off of Cornell University. It now develops CodeSonar, a static analysis tool for source code and binaries, and performs cyber-security research.

In computing, compiler correctness is the branch of computer science that deals with trying to show that a compiler behaves according to its language specification. Techniques include developing the compiler using formal methods and using rigorous testing on an existing compiler.

Random test generators are a type of computer software that is used in functional verification of microprocessors. Their primary use lies in providing input stimulus to a device under test.

Random testing is a black-box software testing technique where programs are tested by generating random, independent inputs. Results of the output are compared against software specifications to verify that the test output is pass or fail. In case of absence of specifications the exceptions of the language are used which means if an exception arises during test execution then it means there is a fault in the program, it is also used as way to avoid biased testing.

John Regehr is a computer scientist specializing in compiler correctness and undefined behavior. As of 2016, he is a professor at the University of Utah. He is best known for the integer overflow sanitizer which was merged into the Clang C compiler, the C compiler fuzzer Csmith, and his widely read blog Embedded in Academia. He spent the 2015-2016 academic year on sabbatical in Paris, France, working with TrustInSoft on Frama-C and related code analysis tools.

References

  1. Yang, Xuejun; Chen, Yang; Eide, Eric; Regehr, John (2011). "Finding and understanding bugs in C compilers". Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation - PLDI '11. p. 283. CiteSeerX   10.1.1.225.1281 . doi:10.1145/1993498.1993532. ISBN   9781450306638.