Compiler correctness

In computing, compiler correctness is the branch of computer science that deals with showing that a compiler behaves according to its language specification. Techniques include developing the compiler using formal methods and rigorously testing (often called compiler validation) an existing compiler.

Formal verification

Two main formal verification approaches for establishing the correctness of compilation are proving the correctness of the compiler for all input programs and proving the correctness of the compilation of one particular program (translation validation).

Compiler correctness for all input programs

Compiler validation with formal methods involves a long chain of formal, deductive logic. [1] However, since the tool that finds the proof (the theorem prover) is itself complex software, there is a significant chance that it contains errors. One approach has been to use a tool that verifies the proof (a proof checker); because a proof checker is much simpler than a proof finder, it is less likely to contain errors.

A prominent example of this approach is CompCert, a formally verified optimizing compiler for a large subset of C99. [2] [3] [4]

Another verified compiler was developed in the CakeML project, [5] which establishes the correctness of a substantial subset of the Standard ML programming language using the HOL4 theorem prover.

Another approach to obtaining a formally correct compiler is semantics-directed compiler generation. [6]
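
To make the shape of such a correctness theorem concrete, the following sketch (a toy example in OCaml, not drawn from any of the systems above) gives a source language of arithmetic expressions, a stack-machine target, a compiler between them, and the statement that a verified compiler proves once and for all, by induction over source programs. Ordinary OCaml can only test instances of the statement; a proof assistant such as Coq or HOL4 can prove it for every program and environment.

    (* Source language: arithmetic expressions over integer variables. *)
    type expr =
      | Const of int
      | Var of string
      | Plus of expr * expr

    (* Reference semantics: a definitional interpreter. *)
    let rec eval (env : string -> int) (e : expr) : int =
      match e with
      | Const n -> n
      | Var x -> env x
      | Plus (a, b) -> eval env a + eval env b

    (* Target language: a small stack machine. *)
    type instr = Push of int | Load of string | Add

    let rec exec (env : string -> int) (code : instr list) (stack : int list) : int list =
      match code, stack with
      | [], st -> st
      | Push n :: rest, st -> exec env rest (n :: st)
      | Load x :: rest, st -> exec env rest (env x :: st)
      | Add :: rest, b :: a :: st -> exec env rest ((a + b) :: st)
      | Add :: _, _ -> failwith "stack underflow"

    (* The compiler itself: postorder code generation. *)
    let rec compile (e : expr) : instr list =
      match e with
      | Const n -> [Push n]
      | Var x -> [Load x]
      | Plus (a, b) -> compile a @ compile b @ [Add]

    (* Correctness statement: for every program e and environment env,
         exec env (compile e) [] = [eval env e]
       A verified compiler proves this once, by induction on e.
       Plain OCaml can only check single instances of it: *)
    let correct_on (env : string -> int) (e : expr) : bool =
      exec env (compile e) [] = [eval env e]

CompCert's top-level theorem has the same shape, relating the observable behaviour of the C source program to that of the generated assembly.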

Translation validation: compiler correctness on a given program

In contrast to attempting to prove that a compiler is correct for all valid input programs, translation validation [7] aims to automatically establish that a given input program is compiled correctly. Proving the correct compilation of a given program is potentially easier than proving a compiler correct for all programs, but still requires symbolic reasoning, because even a fixed program may work on arbitrarily large inputs and run for an arbitrarily long time. Translation validation can reuse an existing compiler implementation by generating, for a given compilation, a proof that the compilation was correct. Translation validation can be used even with a compiler that sometimes generates incorrect code, as long as the incorrect code is not generated for the given program. Depending on the input program, translation validation can fail, either because the generated code is wrong or because the translation validation technique is too weak to show correctness. However, if translation validation succeeds, then the compiled program is guaranteed to be correct for all inputs.
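
The following continues the toy OCaml example above (an illustration of the idea only, not the method of Pnueli, Siegel & Singerman). Instead of proving compile correct once, a validator checks one particular compilation after the fact; here it symbolically executes the generated stack code to reconstruct a source-level expression and compares it with the original.

    (* Translation validation for one compilation: symbolically run the
       generated code, rebuilding a source expression on the stack. *)
    let rec symbolic_run (code : instr list) (stack : expr list) : expr option =
      match code, stack with
      | [], [e] -> Some e                                  (* exactly one result *)
      | Push n :: rest, st -> symbolic_run rest (Const n :: st)
      | Load x :: rest, st -> symbolic_run rest (Var x :: st)
      | Add :: rest, b :: a :: st -> symbolic_run rest (Plus (a, b) :: st)
      | _ -> None                                          (* e.g. stack underflow *)

    (* Accept the compilation only if the reconstructed expression matches
       the source exactly. *)
    let validate (source : expr) (code : instr list) : bool =
      symbolic_run code [] = Some source

A validator this naive accepts only structure-preserving compilations: it would wrongly reject correct code for, say, a constant-folded version of the source, illustrating how a translation validation technique can be too weak even when the generated code is right. Realistic validators compare source and target up to semantic equivalence instead.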

Testing

Testing represents a significant portion of the effort in shipping a compiler, but receives comparatively little coverage in the standard literature. The 1986 edition of Aho, Sethi & Ullman has a single-page section on compiler testing, with no named examples. [8] The 2006 edition omits the section on testing, but does emphasize its importance: “Optimizing compilers are so difficult to get right that we dare say that no optimizing compiler is completely error-free! Thus, the most important objective in writing a compiler is that it is correct.” [9] Fraser & Hanson 1995 has a brief section on regression testing; source code is available. [10] Bailey & Davidson 2003 cover the testing of procedure calls. [11] A number of articles confirm that many released compilers have significant code-correctness bugs. [12] Sheridan 2007 is probably the most recent journal article on general compiler testing. [13] For most purposes, the largest body of information available on compiler testing consists of the Fortran [14] and Cobol [15] validation suites.

Further common techniques when testing compilers are fuzzing [16] (which generates random programs to try to find bugs in a compiler) and test case reduction (which tries to minimize a failure-inducing test case to make it easier to understand). [17]
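
Continuing the toy OCaml example (a sketch of the general approach, not of tools such as Csmith or C-Reduce), the fuzzer below generates random closed expressions and uses the interpreter as the test oracle, and the reducer greedily shrinks any failing expression to a subexpression that still fails.

    (* Differential fuzzing: compare compiled code against the interpreter
       on random closed expressions, using the interpreter as the oracle. *)
    let rec random_expr depth =
      if depth = 0 || Random.int 3 = 0 then Const (Random.int 100)
      else Plus (random_expr (depth - 1), random_expr (depth - 1))

    let env _ = 0                          (* no variables are generated *)
    let miscompiled e = not (correct_on env e)

    let rec show = function
      | Const n -> string_of_int n
      | Var x -> x
      | Plus (a, b) -> "(" ^ show a ^ " + " ^ show b ^ ")"

    (* Greedy test-case reduction: replace a failing expression by any
       subexpression that still fails, until none does. *)
    let rec reduce (e : expr) : expr =
      match e with
      | Plus (a, _) when miscompiled a -> reduce a
      | Plus (_, b) when miscompiled b -> reduce b
      | _ -> e

    let () =
      for _ = 1 to 10_000 do
        let e = random_expr 5 in
        if miscompiled e then
          print_endline ("miscompilation: " ^ show (reduce e))
      done

For the (correct) compiler above this loop reports nothing; against a buggy compiler, the reduced expression is a small witness of the miscompilation.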

References

  1. De Millo, R. A.; Lipton, R. J.; Perlis, A. J. (1979). "Social processes and proofs of theorems and programs". Communications of the ACM. 22 (5): 271–280. doi:10.1145/359104.359106. S2CID 6794058.
  2. Leroy, X. (2006). "Formal Certification of a Compiler Back-End or: Programming a Compiler with a Proof Assistant". ACM SIGPLAN Notices. 41: 42–54. doi:10.1145/1111320.1111042.
  3. Leroy, Xavier (2009-12-01). "A Formally Verified Compiler Back-end". Journal of Automated Reasoning. 43 (4): 363–446. arXiv:0902.2137. doi:10.1007/s10817-009-9155-4. ISSN 0168-7433. S2CID 87730.
  4. "CompCert - The CompCert C compiler". compcert.inria.fr. Retrieved 2017-07-21.
  5. "CakeML: A Verified Implementation of ML".
  6. Stephan Diehl (2000). "Natural Semantics Directed Generation of Compilers and Abstract Machines". Formal Aspects of Computing. 12 (2). Springer Verlag. doi:10.1007/PL00003929.
  7. Pnueli, Amir; Siegel, Michael; Singerman, Eli (1998). "Translation Validation". Tools and Algorithms for Construction and Analysis of Systems, 4th International Conference, TACAS '98.
  8. Aho, Sethi & Ullman, Compilers: Principles, Techniques, and Tools, 1986, p. 731.
  9. Aho, Lam, Sethi & Ullman, Compilers: Principles, Techniques, and Tools, 2nd ed., 2006, p. 16.
  10. Christopher Fraser; David Hanson (1995). A Retargetable C Compiler: Design and Implementation. Benjamin/Cummings Publishing. ISBN 978-0-8053-1670-4. pp. 531–533.
  11. Mark W. Bailey; Jack W. Davidson (2003). "Automatic Detection and Diagnosis of Faults in Generated Code for Procedure Calls" (PDF). IEEE Transactions on Software Engineering. 29 (11): 1031–1042. CiteSeerX 10.1.1.15.4829. doi:10.1109/tse.2003.1245304. Archived from the original (PDF) on 2003-04-28. Retrieved 2009-03-24. p. 1040.
  12. E.g., Christian Lindig (2005). "Random testing of C calling conventions" (PDF). Proceedings of the Sixth International Workshop on Automated Debugging. ACM. ISBN 1-59593-050-7. Archived from the original (PDF) on 2011-07-11. Retrieved 2009-03-24; and Eric Eide; John Regehr (2008). "Volatiles are miscompiled, and what to do about it" (PDF). Proceedings of the 7th ACM International Conference on Embedded Software. ACM. ISBN 978-1-60558-468-3. Retrieved 2009-03-24.
  13. Flash Sheridan (2007). "Practical Testing of a C99 Compiler Using Output Comparison" (PDF). Software: Practice and Experience. 37 (14): 1475–1488. arXiv:2202.07390. doi:10.1002/spe.812. S2CID 9752084. Retrieved 2009-03-24. Bibliography at "Compiler Testing Bibliography". Retrieved 2009-03-13.
  14. "Source of Fortran validation suite". Retrieved 2011-09-01.
  15. "Source of Cobol validation suite". Retrieved 2011-09-01.
  16. Chen, Yang; Groce, Alex; Zhang, Chaoqiang; Wong, Weng-Keen; Fern, Xiaoli; Eide, Eric; Regehr, John (2013). "Taming compiler fuzzers". Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '13. New York, NY, USA: ACM. pp. 197–208. CiteSeerX 10.1.1.308.5541. doi:10.1145/2491956.2462173. ISBN 9781450320146. S2CID 207205614.
  17. Regehr, John; Chen, Yang; Cuoq, Pascal; Eide, Eric; Ellison, Chucky; Yang, Xuejun (2012). "Test-case reduction for C compiler bugs". Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '12. New York, NY, USA: ACM. pp. 335–346. doi:10.1145/2254064.2254104. ISBN 9781450312059. S2CID 1025409.