Unreachable code

Last updated

In computer programming, unreachable code is part of the source code of a program which can never be executed because there exists no control flow path to the code from the rest of the program. [1]

Contents

Unreachable code is sometimes also called dead code, [2] [3] although dead code may also refer to code that is executed but has no effect on the output of a program. [4]

Unreachable code is generally considered undesirable for several reasons:

Unreachable code can have some legitimate uses, like providing a library of functions for calling or jumping to manually via a debugger while the program is halted after a breakpoint. This is particularly useful for examining and pretty-printing the internal state of the program. It may make sense to have such code in the shipped product, so that a developer can attach a debugger to a client's running instance.

Causes

Unreachable code can exist for many reasons, such as:

Legacy code is that which was once useful but is no longer used or required. But unreachable code may also be part of a complex library, module or routine where it is useful to others or under conditions which are not met in a particular scenario.

An example of such a conditionally unreachable code may be the implementation of a general string formatting function in a compiler's runtime library, which contains complex code to process all possible arguments, of which only a small subset is actually used. Compilers will typically not be able to remove the unused code sections at compile time, as the behavior is largely determined by the values of arguments at run time.

Examples

In this fragment of C code:

intfoo(intX,intY){returnX+Y;intZ=X*Y;}

the definition int Z = X * Y; is never reached as the function always returns before it. Therefore, the Z need be neither allocated storage nor initialized.

goto fail bug

Apple's SSL/TLS from February 2014 contained a major security flaw known formally as CVE - 2014-1266 and informally as the "goto fail bug". [5] [6] The relevant code fragment [7] is:

staticOSStatusSSLVerifySignedServerKeyExchange(SSLContext*ctx,boolisRsa,SSLBuffersignedParams,uint8_t*signature,UInt16signatureLen){OSStatuserr;...if((err=SSLHashSHA1.update(&hashCtx,&serverRandom))!=0)gotofail;if((err=SSLHashSHA1.update(&hashCtx,&signedParams))!=0)gotofail;gotofail;if((err=SSLHashSHA1.final(&hashCtx,&hashOut))!=0)gotofail;...fail:SSLFreeBuffer(&signedHashes);SSLFreeBuffer(&hashCtx);returnerr;}

Here, there are two successive goto fail statements. In the syntax of the C language, the second is unconditional, and hence always skips the call to SSLHashSHA1.final. As a consequence, err will hold the status of the SHA1 update operation, and signature verification will never fail. [5]

Here, the unreachable code is the call to the final function. [6] Applying the Clang compiler with the option -Weverything includes unreachable code analysis, which would trigger an alarm for this code. [6]

C++

In C++, some constructs are specified to have undefined behavior. A compiler is free to implement any behavior or none, and typically an optimizing compiler will assume the code is unreachable. [8]

Analysis

Detection of unreachable code is a form of control flow analysis to find code that can never be reached in any possible program state. In some languages (e.g. Java [9] ) some forms of unreachable code are explicitly disallowed. The optimization that removes unreachable code is known as dead code elimination.

Code may become unreachable as a consequence of transformations performed by an optimizing compiler (e.g., common subexpression elimination).

In practice the sophistication of the analysis has a significant impact on the amount of unreachable code that is detected. For example, constant folding and simple flow analysis shows that the inside of the if-statement in the following code is unreachable:

intN=2+1;if(N==4){/* unreachable */}

However, a great deal more sophistication is needed to work out that the corresponding block is unreachable in the following code:

doubleX=sqrt(2);if(X>5){/* unreachable */}

Unreachable code elimination technique is in the same class of optimizations as dead code elimination and redundant code elimination.

Unreachability vs. profiling

In some cases, a practical approach may be a combination of simple unreachability criteria and use of a profiler to handle the more complex cases. Profiling in general can not prove anything about the unreachability of a piece of code, but may be a good heuristic for finding potentially unreachable code. Once a suspect piece of code is found, other methods, such as a more powerful code analysis tool, or even analysis by hand, could be used to decide whether the code is truly unreachable.

See also

Related Research Articles

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption.

Java and C++ are two prominent object-oriented programming languages. By many language popularity metrics, the two languages have dominated object-oriented and high-performance software development for much of the 21st century, and are often directly compared and contrasted. Java's syntax was based on C/C++.

The term dead code has multiple definitions. Some use the term to refer to code which can never be executed at run-time. In some areas of computer programming, dead code is a section in the source code of a program which is executed but whose result is never used in any other computation. The execution of dead code wastes computation time and memory.

Constant folding and constant propagation are related compiler optimizations used by many modern compilers. An advanced form of constant propagation known as sparse conditional constant propagation can more accurately propagate constants and simultaneously remove dead code.

In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness. The first focuses on improving the program’s performance while reducing the resource usage while the latter focuses on ensuring that the program does what it is supposed to do.

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without changing the source code, while macro expansion occurs prior to compilation, and results in different text that is then processed by the compiler.

In computer programming, specifically when using the imperative programming paradigm, an assertion is a predicate connected to a point in the program, that always should evaluate to true at that point in code execution. Assertions can help a programmer read the code, help a compiler compile it, or help the program detect its own defects.

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs, that can be combined to accomplish a task, much as one might use multiple hands to fix a physical object. The most basic tools are a source code editor and a compiler or interpreter, which are used ubiquitously and continuously. Other tools are used more or less depending on the language, development methodology, and individual engineer, often used for a discrete task, like a debugger or profiler. Tools may be discrete programs, executed separately – often from the command line – or may be parts of a single large program, called an integrated development environment (IDE). In many cases, particularly for simpler use, simple ad hoc techniques are used instead of a tool, such as print debugging instead of using a debugger, manual timing instead of a profiler, or tracking bugs in a text file or spreadsheet instead of a bug tracking system.

In the C and C++ programming languages, an inline function is one qualified with the keyword inline; this serves two purposes:

  1. It serves as a compiler directive that suggests that the compiler substitute the body of the function inline by performing inline expansion, i.e. by inserting the function code at the address of each function call, thereby saving the overhead of a function call. In this respect it is analogous to the register storage class specifier, which similarly provides an optimization hint.
  2. The second purpose of inline is to change linkage behavior; the details of this are complicated. This is necessary due to the C/C++ separate compilation + linkage model, specifically because the definition (body) of the function must be duplicated in all translation units where it is used, to allow inlining during compiling, which, if the function has external linkage, causes a collision during linking. C and C++ resolve this in different ways.

In compiler theory, dead-code elimination is a compiler optimization to remove dead code. Removing such code has several benefits: it shrinks program size, an important consideration in some contexts, it reduces resource usage such as the number of bytes to be transferred and it allows the running program to avoid executing irrelevant operations, which reduces its running time. It can also enable further optimizations by simplifying program structure. Dead code includes code that can never be executed, and code that only affects dead variables, that is, irrelevant to the program.

In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier, constant, procedure and function in a program's source code is associated with information relating to its declaration or appearance in the source. In other words, the entries of a symbol table store the information related to the entry's corresponding symbol.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification of the programming language in which the source code is written. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion is particularly useful, and is often easy to optimize in implementations.

In computer programming languages, a switch statement is a type of selection control mechanism used to allow the value of a variable or expression to change the control flow of program execution via search and map.

In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary, such as:

In computer programming, a branch table or jump table is a method of transferring program control (branching) to another part of a program using a table of branch or jump instructions. It is a form of multiway branch. The branch table construction is commonly used when programming in assembly language but may also be generated by compilers, especially when implementing optimized switch statements whose values are densely packed together.

Tracing just-in-time compilation is a technique used by virtual machines to optimize the execution of a program at runtime. This is done by recording a linear sequence of frequently executed operations, compiling them to native machine code and executing them. This is opposed to traditional just-in-time (JIT) compilers that work on a per-method basis.

In computer science, language-based security (LBS) is a set of techniques that may be used to strengthen the security of applications on a high level by using the properties of programming languages. LBS is considered to enforce computer security on an application-level, making it possible to prevent vulnerabilities which traditional operating system security is unable to handle.

re2c is a free and open-source lexer generator for C, C++, Go, and Rust. It compiles declarative regular expression specifications to deterministic finite automata. Originally written by Peter Bumbulis and described in his paper, re2c was put in public domain and has been since maintained by volunteers. It is the lexer generator adopted by projects such as PHP, SpamAssassin, Ninja build system and others. Together with the Lemon parser generator, re2c is used in BRL-CAD. This combination is also used with STEPcode, an implementation of ISO 10303 standard.

References

  1. Debray, Saumya K.; Evans, William; Muth, Robert; De Sutter, Bjorn (1 March 2000). "Compiler techniques for code compaction". ACM Transactions on Programming Languages and Systems. 22 (2): 378–415. CiteSeerX   10.1.1.43.7215 . doi:10.1145/349214.349233. S2CID   6129772.
  2. RTCA/DO-178C Software Considerations in Airborne Systems and Equipment Certification. RTCA, Inc. 2011. p. 112. Retrieved 2019-06-11. Dead code – Executable Object Code (or data) which exists as a result of a software development error but cannot be executed (code) or used (data) in any operational configuration of the target computer environment. It is not traceable to a system or software requirement. The following exceptions are often mistakenly categorized as dead code but are necessary for implementation of the requirements/design: embedded identifiers, defensive programming structures to improve robustness, and deactivated code such as unused library functions. [Since requirements-based review should identified such code as untraceable to functional requirements, static code analysis should identify such code as unreachable, and structural coverage analysis of requirements-based testing results should identify such code as unreachable, presence of unjustified dead code in a project should raise consideration of the effectiveness of the organization's development and verification processes.]
  3. Jay Thomas (24 January 2017). "Requirements Traceability Forms the Foundation for Thorough Software Testing" . Retrieved 2019-06-11. The combination of requirements traceability with coverage analysis can also turn up areas of "dead code," or code that's never executed. This code can mostly be an inconvenience, but it can also be a security threat if a hacker can gain access and from there gain control. It's code that can't be traced and should therefore be eliminated.
  4. MISRA Consortium (March 2013). MISRA C:2012 Guidelines for the used of C language in critical systems. MIRA Limited. p. 41. Retrieved 2019-06-11. Rule 2.2 there shall be no dead code. Any operation that is executed but whose removal would not affect program behavior constitutes dead code.
  5. 1 2 Adam Langley (2014). "Apple's SSL/TLS bug".
  6. 1 2 3 Arie van Deursen (2014). "Learning from Apple's #gotofail Security Bug".
  7. "sslKeyExchange.c - Source code for support for key exchange and server key exchange".
  8. "MSC15-C. Do not depend on undefined behavior". Carnegie Mellon University. 2020. Retrieved 28 September 2020. Because compilers are not obligated to generate code for undefined behavior, these behaviors are candidates for optimization.
  9. "Java Language Specification".