Heisenbug

Last updated

In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. [1] The term is a pun on the name of Werner Heisenberg, the physicist who first asserted the observer effect of quantum mechanics, which states that the act of observing a system inevitably alters its state. In electronics, the traditional term is probe effect, where attaching a test probe to a device changes its behavior.

Contents

Similar terms, such as bohrbug, mandelbug, [2] [3] [4] hindenbug, and schrödinbug [5] [6] (see the section on related terms) have been occasionally proposed for other kinds of unusual software bugs, sometimes in jest. [7] [8]

Examples

Heisenbugs occur because common attempts to debug a program, such as inserting output statements or running it with a debugger, usually have the side-effect of altering the behavior of the program in subtle ways, such as changing the memory addresses of variables and the timing of its execution.

One common example of a heisenbug is a bug that appears when the program is compiled with an optimizing compiler, but not when the same program is compiled without optimization (as is often done for the purpose of examining it with a debugger). While debugging, values that an optimized program would normally keep in registers are often pushed to main memory. This may affect, for instance, the result of floating-point comparisons, since the value in memory may have smaller range and accuracy than the value in the register[ citation needed ]. Similarly, heisenbugs may be caused by side-effects in test expressions used in runtime assertions in languages such as C and C++, where the test expression is not evaluated when assertions are turned off in production code using the NDEBUG macro.

Other common causes of heisenbugs are using the value of a non-initialized variable (which may change its address or initial value during debugging), or following an invalid pointer (which may point to a different place when debugging). Debuggers also commonly allow the use of breakpoints or provide other user interfaces that cause additional source code (such as property accessors) to be executed stealthily, which can, in turn, change the state of the program. [9]

End-Users can experience a heisenbug when the act of making a screenshot of the heisenbug for observation fixes the heisenbug and the screenshot shows a perfectly working state. This effect can happen when complex stacks of software work together, for example a web browser using a hardware graphic card acceleration which causes rendering errors on the physical screen that do not show on a screenshot.

Time can also be a factor in heisenbugs, particularly with multi-threaded applications. Executing a program under control of a debugger can change the execution timing of the program as compared to normal execution. Time-sensitive bugs, such as race conditions, may not occur when the program is slowed down by single-stepping source lines in the debugger. This is particularly true when the behavior involves interaction with an entity not under the control of a debugger, such as when debugging network packet processing between two machines and only one is under debugger control.

Heisenbugs can be viewed as instances of the observer effect in information technology. Frustrated programmers may humorously blame a heisenbug on the phase of the moon, [10] or (if it has occurred only once) may explain it away as a soft error due to alpha particles or cosmic rays affecting the hardware, a well-documented phenomenon known as single event effects.

A bohrbug, by way of contrast, is a "good, solid bug". Like the deterministic Bohr atom model, they do not change their behavior and are relatively easily detected. [11] [12]

A mandelbug (named after Benoît Mandelbrot's fractal) is a bug whose causes are so complex it defies repair, or makes its behavior appear chaotic or even non-deterministic. [2] The term also refers to a bug that exhibits fractal behavior (that is, self-similarity) by revealing more bugs (the deeper a developer goes into the code to fix it the more bugs they find).[ citation needed ]

A schrödinbug or schroedinbug (named after Erwin Schrödinger and his thought experiment) is a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place. [5]

A hindenbug [13] (named after the Hindenburg disaster) is a bug with catastrophic behavior.

A higgs-bugson [14] [15] (named after the Higgs boson particle) is a bug that is predicted to exist based upon other observed conditions (most commonly, vaguely related log entries and anecdotal user reports) but is difficult, if not impossible, to artificially reproduce in a development or test environment. The term may also refer to a bug that is obvious in the code (mathematically proven), but which cannot be seen in execution (yet difficult or impossible to actually find in existence).

Etymology

The term was used in 1985 by Jim Gray, in a paper about software failures [16] (and is sometimes mistakenly attributed to him because of this publication) and also in 1986 by Jonathan Clark and Zhahai Stewart on the mailing list (later Usenet news group) comp.risks. [17]

Bruce Lindsay, a researcher at IBM, affirmed in a 2004 ACM Queue interview that he was present when the Heisenbug was originally defined. [18]

An earlier appearance in ACM publications is from 1983. [19]

Resolution

Heisenbugs are difficult to identify and fix; often attempting to resolve them leads to further unexpected behavior. Because the problem manifests as the result of a separate, underpinning bug, the behavior can be hard to predict and analyze during debugging. Overall the number of heisenbugs identified should decrease as a piece of software matures. [20]

See also

Related Research Articles

Computer programming or coding is the composition of sequences of instructions, called programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of procedures, by writing code in one or more programming languages. Programmers typically use high-level programming languages that are more easily intelligible to humans than machine code, which is directly executed by the central processing unit. Proficient programming usually requires expertise in several different subjects, including knowledge of the application domain, details of programming languages and generic code libraries, specialized algorithms, and formal logic.

<span class="mw-page-title-main">GNU Debugger</span> Source-level debugger

The GNU Debugger (GDB) is a portable debugger that runs on many Unix-like systems and works for many programming languages, including Ada, Assembly, C, C++, D, Fortran, Haskell, Go, Objective-C, OpenCL C, Modula-2, Pascal, Rust, and partially others.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

A software bug is an error, flaw or fault in the design, development, or operation of computer software that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. The process of finding and correcting bugs is termed "debugging" and often uses formal techniques or tools to pinpoint bugs. Since the 1950s, some computer systems have been designed to detect or auto-correct various software errors during operations.

<span class="mw-page-title-main">Embedded system</span> Computer system with a dedicated function

An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts. Because an embedded system typically controls physical operations of the machine that it is embedded within, it often has real-time computing constraints. Embedded systems control many devices in common use. In 2009, it was estimated that ninety-eight percent of all microprocessors manufactured were used in embedded systems.

<span class="mw-page-title-main">Debugger</span> Computer program used to test and debug other programs

A debugger or debugging tool is a computer program used to test and debug other programs. The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its execution and monitor changes in computer resources that may indicate malfunctioning code. Typical debugging facilities include the ability to run or halt the target program at specific points, display the contents of memory, CPU registers or storage devices, and modify memory or register contents in order to enter selected test data that might be a cause of faulty program execution.

Shotgun debugging can be defined as:

In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness. The first focuses on improving the program’s performance while reducing the resource usage while the latter focuses on ensuring that the program does what it is supposed to do.

<span class="mw-page-title-main">Crash (computing)</span> When a computer program stops functioning properly and self-terminates

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs, that can be combined to accomplish a task, much as one might use multiple hands to fix a physical object. The most basic tools are a source code editor and a compiler or interpreter, which are used ubiquitously and continuously. Other tools are used more or less depending on the language, development methodology, and individual engineer, often used for a discrete task, like a debugger or profiler. Tools may be discrete programs, executed separately – often from the command line – or may be parts of a single large program, called an integrated development environment (IDE). In many cases, particularly for simpler use, simple ad hoc techniques are used instead of a tool, such as print debugging instead of using a debugger, manual timing instead of a profiler, or tracking bugs in a text file or spreadsheet instead of a bug tracking system.

<span class="mw-page-title-main">Race condition</span> When a systems behavior depends on timing of uncontrollable events

A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent results. It becomes a bug when one or more of the possible behaviors is undesirable.

In computer science, symbolic execution (also symbolic evaluation or symbex) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch. Finally, the possible inputs that trigger a branch can be determined by solving the constraints.

In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.

An instruction set simulator (ISS) is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.

Dynamic program analysis is analysis of computer software that involves executing the program in question. Dynamic program analysis includes familiar techniques from software engineering such as unit testing, debugging, and measuring code coverage, but also includes lesser-known techniques like program slicing and invariant inference. Dynamic program analysis is widely applied in security in the form of runtime memory error detection, fuzzing, dynamic symbolic execution, and taint tracking.

In software engineering, software aging is the tendency for software to fail or cause a system failure after running continuously for a certain time, or because of ongoing changes in systems surrounding the software. Software aging has several causes, including the inability of old software to adapt to changing needs or changing technology platforms, and the tendency of software patches to introduce further errors. As the software gets older it becomes less well-suited to its purpose and will eventually stop functioning as it should. Rebooting or reinstalling the software can act as a short-term fix. A proactive fault management method to deal with the software aging incident is software rejuvenation. This method can be classified as an environment diversity technique that usually is implemented through software rejuvenation agents (SRA).

In computer programming and software development, debugging is the process of finding and resolving bugs within computer programs, software, or systems.

High performance computing applications run on massively parallel supercomputers consist of concurrent programs designed using multi-threaded, multi-process models. The applications may consist of various constructs with varying degree of parallelism. Although high performance concurrent programs use similar design patterns, models and principles as that of sequential programs, unlike sequential programs, they typically demonstrate non-deterministic behavior. The probability of bugs increases with the number of interactions between the various parallel constructs. Race conditions, data races, deadlocks, missed signals and live lock are common error types.

<span class="mw-page-title-main">American Fuzzy Lop (software)</span> Software fuzzer that employs genetic algorithms

American Fuzzy Lop (AFL), stylized in all lowercase as American fuzzy lop, is a free software fuzzer that employs genetic algorithms in order to efficiently increase code coverage of the test cases. So far it has detected dozens of significant software bugs in major free software projects, including X.Org Server, PHP, OpenSSL, pngcrush, bash, Firefox, BIND, Qt, and SQLite.

References

  1. "The Jargon File: heisenbug".
  2. 1 2 "The Jargon File: Mandelbug". Catb.org. Retrieved 2013-09-05.
  3. Raymond, Eric S.; The New Hacker's Dictionary, 3rd edition, 1996
  4. Clarke, Arthur C., The Ghost from the Grand Banks, Bantam Books, 1990
  5. 1 2 "The Jargon File: Schroedinbug". Catb.org. Retrieved 2013-09-05.
  6. Raymond, Eric S.; The New Hacker's Dictionary, 3rd edition, 1996
  7. The following article investigates the various definitions of bohrbug, mandelbug and heisenbug proposed in the literature, as well as the statements made about the relationships between these fault types: Grottke, Michael; and Trivedi, Kishor S.; Software Faults, Software Aging and Software Rejuvenation, Journal of the Reliability Engineering Association of Japan, Vol. 27, No. 7, pp. 425–438, 2005.
  8. Grottke, Michael; and Trivedi, Kishor S.; Fighting Bugs: Remove, Retry, Replicate, and Rejuvenate, IEEE Computer vol. 40, no. 2 (February 2007), pp. 107–109
  9. "Java toString() override with initialization as a side effect" Archived 2014-12-30 at the Wayback Machine
  10. CATB.org, "phase of the moon"
  11. Goshgarian, Gary; Exploring Language, HarperCollins College Publishers, 1995
  12. "Such transient software failures have been given the whimsical name 'Heisenbug' because they disappear when reexamined. By contrast, 'Bohrbugs' are good solid bugs." (IEEE Computer Group News, Volume 24, Numbers 7–12, 1991)
  13. "Hinden Bug".[ better source needed ]
  14. "New Programming Jargon". 20 July 2012.
  15. "20 Hilarious Programming Jargon Phrases You Should Use When Talking to Engineers". Business Insider .
  16. Gray, Jim (1985). "Why Do Computers Stop And What Can Be Done About It?". Technical Report 85.7. Tandem Computers.
  17. (16 December 1986) RISKS DIGEST 4.30 - (23 December 1986) RISKS DIGEST 4.34, moderated by Peter G. Neumann
  18. ""A Conversation with Bruce Lindsay", ACM Queue vol. 2, no. 8 - November 2004". Queue.acm.org. Retrieved 2013-09-05.
  19. Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on High-Level Debugging, Pacific Grove, California, March 20–23, 1983, Association for Computing Machinery, 1983, Google Books search:
    This is the Heisenberg Uncertainty Principle as applied to debugging (an instance of such a bug was called a "Heisenbug" by one participant.)
    Also cited in LeBlanc, Richard J.; Robbins, Arnold D.; Event-Driven Monitoring of Distributed Programs, in Proceedings of the IEEE 5th International Conference on Distributed Computing Systems (ICDCS), IEEE Computer Society, Computer Society Press, 1985, pp. 515-522 Google Books search:
    This the Heisenberg Uncertainty Principle as applied to Debugging, sometimes called the "Heisenbug" Principle [ACM83].
  20. P., Birman, Kenneth (2005). Reliable distributed systems : technologies, Web services, and applications. New York: Springer. ISBN   0387276017. OCLC   225378026.{{cite book}}: CS1 maint: multiple names: authors list (link)