Random testing

Random testing is a black-box software testing technique in which programs are tested by generating random, independent inputs. The resulting outputs are compared against the software specification to determine whether each test passes or fails. [1] When no specification is available, the language's exceptions are used as an implicit oracle: if an exception arises during test execution, there is a fault in the program. Random testing is also used as a way to avoid biased test selection.
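As a rough illustration of the exception-based oracle, the sketch below (in the same C++ style as the examples later in this article) repeatedly calls a hypothetical function under test with random inputs and treats an unexpected exception as evidence of a fault; parse, getRandomString, and the driver are illustrative assumptions, not part of any particular tool.

#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical function under test: assumed to throw on input it cannot handle.
int parse(const std::string& s) {
    if (s.empty()) {
        throw std::invalid_argument("empty input"); // stand-in behavior
    }
    return static_cast<int>(s.size());
}

// Hypothetical random input generator producing strings of random length.
std::string getRandomString() {
    return std::string(std::rand() % 20, 'a' + std::rand() % 26);
}

// With no specification available, an exception raised during execution is
// treated as evidence of a fault (or at least of behavior worth inspecting).
void randomTestParse(int n) {
    for (int i = 0; i < n; i++) {
        std::string input = getRandomString();
        try {
            parse(input);
        } catch (const std::exception& e) {
            std::cerr << "Possible fault for input of length " << input.size()
                      << ": " << e.what() << std::endl;
        }
    }
}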

History of random testing

Random testing for hardware was first examined by Melvin Breuer in 1971, and an initial effort to evaluate its effectiveness was made by Pratima and Vishwani Agrawal in 1975. [2]

In software, Duran and Ntafos examined random testing in 1984. [3]

The use of hypothesis testing as a theoretical basis for random testing was described by Howden in Functional Testing and Analysis. The book also develops a simple formula for estimating the number of tests needed to have confidence at least 1 − 1/n in a failure rate of no larger than 1/n. The formula is the lower bound n log n, which indicates the large number of failure-free tests needed to have even modest confidence in a modest failure-rate bound. [4]
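A short sketch of the reasoning behind a bound of this form (this derivation is an illustration, not taken verbatim from Howden): if the true failure rate is at least 1/n, the probability of observing N consecutive failure-free tests is at most (1 − 1/n)^N, so requiring this probability to be at most 1/n gives

\[
\left(1 - \frac{1}{n}\right)^{N} \le \frac{1}{n}
\quad\Longrightarrow\quad
N \ge \frac{\ln n}{-\ln\!\left(1 - \frac{1}{n}\right)} \approx n \ln n,
\]

using −ln(1 − 1/n) ≈ 1/n for large n.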

Overview

Consider the following C++ function:

int myAbs(int x) {
    if (x > 0) {
        return x;
    } else {
        return x; // bug: should be '-x'
    }
}

Random tests for this function could be, for example, {123, 36, -35, 48, 0}. Only the value '-35' triggers the bug. If there is no reference implementation to check the result, the bug could still go unnoticed. However, an assertion can be added to check the results, for example:

void testAbs(int n) {
    for (int i = 0; i < n; i++) {
        int x = getRandomInput();
        int result = myAbs(x);
        assert(result >= 0);
    }
}

A reference implementation is sometimes available, e.g. when a simple algorithm is re-implemented in a much more complex way for better performance. For example, to test an implementation of the Schönhage–Strassen algorithm, the standard "*" operation on integers can be used:

int getRandomInput() {
    // …
}

void testFastMultiplication(int n) {
    for (int i = 0; i < n; i++) {
        long x = getRandomInput();
        long y = getRandomInput();
        long result = fastMultiplication(x, y);
        assert(x * y == result);
    }
}

While this example is limited to simple types (for which a simple random generator can be used), tools targeting object-oriented languages typically explore the program under test to find generators (constructors or methods returning objects of the required type) and call them with random inputs, themselves either generated the same way or produced by a pseudo-random generator where possible. Such approaches then maintain a pool of randomly generated objects and use a probability for either reusing a generated object or creating a new one. [5]
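A minimal sketch of the object-pool idea, in the same C++ style as the examples above; the Account class, the reuse probability, and the helper names are illustrative assumptions rather than the interface of any particular tool:

#include <cstdlib>
#include <vector>

// Hypothetical class under test; any type whose constructors and methods
// can be called with randomly generated arguments would do.
class Account {
public:
    explicit Account(int initialBalance) : balance(initialBalance) {}
    void deposit(int amount) { balance += amount; }
    int getBalance() const { return balance; }
private:
    int balance;
};

// Either reuse an object from the pool or create a new one with random
// constructor arguments, according to a fixed reuse probability.
Account* getRandomAccount(std::vector<Account*>& pool, double reuseProbability) {
    bool reuse = !pool.empty() &&
                 (static_cast<double>(std::rand()) / RAND_MAX) < reuseProbability;
    if (reuse) {
        return pool[std::rand() % pool.size()];
    }
    Account* fresh = new Account(std::rand() % 1000); // random constructor input
    pool.push_back(fresh);
    return fresh;
}

A test driver would then repeatedly obtain an object this way, call a randomly chosen method on it with random arguments, and check assertions or contracts after each call.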

On randomness

According to the seminal paper on random testing by D. Hamlet

[..] the technical, mathematical meaning of "random testing" refers to an explicit lack of "system" in the choice of test data, so that there is no correlation among different tests. [1]

Strengths and weaknesses

Random testing is praised for the following strengths:

The following weaknesses have been described:

Types of random testing

With respect to the input

Guided vs. unguided

Implementations

Some tools implementing random testing:

Critique

Random testing has only a specialized niche in practice, mostly because an effective oracle is seldom available, but also because of difficulties with the operational profile and with generation of pseudorandom input values. [1]

A test oracle is an instrument for verifying whether the outcomes match the program specification or not. An operational profile is knowledge about the usage patterns of the program, and thus which parts are more important.
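As an illustration of how an operational profile might influence input generation, the following sketch draws random inputs from usage categories in proportion to assumed usage frequencies; the categories, weights, and helper functions are hypothetical:

#include <cstdlib>
#include <string>

// Hypothetical operational profile: assume 70% of real usage is short
// queries, 25% long queries, and 5% empty input.
enum class InputKind { ShortQuery, LongQuery, Empty };

InputKind pickKindFromProfile() {
    int r = std::rand() % 100;               // uniform in [0, 100)
    if (r < 70) return InputKind::ShortQuery;
    if (r < 95) return InputKind::LongQuery;
    return InputKind::Empty;
}

std::string getProfileWeightedInput() {
    switch (pickKindFromProfile()) {
        case InputKind::ShortQuery:
            return std::string(1 + std::rand() % 10, 'a');     // 1-10 characters
        case InputKind::LongQuery:
            return std::string(100 + std::rand() % 900, 'a');  // 100-999 characters
        default:
            return "";
    }
}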

For programming languages and platforms that have contracts (e.g. Eiffel, .NET, or various extensions of Java such as JML and CoFoJa), the contracts act as natural oracles, and the approach has been applied successfully. [5] In particular, random testing finds more bugs than manual inspections or user reports (albeit different ones). [9]
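A rough sketch of how a contract can serve as an oracle during random testing, expressed with plain C++ assertions rather than any specific contract framework; the withdraw function and its pre- and postconditions are illustrative assumptions:

#include <cassert>
#include <cstdlib>

// Illustrative function with a contract expressed as assertions:
// precondition: 0 <= amount <= balance; postcondition: the result is the
// balance reduced by amount and remains non-negative.
int withdraw(int balance, int amount) {
    assert(amount >= 0 && amount <= balance);                  // precondition
    int newBalance = balance - amount;
    assert(newBalance == balance - amount && newBalance >= 0); // postcondition
    return newBalance;
}

// Random test driver: only inputs satisfying the precondition are passed in;
// a postcondition violation would then indicate a fault in withdraw.
void testWithdraw(int n) {
    for (int i = 0; i < n; i++) {
        int balance = std::rand() % 1000;
        int amount = std::rand() % (balance + 1); // respects the precondition
        withdraw(balance, amount);
    }
}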

See also

References

  1. Richard Hamlet (1994). "Random Testing". In John J. Marciniak (ed.). Encyclopedia of Software Engineering (1st ed.). John Wiley and Sons. ISBN 978-0471540021.
  2. Agrawal, P.; Agrawal, V. D. (1 July 1975). "Probabilistic Analysis of Random Test Generation Method for Irredundant Combinational Logic Networks". IEEE Transactions on Computers. C-24 (7): 691–695. doi:10.1109/T-C.1975.224289.
  3. Duran, J. W.; Ntafos, S. C. (1 July 1984). "An Evaluation of Random Testing". IEEE Transactions on Software Engineering. SE-10 (4): 438–444. doi:10.1109/TSE.1984.5010257.
  4. Howden, William (1987). Functional Program Testing and Analysis. New York: McGraw Hill. pp. 51–53. ISBN 0-07-030550-1.
  5. "AutoTest - Chair of Software Engineering". se.inf.ethz.ch. Retrieved 15 November 2017.
  6. "Is it a bad practice to randomly-generate test data?". stackoverflow.com. Retrieved 15 November 2017.
  7. Pacheco, Carlos; Shuvendu K. Lahiri; Michael D. Ernst; Thomas Ball (May 2007). "Feedback-directed random test generation" (PDF). ICSE '07: Proceedings of the 29th International Conference on Software Engineering: 75–84. ISSN 0270-5257.
  8. T.Y. Chen; F.-C. Kuo; R.G. Merkel; T.H. Tse (2010). "Adaptive random testing: The ART of test case diversity". Journal of Systems and Software. 83 (1): 60–66. doi:10.1016/j.jss.2009.02.022. hdl:10722/89054.
  9. Ilinca Ciupa; Alexander Pretschner; Manuel Oriol; Andreas Leitner; Bertrand Meyer (2009). "On the number and nature of faults found by random testing". Software Testing, Verification and Reliability. 21: 3–28. doi:10.1002/stvr.415.