Characterization test

In computer programming, a characterization test (also known as Golden Master Testing[1]) is a means to describe (characterize) the actual behavior of an existing piece of software, and therefore protect existing behavior of legacy code against unintended changes via automated testing. The term was coined by Michael Feathers.[2]

Overview

The goal of characterization tests is to help developers verify that the modifications made to a reference version of a software system did not modify its behavior in unwanted or undesirable ways. They enable, and provide a safety net for, extending and refactoring code that does not have adequate unit tests.

In James Bach's and Michael Bolton's classification of test oracles,[3] this kind of testing corresponds to the historical oracle. In contrast to the usual approach of assertion-based software testing, the outcome of the test is not determined by individual values or properties (which are checked with assertions), but by comparing a complex result of the tested software process as a whole with the result of the same process in a previous version of the software. In a sense, characterization testing inverts traditional testing: traditional tests check individual properties (whitelisting them), whereas characterization testing checks all properties that have not been explicitly removed (blacklisted).

When creating a characterization test, one must observe what outputs occur for a given set of inputs. Once it has been observed that the legacy code produces a certain output for given inputs, a test can be written asserting that the output of the legacy code matches that observed result. For example, if one observes that f(3.14) == 42, this observation can be captured as a characterization test. After subsequent modifications to the system, the test can determine whether the modifications changed the results for the same inputs.
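
To illustrate, a characterization test for the hypothetical function f above might look like the following sketch (Python is used here for illustration; the module name legacy and the recorded value 42 are assumptions, not part of any real library):

    import unittest

    from legacy import f  # hypothetical module containing the legacy function


    class CharacterizationTest(unittest.TestCase):
        def test_f_matches_observed_output(self):
            # 42 is not derived from a specification; it is simply the value
            # the legacy code was observed to return for this input.
            self.assertEqual(f(3.14), 42)


    if __name__ == "__main__":
        unittest.main()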

As with any testing, it is generally not possible to create a characterization test for every possible input and output, so many developers aim instead for statement or branch coverage. However, even this can be difficult to achieve. Test writers must use their judgment to decide how much testing is appropriate. It is often sufficient to write characterization tests that cover only the specific inputs and outputs known to occur, paying special attention to edge cases.

Unlike regression tests, to which they are very similar, characterization tests do not verify the correct behavior of the code, which can be impossible to determine. Instead they verify the behavior that was observed when they were written. Often no specification or test suite is available, leaving only characterization tests as an option, since the conservative path is to assume that the old behavior is the required behavior. Characterization tests are, essentially, change detectors. It is up to the person analyzing the results to determine if the detected change was expected and/or desirable, or unexpected and/or undesirable.

Because characterization tests are based on existing code, it is possible to generate some of them automatically. An automated characterization test tool exercises the existing code with a wide range of relevant and/or random input values, records the output values (or state changes), and generates a set of characterization tests. When the generated tests are executed against a new version of the code, they produce one or more failures or warnings if that version has been modified in a way that changes previously established behavior.
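
As a rough sketch of how such a tool might work, the following Python code records a golden master for a hypothetical function f over a reproducible set of random inputs and later compares a new version of the code against it (the function, the input range, and the file name are all assumptions for illustration, and the outputs are assumed to be JSON-serializable):

    import json
    import random

    from legacy import f  # hypothetical function whose behavior is being characterized

    GOLDEN_MASTER = "golden_master.json"


    def record_golden_master(path=GOLDEN_MASTER, samples=100, seed=0):
        """Exercise the code with reproducible pseudo-random inputs and record the outputs."""
        rng = random.Random(seed)
        inputs = [rng.uniform(-1000.0, 1000.0) for _ in range(samples)]
        results = {repr(x): f(x) for x in inputs}
        with open(path, "w") as fh:
            json.dump(results, fh, indent=2)


    def check_against_golden_master(path=GOLDEN_MASTER):
        """Re-run the recorded inputs and report any divergence from the stored outputs."""
        with open(path) as fh:
            recorded = json.load(fh)
        failures = []
        for key, expected in recorded.items():
            actual = f(float(key))
            if actual != expected:
                failures.append((key, expected, actual))
        return failures

A nonempty list of failures indicates that the new version has changed a previously established behavior; it is then up to the developer to decide whether each change was intended.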

When testing at the GUI level, characterization testing can be combined with intelligent monkey testing to create complex test cases that capture use cases and special cases thereof.
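
As a minimal illustration of this combination, a seeded random "monkey" can drive an interface, record the sequence of observed states as a golden master, and later replay the same event sequence for comparison (the CounterWidget class below is a hypothetical stand-in for a real GUI component):

    import random


    class CounterWidget:
        """Hypothetical stand-in for a GUI component under test."""

        def __init__(self):
            self.value = 0

        def click_up(self):
            self.value += 1

        def click_down(self):
            self.value -= 1


    def monkey_trace(seed, steps=50):
        """Drive the widget with a reproducible random sequence of events and
        return the observed states, which serve as the golden master."""
        rng = random.Random(seed)
        widget = CounterWidget()
        trace = []
        for _ in range(steps):
            action = rng.choice([widget.click_up, widget.click_down])
            action()
            trace.append(widget.value)
        return trace


    # A recording run saves the trace; a checking run with the same seed must
    # reproduce it exactly, so any difference signals a change in behavior.
    assert monkey_trace(seed=42) == monkey_trace(seed=42)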

Advantages

Golden Master testing has the following advantages over traditional assertion-based software testing: tests can be written even when no specification or existing test suite is available; a single comparison of the recorded output checks the result of the whole process at once, rather than individual asserted properties; and, as described above, such tests can to a large extent be generated automatically.

Disadvantages

Golden Master testing has the following disadvantages compared to traditional assertion-based software testing: it verifies only the behavior that was observed when the tests were written, not behavior that is known to be correct; it cannot practically cover every possible input and output; and each detected change must still be analyzed by a person to determine whether it was expected and desirable or unexpected and undesirable.

References

  1. "J. B. Rainsberger - Surviving Legacy Code with Golden Master and Sampling" . Retrieved 2017-05-30.
  2. Feathers, Michael C. Working Effectively with Legacy Code ( ISBN   0-13-117705-2).
  3. Bolton, Michael (January 2005). "Testing Without a Map" (PDF). Better Software. Sticky Minds / TechWell. Retrieved 2017-05-30.