Test oracle

In computing, software engineering, and software testing, a test oracle (or just oracle) is a procedure that, for a given input to a system, distinguishes between the correct and incorrect behaviors of the system under test (SUT). [1] Using an oracle involves comparing the output(s) of the system under test, for a given test-case input, to the output(s) that the oracle determines the product should have. The term "test oracle" was first introduced in a paper by William E. Howden. [2] Elaine Weyuker carried out additional work on different kinds of oracles. [3]

Oracles often operate separately from the system under test. [4] However, method postconditions are part of the system under test and serve as automated oracles in design by contract models. [5] Determining the correct output for a given input (and a set of program or system states) is known as the oracle problem or test oracle problem, [6] :507 which is much harder than it seems and involves problems related to controllability and observability. [7]

Categories

A research literature survey covering 1978 to 2012 [6] found several potential categories of test oracles.

Specified

These oracles are typically associated with formalized approaches to software modeling and software code construction. They are connected to formal specification, [8] model-based design which may be used to generate test oracles, [9] state transition specification for which oracles can be derived to aid model-based testing [10] and protocol conformance testing, [11] and design by contract for which the equivalent test oracle is an assertion.
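As a minimal sketch of the design by contract case, the postcondition below plays the role of the oracle: it is checked on every call, so no separately stored expected output is needed. The function `integer_sqrt` and its contract are hypothetical and chosen only for illustration.

```python
def integer_sqrt(n: int) -> int:
    """Hypothetical system under test: largest r with r * r <= n."""
    # Precondition: the contract only covers non-negative input.
    if n < 0:
        raise ValueError("n must be non-negative")

    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1

    # Postcondition: this assertion is the specified oracle. It states
    # what a correct result must satisfy, independent of how r was found.
    assert r * r <= n < (r + 1) * (r + 1), f"postcondition violated for n={n}"
    return r


def run_contract_checks(limit: int = 200) -> None:
    """Drive the SUT; any AssertionError signals a contract violation."""
    for n in range(limit):
        integer_sqrt(n)
```

Calling `run_contract_checks()` exercises the function over a range of inputs; the test passes when no postcondition fails.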

Specified test oracles face a number of challenges. Formal specification relies on abstraction, which naturally introduces some imprecision, since no model can capture all behavior. [6] :514

Derived

A derived test oracle differentiates correct and incorrect behavior by using information derived from artifacts of the system. These may include documentation, system execution results, and characteristics of versions of the system under test. [6] :514 Regression test suites (or reports) are an example of a derived test oracle: they are built on the assumption that the result from a previous system version can be used as an aid (oracle) for a future system version. Previously measured performance characteristics may be used as an oracle for future system versions, for example, to trigger a question about observed potential performance degradation. Textual documentation from previous system versions may be used as a basis to guide expectations in future system versions.
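The regression case can be sketched as follows. The function `format_price` and the recorded baseline are assumptions made up for this illustration; in practice the baseline would be captured from an earlier, trusted release.

```python
def format_price(cents: int) -> str:
    """Hypothetical current version of the system under test."""
    return f"${cents // 100}.{cents % 100:02d}"


# Outputs assumed to have been recorded from a previous release.
# They are the derived oracle: correct only insofar as that release was.
RECORDED_BASELINE = {
    0: "$0.00",
    5: "$0.05",
    1999: "$19.99",
    120000: "$1200.00",
}


def regression_failures(sut, baseline):
    """Return (input, expected, actual) for every output that changed."""
    failures = []
    for value, expected in baseline.items():
        actual = sut(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures
```

An empty list from `regression_failures(format_price, RECORDED_BASELINE)` means the new version agrees with the old one. It does not prove either version correct, which is the usual limitation of a derived oracle.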

A pseudo-oracle [6] :515 falls into the category of derived test oracle. A pseudo-oracle, as defined by Weyuker, [12] is a separately written program which can take the same input as the program or system under test so that their outputs may be compared to understand if there might be a problem to investigate.

A partial oracle [6] :515 is a hybrid between a specified test oracle and a derived test oracle. It specifies important (but not complete) properties of the system under test. For example, metamorphic testing exploits such properties, called metamorphic relations, across multiple executions of the system.
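A small sketch of a metamorphic relation used as a partial oracle: for a sine implementation, the exact value of sin(x) may be hard to check, but the identity sin(x) = sin(π − x) must hold across two executions. The helper names and the tolerance are assumptions for illustration.

```python
import math
import random


def violates_sine_relation(sin_impl, x: float, tol: float = 1e-9) -> bool:
    """True if sin_impl(x) and sin_impl(pi - x) disagree beyond tol."""
    source_output = sin_impl(x)               # first execution
    follow_up_output = sin_impl(math.pi - x)  # follow-up execution
    return not math.isclose(source_output, follow_up_output, abs_tol=tol)


def find_violations(sin_impl, trials: int = 1000, seed: int = 0):
    """Return the random inputs for which the relation fails."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-10.0, 10.0) for _ in range(trials)]
    return [x for x in inputs if violates_sine_relation(sin_impl, x)]
```

`find_violations(math.sin)` is expected to return an empty list, while a crude stand-in such as `lambda x: x` violates the relation for almost every input. The relation checks only one property, so an implementation can satisfy it and still be wrong.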

Implicit

An implicit test oracle relies on implied information and assumptions. [6] :518 For example, a program crash implies unwanted behavior, so the crash itself serves as an oracle indicating that there may be a problem. There are a number of ways to search and test for unwanted behavior, sometimes called negative testing, including specialized subsets such as fuzzing.
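The crash-as-oracle idea can be sketched with a tiny random fuzzer. The function `first_word_length` is a hypothetical system under test with a deliberate latent fault; the fuzzer knows nothing about expected outputs and treats any uncaught exception as a sign of a possible problem.

```python
import random


def first_word_length(text: str) -> int:
    """Hypothetical SUT: fails with IndexError when text has no words."""
    return len(text.split()[0])


def fuzz(sut, trials: int = 500, seed: int = 0):
    """Feed random short strings to sut and collect inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        length = rng.randint(0, 5)
        text = "".join(rng.choice("ab ") for _ in range(length))
        try:
            sut(text)
        except Exception as exc:  # implicit oracle: any exception is suspect
            crashes.append((text, type(exc).__name__))
    return crashes
```

Each reported crash still needs triage, since an exception may be the documented response to invalid input rather than a fault.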

Implicit test oracles have limitations because they rely on implied conclusions and assumptions. For example, a program or process crash may not be a priority issue if the system is fault-tolerant and so operates under a form of self-healing or self-management. Implicit test oracles may be susceptible to false positives due to environment dependencies. Property-based testing relies on implicit oracles.

Human

When specified, derived, or implicit test oracles cannot be used, human input is required to determine the test oracles. [7] Human oracles can be thought of in terms of quantitative and qualitative approaches. [6] :519–520 A quantitative approach aims to find the right amount of information to gather on a system under test (e.g., test results) for a stakeholder to be able to make decisions on fitness for purpose or the release of the software. A qualitative approach aims to assess the representativeness and suitability of the input test data and the context of the output from the system under test. An example is using realistic and representative test data and judging whether the results are realistic. These approaches can be guided by heuristics, such as gut instincts, rules of thumb, checklist aids, and experience, to help tailor the specific combination selected for the program or system under test.

Examples

Test oracles are most commonly based on specifications and documentation. [13] [14] A formal specification used as input to model-based design and model-based testing would be an example of a specified test oracle. The model-based oracle uses the same model to generate and verify system behavior. [15] Documentation that is not a full specification of the product, such as a usage or installation guide, or a record of performance characteristics or minimum machine requirements for the software, would typically be a derived test oracle.

A consistency oracle compares the results of one test execution to another for similarity. [16] This is another example of a derived test oracle.

An oracle for a software program might be a second program that uses a different algorithm to evaluate the same mathematical expression as the product under test. This is an example of a pseudo-oracle, which is a derived test oracle. [12] :466
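A minimal sketch of the second-program idea: the system under test computes integer powers by squaring, and an independently written repeated-multiplication routine acts as the pseudo-oracle. Both functions are hypothetical and written only for this illustration.

```python
import random


def power_fast(base: int, exp: int) -> int:
    """Hypothetical SUT: exponentiation by squaring."""
    result = 1
    while exp > 0:
        if exp & 1:
            result *= base
        base *= base
        exp >>= 1
    return result


def power_naive(base: int, exp: int) -> int:
    """Pseudo-oracle: a separately written, simpler algorithm."""
    result = 1
    for _ in range(exp):
        result *= base
    return result


def disagreements(trials: int = 200, seed: int = 0):
    """Return the (base, exp) pairs on which the two programs differ."""
    rng = random.Random(seed)
    found = []
    for _ in range(trials):
        base, exp = rng.randint(-9, 9), rng.randint(0, 12)
        if power_fast(base, exp) != power_naive(base, exp):
            found.append((base, exp))
    return found
```

A disagreement shows only that at least one of the two programs is wrong, so each reported pair is something to investigate rather than a verdict.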

For a Google search, there is no complete oracle to verify whether the number of returned results is correct. A metamorphic relation [17] can be defined such that a follow-up, narrowed-down search should produce fewer results. This is an example of a partial oracle, which is a hybrid between a specified test oracle and a derived test oracle.
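The narrowing relation can be sketched against a toy in-memory search function. The corpus, `search`, and `narrowing_relation_holds` are all hypothetical stand-ins, not any real search API, and the relation is stated here in its weaker form: adding a term must not increase the number of results.

```python
# Hypothetical corpus standing in for a real search index.
DOCUMENTS = [
    "test oracle survey",
    "metamorphic testing of search engines",
    "oracle database tuning",
    "testing search services with metamorphic relations",
]


def search(query: str) -> list[str]:
    """Toy SUT: return the documents that contain every query term."""
    terms = query.lower().split()
    return [doc for doc in DOCUMENTS if all(t in doc.split() for t in terms)]


def narrowing_relation_holds(search_fn, query: str, extra_term: str) -> bool:
    """Partial oracle: a narrowed query must not return more results."""
    original = search_fn(query)
    narrowed = search_fn(f"{query} {extra_term}")
    return len(narrowed) <= len(original)
```

The check never needs the true result count for either query; it only compares the two executions with each other.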

A statistical oracle uses probabilistic characteristics, [18] for example in image analysis, where a range of certainty and uncertainty is defined for the test oracle to pronounce a match or otherwise. This would be an example of a quantitative approach to a human test oracle.
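As a simplified sketch of a statistical oracle (using a die roller rather than image analysis), the check below accepts a sample if its mean lies within a chosen number of standard errors of the expected mean. The function name, the threshold `z`, and the die example are assumptions for illustration.

```python
import math
import random


def mean_within_tolerance(samples, expected_mean: float,
                          expected_std: float, z: float = 5.0) -> bool:
    """Accept if the sample mean is within z standard errors of expected."""
    n = len(samples)
    standard_error = expected_std / math.sqrt(n)
    sample_mean = sum(samples) / n
    return abs(sample_mean - expected_mean) <= z * standard_error


# A fair six-sided die has mean 3.5 and variance 35/12.
DIE_MEAN = 3.5
DIE_STD = math.sqrt(35 / 12)


def roll_fair_die(count: int, seed: int = 0):
    """Hypothetical SUT: a seeded fair die roller."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(count)]
```

A wide threshold such as `z = 5.0` keeps false alarms rare at the cost of missing small biases; choosing it is part of defining the acceptable range of uncertainty.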

A heuristic oracle provides representative or approximate results over a class of test inputs. [19] This would be an example of a qualitative approach to a human test oracle.

References

  1. Earl T. Barr et al.; The Oracle Problem in Software Testing: A Survey, 2015
  2. Howden, W.E. (July 1978). "Theoretical and Empirical Studies of Program Testing". IEEE Transactions on Software Engineering. 4 (4): 293–298. doi:10.1109/TSE.1978.231514.
  3. Weyuker, Elaine J.; "The Oracle Assumption of Program Testing", in Proceedings of the 13th International Conference on System Sciences (ICSS), Honolulu, HI, January 1980, pp. 44-49
  4. Jalote, Pankaj; An Integrated Approach to Software Engineering, Springer/Birkhäuser, 2005, ISBN 0-387-20881-X
  5. Meyer, Bertrand; Fiva, Arno; Ciupa, Ilinca; Leitner, Andreas; Wei, Yi; Stapf, Emmanuel (September 2009). "Programs That Test Themselves". Computer. 42 (9): 46–55. doi:10.1109/MC.2009.296.
  6. Barr, Earl T.; Harman, Mark; McMinn, Phil; Shahbaz, Muzammil; Yoo, Shin (November 2014). "The Oracle Problem in Software Testing: A Survey" (PDF). IEEE Transactions on Software Engineering. 41 (5): 507–525. doi:10.1109/TSE.2014.2372785.
  7. Ammann, Paul; and Offutt, Jeff; "Introduction to Software Testing, 2nd edition", Cambridge University Press, 2016, ISBN 978-1107172012
  8. Börger, E (1999). "High Level System Design and Analysis Using Abstract State Machines". In Hutter, D; Stephan, W; Traverso, P; Ullman, M (eds.). Applied Formal Methods — FM-Trends 98. Lecture Notes in Computer Science. Vol. 1641. pp. 1–43. CiteSeerX 10.1.1.470.3653. doi:10.1007/3-540-48257-1_1. ISBN 978-3-540-66462-8.
  9. Peters, D.K. (March 1998). "Using test oracles generated from program documentation". IEEE Transactions on Software Engineering. 24 (3): 161–173. CiteSeerX 10.1.1.39.2890. doi:10.1109/32.667877.
  10. Utting, Mark; Pretschner, Alexander; Legeard, Bruno (2012). "A taxonomy of model-based testing approaches" (PDF). Software Testing, Verification and Reliability. 22 (5): 297–312. doi:10.1002/stvr.456. ISSN 1099-1689.
  11. Gaudel, Marie-Claude (2001). "Testing from Formal Specifications, a Generic Approach". In Craeynest, D.; Strohmeier, A (eds.). Reliable Software Technologies — Ada-Europe 2001. Lecture Notes in Computer Science. Vol. 2043. pp. 35–48. doi:10.1007/3-540-45136-6_3. ISBN 978-3-540-42123-8.
  12. Weyuker, E.J. (November 1982). "On Testing Non-Testable Programs". The Computer Journal. 25 (4): 465–470. doi:10.1093/comjnl/25.4.465.
  13. Peters, Dennis K. (1995). Generating a Test Oracle from Program Documentation (M. Eng. thesis). McMaster University. CiteSeerX 10.1.1.69.4331.
  14. Peters, Dennis K.; Parnas, David L. "Generating a Test Oracle from Program Documentation" (PDF). Proceedings of the 1994 International Symposium on Software Testing and Analysis. ISSTA. ACM Press. pp. 58–65.
  15. Robinson, Harry; Finite State Model-Based Testing on a Shoestring, STAR West 1999
  16. Hoffman, Douglas; Analysis of a Taxonomy for Test Oracles, Quality Week, 1998
  17. Zhou, Z.Q.; Zhang, S.; Hagenbuchner, M.; Tse, T.H.; Kuo, F.-C.; Chen, T.Y. (2012). "Automated functional testing of online search services". Software Testing, Verification and Reliability. 22 (4): 221–243. doi:10.1002/stvr.437. hdl:10722/123864.
  18. Mayer, Johannes; Guderlei, Ralph (2004). "Test Oracles Using Statistical Methods" (PDF). Proceedings of the First International Workshop on Software Quality, Lecture Notes in Informatics. First International Workshop on Software Quality. Springer. pp. 179–189.
  19. Hoffman, Douglas; Heuristic Test Oracles, Software Testing & Quality Engineering Magazine, 1999
