Test data


Test data plays a crucial role in software development: it provides the inputs used to verify the correctness, performance, and reliability of software systems. Test data encompasses several types, such as positive and negative scenarios, edge cases, and realistic user scenarios, each exercising a different aspect of the software to uncover bugs and validate its behavior. By designing and executing test cases with appropriate test data, developers can identify and fix defects, improve the quality of the software, and ensure it meets the specified requirements. Test data is also used in regression testing to confirm that new code changes or enhancements do not introduce unintended side effects or break existing functionality.


Background

Some data may be used in a confirmatory way, typically to verify that a given set of inputs to a given function produces some expected result. Other data may be used to challenge the ability of the program to respond to unusual, extreme, exceptional, or unexpected input. [1]

Test data may be produced in a focused or systematic way (as is typically the case in domain testing), or by using other, less-focused approaches (as is typically the case in high-volume randomized automated tests). Test data may be produced by the tester, or by a program or function that aids the tester, and it may be recorded for reuse or used only once. Test data can be created manually, generated with data generation tools (often based on randomness [2]), or retrieved from an existing production environment. The data set can consist of synthetic (fake) data, but preferably it consists of representative (real) data. [3]
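
For illustration, the following sketch (Python, standard library only) contrasts the two uses: a confirmatory test checks known inputs against expected results, while a high-volume randomized test challenges the program with arbitrary input. The `parse_age` function is a hypothetical example, not drawn from the cited sources.

```python
import random
import string

def parse_age(text):
    """Hypothetical function under test: parse a non-negative age."""
    value = int(text)  # may raise ValueError on malformed input
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

# Confirmatory test data: known inputs with expected results.
def test_parse_age_confirmatory():
    assert parse_age("42") == 42
    assert parse_age("0") == 0

# Randomized test data: challenge the function with arbitrary strings
# and check that it either returns a valid age or fails cleanly.
def test_parse_age_randomized():
    for _ in range(10_000):
        text = "".join(random.choices(string.printable, k=random.randint(0, 8)))
        try:
            value = parse_age(text)
            assert 0 <= value <= 150  # any accepted input must be in range
        except ValueError:
            pass  # clean rejection of bad input is acceptable
```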

Limitations

Privacy rules and regulations such as the GDPR, PCI and HIPAA prohibit the use of privacy-sensitive personal data for testing. [4] Anonymized (and preferably subsetted) production data may instead be used as representative data for test and development. [5] Programmers can also choose to generate mock data, but this approach has its own limitations: it is not always possible to produce enough fake or mock data for testing. [6]
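
The following is a minimal sketch of anonymizing and subsetting, assuming records with hypothetical `name` and `email` fields. It only illustrates the idea; real anonymization must satisfy the applicable regulation, which this toy example does not claim to do.

```python
import hashlib
import random

def pseudonymize(value, salt="test-env"):
    # Deterministic pseudonym: equal inputs map to equal outputs,
    # so referential integrity across tables is preserved.
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return "user_" + digest[:10]

def anonymize_subset(records, fraction=0.1, seed=42):
    # Keep only a reproducible subset of production data...
    rng = random.Random(seed)
    subset = rng.sample(records, max(1, int(len(records) * fraction)))
    # ...and replace the sensitive fields with pseudonyms.
    return [
        {**row,
         "name": pseudonymize(row["name"]),
         "email": pseudonymize(row["email"]) + "@example.test"}
        for row in subset
    ]

production = [{"name": "Alice", "email": "alice@corp.example", "orders": 3},
              {"name": "Bob",   "email": "bob@corp.example",   "orders": 1}]
print(anonymize_subset(production, fraction=1.0))
```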

AI-generated "synthetic data" can be another option to generate test data. AI-powered synthetic data generators learn the patterns and qualities of a sample database. Once the training of the AI algorithm has taken place, it can produce as much or as little test data as defined. AI-generated synthetic data needs additional privacy measures to prevent the algorithm from overfitting. Some commercially available synthetic data generators come with additional privacy and accuracy controls. The amount of data to be tested is determined or limited by considerations such as time, cost and quality. Time to produce, cost to produce and quality of the test data, and efficiency.

Domain testing

Domain testing is a family of test techniques that focus on the test data. This might include identifying common or critical inputs, representatives of a particular equivalence class model, values that might appear at the boundaries between one equivalence class and another, outrageous values that should be rejected by the program, combinations of inputs, or inputs that might drive the product towards a particular set of outputs. [7]
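
As a concrete example, for an input defined as an integer between 1 and 100 (assumed limits, for illustration), a domain-testing approach might select one representative per equivalence class plus the boundary and extreme values:

```python
# Domain-testing data for an input defined as an integer in [1, 100].
# One representative per equivalence class, plus the class boundaries.
LOWER, UPPER = 1, 100

test_inputs = {
    "below_lower_boundary": LOWER - 1,   # 0   -> should be rejected
    "lower_boundary":       LOWER,       # 1   -> smallest valid value
    "typical_valid":        50,          # representative of the valid class
    "upper_boundary":       UPPER,       # 100 -> largest valid value
    "above_upper_boundary": UPPER + 1,   # 101 -> should be rejected
    "outrageous":           10**9,       # extreme value, should be rejected
}

def is_valid(value):
    return LOWER <= value <= UPPER

for label, value in test_inputs.items():
    print(f"{label:>22}: {value:>10} valid={is_valid(value)}")
```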


Related Research Articles


In engineering and its various subdisciplines, acceptance testing is a test conducted to determine if the requirements of a specification or contract are met. It may involve chemical tests, physical tests, or performance tests.

Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. Many test techniques are used to this end.

Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. This method of testing can be applied to virtually every level of software testing: unit, integration, system and acceptance. It is sometimes referred to as specification-based testing.

In software testing, test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can automate some repetitive but necessary tasks in a formalized testing process already in place, or perform additional testing that would be difficult to do manually. Test automation is critical for continuous delivery and continuous testing.

Data security means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach.

In software engineering, a test case is a specification of the inputs, execution conditions, testing procedure, and expected results that define a single test to be executed to achieve a particular software testing objective, such as to exercise a particular program path or to verify compliance with a specific requirement. Test cases underlie testing that is methodical rather than haphazard. A battery of test cases can be built to produce the desired coverage of the software being tested. Formally defined test cases allow the same tests to be run repeatedly against successive versions of the software, allowing for effective and consistent regression testing.
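
A common way to keep test cases formal and repeatable is a table-driven layout, sketched below for Python's built-in abs function: each row specifies a description, an input, and the expected result, and a single loop executes the whole battery against any version of the software.

```python
# Table-driven test cases: each row is (description, input, expected result).
CASES = [
    ("positive input",  5,  5),
    ("negative input", -5,  5),
    ("zero",            0,  0),
]

def run_cases(function, cases):
    # Execute every case and collect the failures instead of stopping early.
    failures = []
    for description, argument, expected in cases:
        actual = function(argument)
        if actual != expected:
            failures.append((description, expected, actual))
    return failures

assert run_cases(abs, CASES) == []  # the same table reruns on every version
```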


Equivalence partitioning or equivalence class partitioning (ECP) is a software testing technique that divides the input data of a software unit into partitions of equivalent data from which test cases can be derived. In principle, test cases are designed to cover each partition at least once. This technique tries to define test cases that uncover classes of errors, thereby reducing the total number of test cases that must be developed. An advantage of this approach is a reduction in the time required to test the software, owing to the smaller number of test cases.
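
As a small illustration, consider a hypothetical username field that must be 3 to 12 characters long: the input space splits into three partitions, and one representative per partition suffices.

```python
# Equivalence partitions for a username of 3..12 characters:
#   too short (len < 3), valid (3 <= len <= 12), too long (len > 12).
partitions = {
    "too_short": "ab",        # representative of len < 3
    "valid":     "alice_92",  # representative of 3 <= len <= 12
    "too_long":  "a" * 20,    # representative of len > 12
}

def accepts_username(name):
    return 3 <= len(name) <= 12

# One test case per partition covers each class of errors once.
expected = {"too_short": False, "valid": True, "too_long": False}
for partition, sample in partitions.items():
    assert accepts_username(sample) == expected[partition]
```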


In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.
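
A toy mutation fuzzer along these lines can be sketched in a few lines of Python: it perturbs a valid seed input one byte at a time, so the result often remains "valid enough", and reports any exception from the target other than a clean rejection. The JSON target is illustrative.

```python
import json
import random

def mutate(data, rng):
    # Flip, insert, or delete one byte of a valid seed input,
    # producing semi-valid data that may still parse partway.
    raw = bytearray(data)
    position = rng.randrange(len(raw))
    choice = rng.random()
    if choice < 0.34:
        raw[position] = rng.randrange(256)        # flip a byte
    elif choice < 0.67:
        raw.insert(position, rng.randrange(256))  # insert a byte
    else:
        del raw[position]                         # delete a byte
    return bytes(raw)

def fuzz(target, seed_input, iterations=10_000, seed=0):
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            pass                    # expected rejection of invalid input
        except Exception as error:  # anything else is a finding
            crashes.append((candidate, error))
    return crashes

# Target: a JSON parser wrapper; json.JSONDecodeError subclasses ValueError,
# so only unexpected exception types are reported.
findings = fuzz(lambda b: json.loads(b.decode("utf-8", "replace")),
                b'{"name": "alice", "age": 42}')
print(f"{len(findings)} unexpected exceptions")
```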

Functional testing is a quality assurance (QA) process and a type of black-box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output, and internal program structure is rarely considered. Functional testing is conducted to evaluate the compliance of a system or component with specified functional requirements. Functional testing usually describes what the system does.

Performance engineering encompasses the techniques applied during a systems development life cycle to ensure the non-functional requirements for performance will be met. It may be alternatively referred to as systems performance engineering within systems engineering, and software performance engineering or application performance engineering within software engineering.

Keyword-driven testing, also known as action word based testing, is a software testing methodology suitable for both manual and automated testing. This method separates the documentation of test cases – including both the data and functionality to use – from the prescription of the way the test cases are executed. As a result, it separates the test creation process into two distinct stages: a design and development stage, and an execution stage. The design stage covers requirement analysis and assessment, as well as data analysis, definition, and population.
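
A minimal sketch of that separation: the test case is written as pure data (action words with arguments, as a tester might record them in a table), and a separate executor maps each keyword onto an implementation. All names are illustrative.

```python
# Stage 1 (design): the test case as pure data -- action words and arguments.
TEST_CASE = [
    ("enter_username", "alice"),
    ("enter_password", "secret"),
    ("click_login",),
    ("verify_page", "dashboard"),
]

# Stage 2 (execution): each keyword maps onto an implementation.
class FakeApp:
    def __init__(self):
        self.fields, self.page = {}, "login"
    def enter_username(self, value): self.fields["username"] = value
    def enter_password(self, value): self.fields["password"] = value
    def click_login(self):
        self.page = "dashboard" if self.fields.get("password") else "login"
    def verify_page(self, expected):
        assert self.page == expected, f"on {self.page}, expected {expected}"

def execute(test_case, app):
    for keyword, *arguments in test_case:
        getattr(app, keyword)(*arguments)  # dispatch the action word

execute(TEST_CASE, FakeApp())
```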

In software engineering, graphical user interface testing is the process of testing a product's graphical user interface (GUI) to ensure it meets its specifications. This is normally done through the use of a variety of test cases.


In software development, the V-model represents a development process that may be considered an extension of the waterfall model, and is an example of the more general V-model. Instead of moving down in a linear way, the process steps are bent upwards after the coding phase, to form the typical V shape. The V-Model demonstrates the relationships between each phase of the development life cycle and its associated phase of testing. The horizontal and vertical axes represent time or project completeness (left-to-right) and level of abstraction, respectively.

A test strategy is an outline that describes the testing approach of the software development cycle. The purpose of a test strategy is to provide a rational deduction from organizational, high-level objectives to actual test activities to meet those objectives from a quality assurance perspective. The creation and documentation of a test strategy should be done in a systematic way to ensure that all objectives are fully covered and understood by all stakeholders. It should also frequently be reviewed, challenged and updated as the organization and the product evolve over time. Furthermore, a test strategy should also aim to align different stakeholders of quality assurance in terms of terminology, test and integration levels, roles and responsibilities, traceability, planning of resources, etc.

API testing is a type of software testing that involves testing application programming interfaces (APIs) directly and as part of integration testing to determine if they meet expectations for functionality, reliability, performance, and security. Since APIs lack a GUI, API testing is performed at the message layer. API testing is now considered critical for automating testing because APIs now serve as the primary interface to application logic and because GUI tests are difficult to maintain with the short release cycles and frequent changes commonly used with Agile software development and DevOps.
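
A sketch of such a message-layer test, written with the Python requests library against a hypothetical https://api.example.test service (the endpoint, fields, and status codes are assumptions, not a real API): the test speaks HTTP directly and asserts on status, message schema, and latency rather than on any GUI.

```python
import requests

BASE_URL = "https://api.example.test"  # hypothetical service

def test_get_user():
    # Functionality: the endpoint returns the requested resource.
    response = requests.get(f"{BASE_URL}/users/42", timeout=5)
    assert response.status_code == 200

    # Contract: the message body has the expected fields and types.
    body = response.json()
    assert isinstance(body["id"], int)
    assert isinstance(body["name"], str)

    # Performance: a coarse latency budget for this single call.
    assert response.elapsed.total_seconds() < 1.0

def test_get_missing_user():
    # Reliability: a missing resource yields a clean 404, not a crash.
    response = requests.get(f"{BASE_URL}/users/0", timeout=5)
    assert response.status_code == 404
```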

Synthetic data is information that is artificially generated rather than produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models.

Database testing usually consists of a layered process, including the user interface (UI) layer, the business layer, the data access layer and the database itself. The UI layer deals with the interface design of the database, while the business layer includes databases supporting business strategies.

Shift-left testing refers to testing software early in the development process, while shift-right testing refers to testing towards the end of the development cycle. The term "shift-left" describes the idea of moving software testing earlier in the process, which can help catch defects earlier and reduce the cost of fixing them later. In contrast, "shift-right" refers to testing later in the process, which can help ensure that the software meets the intended requirements and functions correctly before it is released. Shift-left is also the first half of the maxim "test early and often", coined by Larry Smith in 2001.

This article discusses a set of tactics useful in software testing. It is intended as a comprehensive list of tactical approaches to software quality assurance (more widely known simply as quality assurance) and the general application of the test method.


Synthography is the method of generating digital media synthetically using machine learning. This is distinct from other graphic creation and editing methods in that synthography uses artificial intelligence art text-to-image models to generate synthetic media. It is commonly achieved by prompt engineering text descriptions as input to create or edit a desired image.

References

  1. Weyuker, E. J. (1988-06-01). "The evaluation of program-based software test data adequacy criteria". Communications of the ACM. 31 (6): 668–675. doi:10.1145/62959.62963. ISSN 0001-0782. S2CID 15141475.
  2. "On testing in DDD". Medium. 2022-04-24. Retrieved 2023-01-24.
  3. "What is test data and how is it created?". DATPROF. 2019-06-26. Retrieved 2020-04-29.
  4. "Get GDPR, PCI and HIPAA compliant". DATPROF. 2020-03-03. Retrieved 2020-07-09.
  5. "Using production data for testing". DATPROF. 2019-10-17. Retrieved 2020-07-09.
  6. Emam, Khaled El; Mosquera, Lucy; Hoptroff, Richard (2020-05-19). Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data. O'Reilly Media. ISBN 978-1-4920-7271-3.
  7. Fries, Richard C. (2019-08-15). Handbook of Medical Device Design. CRC Press. ISBN 978-1-000-69695-0.