Test data


Test data plays a crucial role in software development: it provides the inputs used to verify the correctness, performance, and reliability of software systems. Test data encompasses several types, such as positive and negative scenarios, edge cases, and realistic user scenarios, each exercising a different aspect of the software to uncover bugs and validate its behavior. By designing and executing test cases with appropriate test data, developers can identify and fix defects, improve the quality of the software, and ensure it meets the specified requirements. Test data is also used in regression testing to confirm that new code changes or enhancements do not introduce unintended side effects or break existing functionality.


Background

Some data may be used in a confirmatory way, typically to verify that a given set of inputs to a given function produces some expected result. Other data may be used to challenge the ability of the program to respond to unusual, extreme, exceptional, or unexpected input. [1]

Test data may be produced in a focused or systematic way (as is typically the case in domain testing), or by using other, less-focused approaches (as is typically the case in high-volume randomized automated tests). Test data may be produced by the tester, or by a program or function that aids the tester, and it may be recorded for reuse or used only once. Test data can be created manually, generated with data generation tools (often based on randomness [2]), or retrieved from an existing production environment. The data set can consist of synthetic (fake) data, but preferably it consists of representative (real) data. [3]
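
For illustration, the following sketch (Python, standard library only) contrasts the two uses: a confirmatory test checks known inputs against expected results, while a high-volume randomized test challenges the program with arbitrary input. The `parse_age` function is a hypothetical example, not drawn from the cited sources.

```python
import random
import string

def parse_age(text):
    """Hypothetical function under test: parse a non-negative age."""
    value = int(text)  # may raise ValueError on malformed input
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

# Confirmatory test data: known inputs with expected results.
def test_parse_age_confirmatory():
    assert parse_age("42") == 42
    assert parse_age("0") == 0

# Randomized test data: challenge the function with arbitrary strings
# and check that it either returns a valid age or fails cleanly.
def test_parse_age_randomized():
    for _ in range(10_000):
        text = "".join(random.choices(string.printable, k=random.randint(0, 8)))
        try:
            value = parse_age(text)
            assert 0 <= value <= 150  # any accepted input must be in range
        except ValueError:
            pass  # clean rejection of bad input is acceptable
```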

Limitations

Privacy rules and regulations such as the GDPR, PCI and HIPAA prohibit the use of privacy-sensitive personal data for testing. [4] Anonymized (and preferably subsetted) production data may instead be used as representative data for test and development. [5] Programmers can also choose to generate mock data, but this approach has its own limitations: it is not always possible to produce enough fake or mock data for testing. [6]
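
The following is a minimal sketch of anonymizing and subsetting, assuming records with hypothetical `name` and `email` fields. It only illustrates the idea; real anonymization must satisfy the applicable regulation, which this toy example does not claim to do.

```python
import hashlib
import random

def pseudonymize(value, salt="test-env"):
    # Deterministic pseudonym: equal inputs map to equal outputs,
    # so referential integrity across tables is preserved.
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return "user_" + digest[:10]

def anonymize_subset(records, fraction=0.1, seed=42):
    # Keep only a reproducible subset of production data...
    rng = random.Random(seed)
    subset = rng.sample(records, max(1, int(len(records) * fraction)))
    # ...and replace the sensitive fields with pseudonyms.
    return [
        {**row,
         "name": pseudonymize(row["name"]),
         "email": pseudonymize(row["email"]) + "@example.test"}
        for row in subset
    ]

production = [{"name": "Alice", "email": "alice@corp.example", "orders": 3},
              {"name": "Bob",   "email": "bob@corp.example",   "orders": 1}]
print(anonymize_subset(production, fraction=1.0))
```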

AI-generated "synthetic data" can be another option to generate test data. AI-powered synthetic data generators learn the patterns and qualities of a sample database. Once the training of the AI algorithm has taken place, it can produce as much or as little test data as defined. AI-generated synthetic data needs additional privacy measures to prevent the algorithm from overfitting. Some commercially available synthetic data generators come with additional privacy and accuracy controls. The amount of data to be tested is determined or limited by considerations such as time, cost and quality. Time to produce, cost to produce and quality of the test data, and efficiency.

Domain testing

Domain testing is a family of test techniques that focus on the test data. This might include identifying common or critical inputs, representatives of a particular equivalence class model, values that might appear at the boundaries between one equivalence class and another, outrageous values that should be rejected by the program, combinations of inputs, or inputs that might drive the product towards a particular set of outputs. [7]
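
As a concrete example, for an input defined as an integer between 1 and 100 (assumed limits, for illustration), a domain-testing approach might select one representative per equivalence class plus the boundary and extreme values:

```python
# Domain-testing data for an input defined as an integer in [1, 100].
# One representative per equivalence class, plus the class boundaries.
LOWER, UPPER = 1, 100

test_inputs = {
    "below_lower_boundary": LOWER - 1,   # 0   -> should be rejected
    "lower_boundary":       LOWER,       # 1   -> smallest valid value
    "typical_valid":        50,          # representative of the valid class
    "upper_boundary":       UPPER,       # 100 -> largest valid value
    "above_upper_boundary": UPPER + 1,   # 101 -> should be rejected
    "outrageous":           10**9,       # extreme value, should be rejected
}

def is_valid(value):
    return LOWER <= value <= UPPER

for label, value in test_inputs.items():
    print(f"{label:>22}: {value:>10} valid={is_valid(value)}")
```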


Related Research Articles


In engineering and its various subdisciplines, acceptance testing is a test conducted to determine if the requirements of a specification or contract are met. It may involve chemical tests, physical tests, or performance tests.

Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. Many test techniques are used to this end.

Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. This method of testing can be applied to virtually every level of software testing: unit, integration, system and acceptance. It is sometimes referred to as specification-based testing.

In software testing, test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can automate some repetitive but necessary tasks in a formalized testing process already in place, or perform additional testing that would be difficult to do manually. Test automation is critical for continuous delivery and continuous testing.

Data security means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach.

In software engineering, a test case is a specification of the inputs, execution conditions, testing procedure, and expected results that define a single test to be executed to achieve a particular software testing objective, such as to exercise a particular program path or to verify compliance with a specific requirement. Test cases underlie testing that is methodical rather than haphazard. A battery of test cases can be built to produce the desired coverage of the software being tested. Formally defined test cases allow the same tests to be run repeatedly against successive versions of the software, allowing for effective and consistent regression testing.
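
A common way to keep test cases formal and repeatable is a table-driven layout, sketched below for Python's built-in abs function: each row specifies a description, an input, and the expected result, and a single loop executes the whole battery against any version of the software.

```python
# Table-driven test cases: each row is (description, input, expected result).
CASES = [
    ("positive input",  5,  5),
    ("negative input", -5,  5),
    ("zero",            0,  0),
]

def run_cases(function, cases):
    # Execute every case and collect the failures instead of stopping early.
    failures = []
    for description, argument, expected in cases:
        actual = function(argument)
        if actual != expected:
            failures.append((description, expected, actual))
    return failures

assert run_cases(abs, CASES) == []  # the same table reruns on every version
```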


Equivalence partitioning or equivalence class partitioning (ECP) is a software testing technique that divides the input data of a software unit into partitions of equivalent data from which test cases can be derived. In principle, test cases are designed to cover each partition at least once. This technique tries to define test cases that uncover classes of errors, thereby reducing the total number of test cases that must be developed. An advantage of this approach is a reduction in the time required to test the software, owing to the smaller number of test cases.
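
As a small illustration, consider a hypothetical username field that must be 3 to 12 characters long: the input space splits into three partitions, and one representative per partition suffices.

```python
# Equivalence partitions for a username of 3..12 characters:
#   too short (len < 3), valid (3 <= len <= 12), too long (len > 12).
partitions = {
    "too_short": "ab",        # representative of len < 3
    "valid":     "alice_92",  # representative of 3 <= len <= 12
    "too_long":  "a" * 20,    # representative of len > 12
}

def accepts_username(name):
    return 3 <= len(name) <= 12

# One test case per partition covers each class of errors once.
expected = {"too_short": False, "valid": True, "too_long": False}
for partition, sample in partitions.items():
    assert accepts_username(sample) == expected[partition]
```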


In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.
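
A toy mutation fuzzer along these lines can be sketched in a few lines of Python: it perturbs a valid seed input one byte at a time, so the result often remains "valid enough", and reports any exception from the target other than a clean rejection. The JSON target is illustrative.

```python
import json
import random

def mutate(data, rng):
    # Flip, insert, or delete one byte of a valid seed input,
    # producing semi-valid data that may still parse partway.
    raw = bytearray(data)
    position = rng.randrange(len(raw))
    choice = rng.random()
    if choice < 0.34:
        raw[position] = rng.randrange(256)        # flip a byte
    elif choice < 0.67:
        raw.insert(position, rng.randrange(256))  # insert a byte
    else:
        del raw[position]                         # delete a byte
    return bytes(raw)

def fuzz(target, seed_input, iterations=10_000, seed=0):
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            pass                    # expected rejection of invalid input
        except Exception as error:  # anything else is a finding
            crashes.append((candidate, error))
    return crashes

# Target: a JSON parser wrapper; json.JSONDecodeError subclasses ValueError,
# so only unexpected exception types are reported.
findings = fuzz(lambda b: json.loads(b.decode("utf-8", "replace")),
                b'{"name": "alice", "age": 42}')
print(f"{len(findings)} unexpected exceptions")
```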

Functional testing is a quality assurance (QA) process and a type of black-box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output, and internal program structure is rarely considered. Functional testing is conducted to evaluate the compliance of a system or component with specified functional requirements. Functional testing usually describes what the system does.

Performance engineering encompasses the techniques applied during a systems development life cycle to ensure the non-functional requirements for performance will be met. It may be alternatively referred to as systems performance engineering within systems engineering, and software performance engineering or application performance engineering within software engineering.

Keyword-driven testing, also known as action word based testing, is a software testing methodology suitable for both manual and automated testing. This method separates the documentation of test cases – including both the data and functionality to use – from the prescription of the way the test cases are executed. As a result, it separates the test creation process into two distinct stages: a design and development stage, and an execution stage. The design stage covers requirement analysis and assessment, as well as data analysis, definition, and population.
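
A minimal sketch of that separation: the test case is written as pure data (action words with arguments, as a tester might record them in a table), and a separate executor maps each keyword onto an implementation. All names are illustrative.

```python
# Stage 1 (design): the test case as pure data -- action words and arguments.
TEST_CASE = [
    ("enter_username", "alice"),
    ("enter_password", "secret"),
    ("click_login",),
    ("verify_page", "dashboard"),
]

# Stage 2 (execution): each keyword maps onto an implementation.
class FakeApp:
    def __init__(self):
        self.fields, self.page = {}, "login"
    def enter_username(self, value): self.fields["username"] = value
    def enter_password(self, value): self.fields["password"] = value
    def click_login(self):
        self.page = "dashboard" if self.fields.get("password") else "login"
    def verify_page(self, expected):
        assert self.page == expected, f"on {self.page}, expected {expected}"

def execute(test_case, app):
    for keyword, *arguments in test_case:
        getattr(app, keyword)(*arguments)  # dispatch the action word

execute(TEST_CASE, FakeApp())
```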

In software engineering, graphical user interface testing is the process of testing a product's graphical user interface (GUI) to ensure it meets its specifications. This is normally done through the use of a variety of test cases.


In software development, the V-model represents a development process that may be considered an extension of the waterfall model, and is an example of the more general V-model. Instead of moving down in a linear way, the process steps are bent upwards after the coding phase, to form the typical V shape. The V-Model demonstrates the relationships between each phase of the development life cycle and its associated phase of testing. The horizontal and vertical axes represent time or project completeness (left-to-right) and level of abstraction, respectively.

A test strategy is an outline that describes the testing approach of the software development cycle. The purpose of a test strategy is to provide a rational deduction from organizational, high-level objectives to actual test activities to meet those objectives from a quality assurance perspective. The creation and documentation of a test strategy should be done in a systematic way to ensure that all objectives are fully covered and understood by all stakeholders. It should also frequently be reviewed, challenged and updated as the organization and the product evolve over time. Furthermore, a test strategy should also aim to align different stakeholders of quality assurance in terms of terminology, test and integration levels, roles and responsibilities, traceability, planning of resources, etc.

API testing is a type of software testing that involves testing application programming interfaces (APIs) directly and as part of integration testing to determine if they meet expectations for functionality, reliability, performance, and security. Since APIs lack a GUI, API testing is performed at the message layer. API testing is now considered critical for automating testing because APIs now serve as the primary interface to application logic and because GUI tests are difficult to maintain with the short release cycles and frequent changes commonly used with Agile software development and DevOps.
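
A sketch of such a message-layer test, written with the Python requests library against a hypothetical https://api.example.test service (the endpoint, fields, and status codes are assumptions, not a real API): the test speaks HTTP directly and asserts on status, message schema, and latency rather than on any GUI.

```python
import requests

BASE_URL = "https://api.example.test"  # hypothetical service

def test_get_user():
    # Functionality: the endpoint returns the requested resource.
    response = requests.get(f"{BASE_URL}/users/42", timeout=5)
    assert response.status_code == 200

    # Contract: the message body has the expected fields and types.
    body = response.json()
    assert isinstance(body["id"], int)
    assert isinstance(body["name"], str)

    # Performance: a coarse latency budget for this single call.
    assert response.elapsed.total_seconds() < 1.0

def test_get_missing_user():
    # Reliability: a missing resource yields a clean 404, not a crash.
    response = requests.get(f"{BASE_URL}/users/0", timeout=5)
    assert response.status_code == 404
```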

Synthetic data is information that is artificially generated rather than produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models.

Database testing usually consists of a layered process, including the user interface (UI) layer, the business layer, the data access layer and the database itself. The UI layer deals with the interface design of the database, while the business layer includes databases supporting business strategies.

Shift-left testing refers to testing software early in the development process, while shift-right testing refers to testing towards the end of the development cycle. The term "shift-left" describes the idea of moving software testing earlier in the process, which can help catch defects earlier and reduce the cost of fixing them later. In contrast, "shift-right" refers to testing later in the process, which can help ensure that the software meets the intended requirements and functions correctly before it is released. Shift-left is also the first half of the maxim "test early and often", coined by Larry Smith in 2001.

This article discusses a set of tactics useful in software testing. It is intended as a comprehensive list of tactical approaches to software quality assurance (more widely known simply as quality assurance) and the general application of the test method.


Synthography is the method of generating digital media synthetically using machine learning. This is distinct from other graphic creation and editing methods in that synthography uses artificial intelligence art text-to-image models to generate synthetic media. It is commonly achieved by prompt engineering text descriptions as input to create or edit a desired image.

References

  1. Weyuker, E. J. (1988-06-01). "The evaluation of program-based software test data adequacy criteria". Communications of the ACM. 31 (6): 668–675. doi:10.1145/62959.62963. ISSN 0001-0782. S2CID 15141475.
  2. "On testing in DDD". Medium. 2022-04-24. Retrieved 2023-01-24.
  3. "What is test data and how is it created?". DATPROF. 2019-06-26. Retrieved 2020-04-29.
  4. "Get GDPR, PCI and HIPAA compliant". DATPROF. 2020-03-03. Retrieved 2020-07-09.
  5. "Using production data for testing". DATPROF. 2019-10-17. Retrieved 2020-07-09.
  6. Emam, Khaled El; Mosquera, Lucy; Hoptroff, Richard (2020-05-19). Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data. O'Reilly Media. ISBN 978-1-4920-7271-3.
  7. Fries, Richard C. (2019-08-15). Handbook of Medical Device Design. CRC Press. ISBN 978-1-000-69695-0.