Software regression

A software regression is a type of software bug in which a feature that worked before stops working. Regressions may be introduced when changes are applied to the software's source code, including the addition of new features and bug fixes. [1] They may also be introduced by changes to the environment in which the software runs, such as system upgrades, system patching, or a change to daylight saving time. [2] A software performance regression is a situation where the software still functions correctly but performs more slowly or uses more memory or other resources than before. [3] Various types of software regressions have been identified in practice, including local regressions, in which a change introduces a new bug in the module or component that was changed; remote regressions, in which a change in one part of the software breaks functionality in another part; and unmasked regressions, in which a change reveals a pre-existing bug that previously had no effect. [4]

Regressions are often caused by bug fixes included in software patches. One approach to avoiding this kind of problem is regression testing: a properly designed test plan aims to catch regressions before any software is released. [5] Automated testing and well-written test cases can reduce the likelihood of a regression.

Prevention and detection

Techniques have been proposed to prevent regressions from being introduced into software at various stages of development, as outlined below.

Prior to release

To avoid regressions being seen by the end user after release, developers regularly run regression tests after changes are introduced to the software. These tests can include unit tests to catch local regressions as well as integration tests to catch remote regressions. [6] Regression testing techniques often leverage existing test cases to minimize the effort involved in creating them. [7] However, due to the volume of these existing tests, it is often necessary to select a representative subset, using techniques such as test-case prioritization.
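As a minimal sketch of such a regression test, the following Python unit test pins down behavior that is known to work, so that a later change which breaks it fails the suite before release. The parse_price function and its expected values are hypothetical, not taken from any cited work:

```python
# A regression test pins previously working behavior: if a later change
# breaks it, the suite fails before the software is released.
import unittest


def parse_price(text: str) -> float:
    """Parse a price string such as '$1,299.99' into a float (hypothetical)."""
    return float(text.replace("$", "").replace(",", ""))


class TestParsePriceRegression(unittest.TestCase):
    def test_thousands_separator(self):
        # Worked in the previous release; a careless "simplification" that
        # drops the comma handling would fail this test.
        self.assertEqual(parse_price("$1,299.99"), 1299.99)

    def test_plain_number(self):
        self.assertEqual(parse_price("42"), 42.0)


if __name__ == "__main__":
    unittest.main()
```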

For detecting performance regressions, software performance tests are run on a regular basis to monitor the response time and resource usage of the software after successive changes. [8] Unlike functional regression tests, the results of performance tests are subject to variance: the same test can produce different numbers from run to run because performance measurements fluctuate. A decision must therefore be made on whether a change in performance numbers constitutes a regression, based on experience and end-user demands. Approaches such as statistical significance testing and change point detection are sometimes used to aid in this decision. [9]
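As an illustrative sketch of the statistical approach, the following Python snippet applies Welch's t-test (via SciPy, assumed to be available) to repeated timing samples from two versions. The samples, the 5% significance level, and the 10% practical-slowdown threshold are all hypothetical choices, not prescribed values:

```python
# Decide whether a timing difference is a regression: require both
# statistical significance and a practically meaningful slowdown.
from scipy import stats

baseline_ms = [102.1, 99.8, 101.5, 100.9, 103.2, 100.4]   # previous version
candidate_ms = [112.9, 114.2, 111.8, 113.5, 115.0, 112.3]  # new version

# Welch's t-test (unequal variances) on the two samples.
t_stat, p_value = stats.ttest_ind(candidate_ms, baseline_ms, equal_var=False)

mean_base = sum(baseline_ms) / len(baseline_ms)
mean_cand = sum(candidate_ms) / len(candidate_ms)
slowdown = (mean_cand - mean_base) / mean_base

# Flag a regression only if the slowdown is statistically significant
# *and* large enough to matter to end users (thresholds are illustrative).
if p_value < 0.05 and slowdown > 0.10:
    print(f"Likely performance regression: {slowdown:.1%} slower (p={p_value:.4f})")
else:
    print(f"No actionable regression (change={slowdown:+.1%}, p={p_value:.4f})")
```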

Prior to commit

Since debugging and localizing the root cause of a software regression can be expensive, [10] [11] there also exist methods that try to prevent regressions from being committed to the code repository in the first place. For example, Git hooks enable developers to run test scripts before code changes are committed or pushed to the repository. [12] In addition, change impact analysis has been applied to predict the impact of a code change on various components of the program and to supplement test-case selection and prioritization. [13] [14] Software linters are also often added to commit hooks to enforce a consistent coding style, thereby minimizing stylistic issues that can make the software prone to regressions. [15]
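As a sketch of the Git-hook approach: a pre-commit hook is any executable file placed at .git/hooks/pre-commit, and a nonzero exit status makes Git abort the commit. The test command below (pytest) is an assumption; a project would substitute its own test runner:

```python
#!/usr/bin/env python3
# Sketch of a pre-commit hook: run the test suite and block the commit
# if any test fails. Save as .git/hooks/pre-commit and mark executable.
import subprocess
import sys

result = subprocess.run(["pytest", "--quiet"])  # assumed test runner
if result.returncode != 0:
    print("pre-commit: tests failed; commit aborted.", file=sys.stderr)
    sys.exit(1)  # nonzero exit status makes Git abort the commit
sys.exit(0)
```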

Localization

Many of the techniques used to find the root cause of non-regression software bugs can also be used to debug software regressions, including breakpoint debugging, print debugging, and program slicing. The techniques described below are often used specifically to debug software regressions.

Functional regressions

A common technique used to localize functional regressions is bisection, which takes both a buggy commit and a previously working commit as input, and tries to find the root cause by doing a binary search on the commits in between. [16] Version control systems such as Git and Mercurial provide built-in ways to perform bisection on a given pair of commits. [17] [18]
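The following sketch shows the binary search underlying bisection, assuming a linear, chronologically ordered list of commit identifiers and a hypothetical is_good predicate (for example, one that checks out a commit and runs the failing regression test); real tools such as git bisect also handle non-linear history:

```python
# Binary search over a linear commit history to find the first bad commit.
from typing import Callable, List


def find_first_bad(commits: List[str], is_good: Callable[[str], bool]) -> str:
    """Precondition: commits[0] is known good and commits[-1] is known bad."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # regression was introduced after mid
        else:
            hi = mid  # regression was introduced at or before mid
    return commits[hi]  # the first bad commit
```

In practice, `git bisect run` automates the same loop by invoking a user-supplied test script on each candidate commit.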

Other options include directly associating the result of a regression test with code changes; [19] setting divergence breakpoints; [20] and using incremental data-flow analysis, which identifies test cases (including failing ones) that are relevant to a set of code changes. [21]

Performance regressions

Profiling measures the performance and resource usage of various components of a program, and is used to generate data useful in debugging performance issues. In the context of software performance regressions, developers often compare the call trees (also known as "timelines") generated by profilers for both the buggy version and the previously working version, and mechanisms exist to simplify this comparison. [22] Web development tools typically provide developers the ability to record these performance profiles. [23] [24]
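As a simplified sketch of such a comparison, the following Python snippet profiles the same workload in two versions of the code with cProfile and reports the functions whose cumulative time grew the most. It compares flat per-function times rather than full call trees, and the workload callables are hypothetical stand-ins:

```python
# Compare per-function cumulative profiling time between two versions of a
# workload; the functions with the largest positive delta are the most
# likely culprits for a performance regression.
import cProfile
import pstats


def profile_cumulative(workload):
    """Profile a zero-argument callable; return {function: cumulative seconds}."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    # pstats keys are (file, line, name) tuples; index 3 of each entry is
    # the cumulative time spent in that function.
    return {func: entry[3] for func, entry in pstats.Stats(profiler).stats.items()}


def report_slowdowns(old_workload, new_workload, top=5):
    old = profile_cumulative(old_workload)
    new = profile_cumulative(new_workload)
    deltas = {f: new.get(f, 0.0) - old.get(f, 0.0) for f in set(old) | set(new)}
    for (file, line, name), delta in sorted(deltas.items(), key=lambda kv: -kv[1])[:top]:
        print(f"{delta:+.4f}s  {name} ({file}:{line})")

# Usage (hypothetical): report_slowdowns(old_version_workload, new_version_workload)
```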

Logging also helps with performance regression localization: much as with call trees, developers can compare systematically placed performance logs from multiple versions of the same software. [25] A tradeoff exists when adding these logs: many logs help developers pinpoint which portions of the software are regressing at a finer granularity, while fewer logs reduce the overhead of executing the program. [26]
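A minimal sketch of such instrumentation, using Python's standard logging module and a timing context manager (the section names and sleep calls are illustrative stand-ins for real work):

```python
# Systematically placed performance logs: each instrumented section logs
# its elapsed time, so logs from two versions can be compared line by line.
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("perf")


@contextmanager
def timed(section: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info("perf %s %.2f ms", section, elapsed_ms)


with timed("load_config"):
    time.sleep(0.01)  # stand-in for real work
with timed("render_page"):
    time.sleep(0.02)
```

Running the same instrumented build before and after a change yields log lines that can be compared section by section.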

Additional approaches include writing performance-aware unit tests to help with localization [27] and ranking subsystems based on performance-counter deviations. [28] Bisection can also be repurposed for performance regressions by treating commits that perform below (or above) a certain baseline value as buggy, and recursing into the earlier or later half of the commit range based on the result of each measurement.
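A brief sketch of this repurposing, reusing the find_first_bad helper from the bisection sketch above; measure_runtime is a hypothetical stand-in for checking out a commit and timing a benchmark, stubbed here with fixed values:

```python
# Performance bisection: a commit counts as "good" if its measured runtime
# stays within a noise tolerance of the baseline.
BASELINE_SECONDS = 1.0
TOLERANCE = 1.10  # treat up to 10% above baseline as measurement noise


def measure_runtime(commit: str) -> float:
    """Stub: in practice, check out `commit` and time the benchmark."""
    return {"c1": 0.98, "c2": 1.02, "c3": 1.45, "c4": 1.50}.get(commit, 1.5)


def is_good(commit: str) -> bool:
    return measure_runtime(commit) <= BASELINE_SECONDS * TOLERANCE

# With the earlier sketch, find_first_bad(["c1", "c2", "c3", "c4"], is_good)
# would report "c3" as the first commit that regressed.
```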

References

  1. Wong, W. Eric; Horgan, J.R.; London, Saul; Agrawal, Hira (1997). "A Study of Effective Regression Testing in Practice". Proceedings of the Eighth International Symposium on Software Reliability Engineering (ISSRE 97). IEEE. doi:10.1109/ISSRE.1997.630875. ISBN 0-8186-8120-9. S2CID 2911517.
  2. Yehudai, Amiram; Tyszberowicz, Shmuel; Nir, Dor (2007). Locating Regression Bugs. Haifa Verification Conference. doi:10.1007/978-3-540-77966-7_18. Retrieved 10 March 2018.
  3. Shang, Weiyi; Hassan, Ahmed E.; Nasser, Mohamed; Flora, Parminder (11 December 2014). "Automated Detection of Performance Regressions Using Regression Models on Clustered Performance Counters" (PDF).
  4. Henry, Jean-Jacques Pierre (2008). The Testing Network: An Integral Approach to Test Activities in Large Software Projects. Springer Science & Business Media. p. 74. ISBN 978-3540785040.
  5. Richardson, Jared; Gwaltney, William Jr (2006). Ship It! A Practical Guide to Successful Software Projects. Raleigh, NC: The Pragmatic Bookshelf. pp. 32, 193. ISBN 978-0-9745140-4-8.
  6. Leung, Hareton K.N.; White, Lee (November 1990). "A study of integration testing and software regression at the integration level". Proceedings of the International Conference on Software Maintenance. San Diego, CA, USA: IEEE. doi:10.1109/ICSM.1990.131377. ISBN 0-8186-2091-9. S2CID 62583582.
  7. Rothermel, Gregg; Harrold, Mary Jean; Dedhia, Jeinay (2000). "Regression test selection for C++ software". Software Testing, Verification and Reliability. 10 (2): 77–109. doi:10.1002/1099-1689(200006)10:2<77::AID-STVR197>3.0.CO;2-E. ISSN 1099-1689.
  8. Weyuker, E.J.; Vokolos, F.I. (December 2000). "Experience with performance testing of software systems: issues, an approach, and case study". IEEE Transactions on Software Engineering. 26 (12): 1147–1156. doi:10.1109/32.888628. ISSN 1939-3520.
  9. Daly, David; Brown, William; Ingo, Henrik; O'Leary, Jim; Bradford, David (20 April 2020). "The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System". Proceedings of the International Conference on Performance Engineering. Association for Computing Machinery. pp. 67–75. doi:10.1145/3358960.3375791. ISBN 978-1-4503-6991-6. S2CID 211677818.
  10. Nistor, Adrian; Jiang, Tian; Tan, Lin (May 2013). "Discovering, reporting, and fixing performance bugs". Proceedings of the Working Conference on Mining Software Repositories (MSR). pp. 237–246. doi:10.1109/MSR.2013.6624035. ISBN 978-1-4673-2936-1. S2CID 12773088.
  11. Agarwal, Pragya; Agrawal, Arun Prakash (17 September 2014). "Fault-localization techniques for software systems: a literature review". ACM SIGSOFT Software Engineering Notes. 39 (5): 1–8. doi:10.1145/2659118.2659125. ISSN 0163-5948. S2CID 12101263.
  12. "Git - Git Hooks". git-scm.com. Retrieved 7 November 2021.
  13. Orso, Alessandro; Apiwattanapong, Taweesup; Harrold, Mary Jean (1 September 2003). "Leveraging field data for impact analysis and regression testing". ACM SIGSOFT Software Engineering Notes. 28 (5): 128–137. doi:10.1145/949952.940089. ISSN 0163-5948.
  14. Qu, Xiao; Acharya, Mithun; Robinson, Brian (September 2012). "Configuration selection using code change impact analysis for regression testing". Proceedings of the International Conference on Software Maintenance. pp. 129–138. doi:10.1109/ICSM.2012.6405263. ISBN 978-1-4673-2312-3. S2CID 14928793.
  15. Tómasdóttir, Kristín Fjóla; Aniche, Mauricio; van Deursen, Arie (October 2017). "Why and how JavaScript developers use linters". Proceedings of the International Conference on Automated Software Engineering. pp. 578–589. doi:10.1109/ASE.2017.8115668. ISBN 978-1-5386-2684-9. S2CID 215750004.
  16. Gross, Thomas (10 September 1997). "Bisection Debugging". Proceedings of the International Workshop on Automatic Debugging. Linköping University Electronic Press. pp. 185–191.
  17. "Git - git-bisect Documentation". git-scm.com. Retrieved 7 November 2021.
  18. "hg - bisect". www.selenic.com. Mercurial. Retrieved 7 November 2021.
  19. "Reading 11: Debugging". web.mit.edu. MIT.
  20. Buhse, Ben; Wei, Thomas; Zang, Zhiqiang; Milicevic, Aleksandar; Gligoric, Milos (May 2019). "VeDebug: Regression Debugging Tool for Java". Proceedings of the International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). pp. 15–18. doi:10.1109/ICSE-Companion.2019.00027. ISBN 978-1-7281-1764-5. S2CID 174799830.
  21. Taha, A.-B.; Thebaut, S.M.; Liu, S.-S. (September 1989). "An approach to software fault localization and revalidation based on incremental data flow analysis". Proceedings of the Annual International Computer Software & Applications Conference. IEEE. pp. 527–534. doi:10.1109/CMPSAC.1989.65142. ISBN 0-8186-1964-3. S2CID 41978046.
  22. Ocariza, Frolin S.; Zhao, Boyang (2021). "Localizing software performance regressions in web applications by comparing execution timelines". Software Testing, Verification and Reliability. 31 (5): e1750. doi:10.1002/stvr.1750. ISSN 1099-1689. S2CID 225416138.
  23. "Analyze runtime performance". Chrome Developers. Google. Retrieved 7 November 2021.
  24. "Performance analysis reference - Microsoft Edge Development". docs.microsoft.com. Microsoft. Retrieved 7 November 2021.
  25. Yao, Kundi; B. de Pádua, Guilherme; Shang, Weiyi; Sporea, Steve; Toma, Andrei; Sajedi, Sarah (30 March 2018). "Log4Perf: Suggesting Logging Locations for Web-based Systems' Performance Monitoring". Proceedings of the International Conference on Performance Engineering. Association for Computing Machinery. pp. 127–138. doi:10.1145/3184407.3184416. ISBN 978-1-4503-5095-2. S2CID 4557038.
  26. Li, Heng; Shang, Weiyi; Adams, Bram; Sayagh, Mohammed; Hassan, Ahmed E. (30 January 2020). "A Qualitative Study of the Benefits and Costs of Logging from Developers' Perspectives". IEEE Transactions on Software Engineering. 47 (12): 2858–2873. doi:10.1109/TSE.2020.2970422. S2CID 213679706.
  27. Heger, Christoph; Happe, Jens; Farahbod, Roozbeh (21 April 2013). "Automated root cause isolation of performance regressions during software development". Proceedings of the International Conference on Performance Engineering. Association for Computing Machinery. pp. 27–38. doi:10.1145/2479871.2479879. ISBN 978-1-4503-1636-1. S2CID 2593603.
  28. Malik, Haroon; Adams, Bram; Hassan, Ahmed E. (November 2010). "Pinpointing the Subsystems Responsible for the Performance Deviations in a Load Test". Proceedings of the International Symposium on Software Reliability Engineering. pp. 201–210. doi:10.1109/ISSRE.2010.43. ISBN 978-1-4244-9056-1. S2CID 17306870.