Search-based software engineering

Last updated

Search-based software engineering (SBSE) applies metaheuristic search techniques such as genetic algorithms, simulated annealing and tabu search to software engineering problems. Many activities in software engineering can be stated as optimization problems. Optimization techniques of operations research such as linear programming or dynamic programming are often impractical for large scale software engineering problems because of their computational complexity or their assumptions on the problem structure. Researchers and practitioners use metaheuristic search techniques, which impose little assumptions on the problem structure, to find near-optimal or "good-enough" solutions.

Contents

SBSE problems can be divided into two types:

Definition

SBSE converts a software engineering problem into a computational search problem that can be tackled with a metaheuristic. This involves defining a search space, or the set of possible solutions. This space is typically too large to be explored exhaustively, suggesting a metaheuristic approach. A metric [2] (also called a fitness function, cost function, objective function or quality measure) is then used to measure the quality of potential solutions. Many software engineering problems can be reformulated as a computational search problem. [3]

The term "search-based application", in contrast, refers to using search-engine technology, rather than search techniques, in another industrial application.

Brief history

One of the earliest attempts to apply optimization to a software engineering problem was reported by Webb Miller and David Spooner in 1976 in the area of software testing. [4] In 1992, S. Xanthakis and his colleagues applied a search technique to a software engineering problem for the first time. [5] The term SBSE was first used in 2001 by Harman and Jones. [6] The research community grew to include more than 800 authors by 2013, spanning approximately 270 institutions in 40 countries. [7]

Application areas

Search-based software engineering is applicable to almost all phases of the software development process. Software testing has been one of the major applications. [8] Search techniques have been applied to other software engineering activities, for instance, requirements analysis, [9] [10] design, [11] [12] refactoring, [13] development, [14] and maintenance. [15]

Requirements engineering

Requirements engineering is the process by which the needs of a software's users and environment are determined and managed. Search-based methods have been used for requirements selection and optimisation with the goal of finding the best possible subset of requirements that matches user requests amid constraints such as limited resources and interdependencies between requirements. This problem is often tackled as a multiple-criteria decision-making problem and, generally involves presenting the decision maker with a set of good compromises between cost and user satisfaction as well as the requirements risk. [16] [17] [18] [19]

Debugging and maintenance

Identifying a software bug (or a code smell) and then debugging (or refactoring) the software is largely a manual and labor-intensive endeavor, though the process is tool-supported. One objective of SBSE is to automatically identify and fix bugs (for example via mutation testing).

Genetic programming, a biologically-inspired technique that involves evolving programs through the use of crossover and mutation, has been used to search for repairs to programs by altering a few lines of source code. The GenProg Evolutionary Program Repair software repaired 55 out of 105 bugs for approximately $8 each in one test. [20]

Coevolution adopts a "predator and prey" metaphor in which a suite of programs and a suite of unit tests evolve together and influence each other. [21]

Testing

Search-based software engineering has been applied to software testing, including automatic generation of test cases (test data), test case minimization and test case prioritization. [22] Regression testing has also received some attention.

Optimizing software

The use of SBSE in program optimization, or modifying a piece of software to make it more efficient in terms of speed and resource use, has been the object of successful research. [23] In one instance, a 50,000 line program was genetically improved, resulting in a program 70 times faster on average. [24] A recent work by Basios et al. shows that by optimising the data structure, Google Guava found 9% improvement on execution time, 13% improvement on memory consumption and 4% improvement on CPU usage separately. [25]

Project management

A number of decisions that are normally made by a project manager can be done automatically, for example, project scheduling. [26]

Tools

Tools available for SBSE include OpenPAT, [27] EvoSuite, [28] and Coverage, a code coverage measurement tool for Python. [29]

Methods and techniques

A number of methods and techniques are available, including:

Industry acceptance

As a relatively new area of research, SBSE does not yet experience broad industry acceptance.

Successful applications of SBSE in the industry can mostly be found within software testing, where the capability to automatically generate random test inputs for uncovering bugs at a big scale is attractive to companies. In 2017, Facebook acquired the software startup Majicke Limited that developed Sapienz, a search-based bug finding app. [31]

In other application scenarios, software engineers may be reluctant to adopt tools over which they have little control or that generate solutions that are unlike those that humans produce. [32] In the context of SBSE use in fixing or improving programs, developers need to be confident that any automatically produced modification does not generate unexpected behavior outside the scope of a system's requirements and testing environment. Considering that fully automated programming has yet to be achieved, a desirable property of such modifications would be that they need to be easily understood by humans to support maintenance activities. [33]

Another concern is that SBSE might make the software engineer redundant. Supporters claim that the motivation for SBSE is to enhance the relationship between the engineer and the program. [34]

See also

Related Research Articles

<span class="mw-page-title-main">Genetic algorithm</span> Competitive algorithm for searching a problem space

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection. Some examples of GA applications include optimizing decision trees for better performance, solving sudoku puzzles, hyperparameter optimization, causal inference, etc.

<span class="mw-page-title-main">Evolutionary algorithm</span> Subset of evolutionary computation

In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

<span class="mw-page-title-main">Particle swarm optimization</span> Iterative simulation method

In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formula over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.

In genetic algorithms (GA), or more general, evolutionary algorithms (EA), a chromosome is a set of parameters which define a proposed solution of the problem that the evolutionary algorithm is trying to solve. The set of all solutions, also called individuals according to the biological model, is known as the population. The genome of an individual consists of one, more rarely of several, chromosomes and corresponds to the genetic representation of the task to be solved. A chromosome is composed of a set of genes, where a gene consists of one or more semantically connected parameters, which are often also called decision variables. They determine one or more phenotypic characteristics of the individual or at least have an influence on them. In the basic form of genetic algorithms, the chromosome is represented as a binary string, while in later variants and in EAs in general, a wide variety of other data structures are used.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

A memetic algorithm (MA) in computer science and operations research, is an extension of the traditional genetic algorithm (GA) or more general evolutionary algorithm (EA). It may provide a sufficiently good solution to an optimization problem. It uses a suitable heuristic or local search technique to improve the quality of solutions generated by the EA and to reduce the likelihood of premature convergence.

A hyper-heuristic is a heuristic search method that seeks to automate, often by the incorporation of machine learning techniques, the process of selecting, combining, generating or adapting several simpler heuristics to efficiently solve computational search problems. One of the motivations for studying hyper-heuristics is to build systems which can handle classes of problems rather than solving just one problem.

<span class="mw-page-title-main">Meta-optimization</span>

In numerical optimization, meta-optimization is the use of one optimization method to tune another optimization method. Meta-optimization is reported to have been used as early as in the late 1970s by Mercer and Sampson for finding optimal parameter settings of a genetic algorithm.

<span class="mw-page-title-main">Enrique Alba</span> Spanish computer science professor (born 1968)

Enrique Alba is a professor of computer science at the University of Málaga, Spain.

<span class="mw-page-title-main">HeuristicLab</span> Software environment

HeuristicLab is a software environment for heuristic and evolutionary algorithms, developed by members of the Heuristic and Evolutionary Algorithm Laboratory (HEAL) at the University of Applied Sciences Upper Austria, in Hagenberg im Mühlkreis. HeuristicLab has a strong focus on providing a graphical user interface so that users are not required to have comprehensive programming skills to adjust and extend the algorithms for a particular problem. In HeuristicLab algorithms are represented as operator graphs and changing or rearranging operators can be done by drag-and-drop without actually writing code. The software thereby tries to shift algorithm development capability from the software engineer to the user and practitioner. Developers can still extend the functionality on code level and can use HeuristicLab's plug-in mechanism that allows them to integrate custom algorithms, solution representations or optimization problems.

The MOEA Framework is an open-source evolutionary computation library for Java that specializes in multi-objective optimization. It supports a variety of multiobjective evolutionary algorithms (MOEAs), including genetic algorithms, genetic programming, grammatical evolution, differential evolution, and particle swarm optimization. As a result, it has been used to conduct numerous comparative studies to assess the efficiency, reliability, and controllability of state-of-the-art MOEAs.

<span class="mw-page-title-main">Symbolic regression</span> Type of regression analysis

Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.

EvoSuite is a tool that automatically generates unit tests for Java software. EvoSuite uses an evolutionary algorithm to generate JUnit tests. EvoSuite can be run from the command line, and it also has plugins to integrate it in Maven, IntelliJ and Eclipse. EvoSuite has been used on more than a hundred open-source software and several industrial systems, finding thousands of potential bugs.

In computer software development, genetic Improvement is the use of optimisation and machine learning techniques, particularly search-based software engineering techniques such as genetic programming to improve existing software. The improved program need not behave identically to the original. For example, automatic bug fixing improves program code by reducing or eliminating buggy behaviour. In other cases the improved software should behave identically to the old version but is better because, for example: it runs faster, it uses less memory, it uses less energy or it runs on a different type of computer. GI differs from, for example, formal program translation, in that it primarily verifies the behaviour of the new mutant version by running both the new and the old software on test inputs and comparing their output and performance in order to see if the new software can still do what is wanted of the original program and is now better.

Automatic bug-fixing is the automatic repair of software bugs without the intervention of a human programmer. It is also commonly referred to as automatic patch generation, automatic bug repair, or automatic program repair. The typical goal of such techniques is to automatically generate correct patches to eliminate bugs in software programs without causing software regression.

The Genetic and Evolutionary Computation Conference (GECCO) is the premier conference in the area of genetic and evolutionary computation. GECCO has been held every year since 1999, when it was first established as a recombination of the International Conference on Genetic Algorithms (ICGA) and the Annual Genetic Programming Conference (GP).

This is a chronological table of metaheuristic algorithms that only contains fundamental algorithms. Hybrid algorithms and multi-objective algorithms are not listed in the table below.

References

  1. Harman, Mark (2010). "Why Source Code Analysis and Manipulation Will Always be Important". 10th IEEE Working Conference on Source Code Analysis and Manipulation (SCAM 2010). 10th IEEE Working Conference on Source Code Analysis and Manipulation (SCAM 2010). pp. 7–19. doi:10.1109/SCAM.2010.28.
  2. Harman, Mark; John A. Clark (2004). "Metrics are fitness functions too". Proceedings of the 10th International Symposium on Software Metrics, 2004. 10th International Symposium on Software Metrics, 2004. pp. 58–69. doi:10.1109/METRIC.2004.1357891.
  3. Clark, John A.; Dolado, José Javier; Harman, Mark; Hierons, Robert M.; Jones, Bryan F.; Lumkin, M.; Mitchell, Brian S.; Mancoridis, Spiros; Rees, K.; Roper, Marc; Shepperd, Martin J. (2003). "Reformulating software engineering as a search problem". IEE Proceedings - Software . 150 (3): 161–175. CiteSeerX   10.1.1.144.3059 . doi:10.1049/ip-sen:20030559. ISSN   1462-5970.
  4. Miller, Webb; Spooner, David L. (1976). "Automatic Generation of Floating-Point Test Data". IEEE Transactions on Software Engineering. SE-2 (3): 223–226. doi:10.1109/TSE.1976.233818. ISSN   0098-5589. S2CID   18875300.
  5. S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas and K. Karapoulios, "Application of genetic algorithms to software testing," in Proceedings of the 5th International Conference on Software Engineering and its Applications, Toulouse, France, 1992, pp. 625–636
  6. Harman, Mark; Jones, Bryan F. (15 December 2001). "Search-based software engineering". Information and Software Technology. 43 (14): 833–839. CiteSeerX   10.1.1.143.9716 . doi:10.1016/S0950-5849(01)00189-6. ISSN   0950-5849.
  7. Harman, Mark; Mansouri, S. Afshin; Zhang, Yuanyuan (1 November 2012). "Search-based software engineering: Trends, techniques and applications". ACM Computing Surveys. 45 (1): 1–61. doi:10.1145/2379776.2379787. S2CID   207198163.
  8. McMinn, Phil (2004). "Search-based software test data generation: a survey". Software Testing, Verification and Reliability. 14 (2): 105–156. CiteSeerX   10.1.1.122.33 . doi:10.1002/stvr.294. ISSN   1099-1689. S2CID   17408871.
  9. Greer, Des; Ruhe, Guenther (15 March 2004). "Software release planning: an evolutionary and iterative approach". Information and Software Technology. 46 (4): 243–253. CiteSeerX   10.1.1.195.321 . doi:10.1016/j.infsof.2003.07.002. ISSN   0950-5849. S2CID   710923.
  10. Colares, Felipe; Souza, Jerffeson; Carmo, Raphael; Pádua, Clarindo; Mateus, Geraldo R. (2009). "A New Approach to the Software Release Planning". XXIII Brazilian Symposium on Software Engineering, 2009. SBES '09. XXIII Brazilian Symposium on Software Engineering, 2009. SBES '09. pp. 207–215. doi:10.1109/SBES.2009.23.
  11. Clark, John A.; Jacob, Jeremy L. (15 December 2001). "Protocols are programs too: the meta-heuristic search for security protocols". Information and Software Technology. 43 (14): 891–904. CiteSeerX   10.1.1.102.6016 . doi:10.1016/S0950-5849(01)00195-1. ISSN   0950-5849.
  12. Räihä, Outi (1 November 2010). "A survey on search-based software design" (PDF). Computer Science Review. 4 (4): 203–249. CiteSeerX   10.1.1.188.9036 . doi:10.1016/j.cosrev.2010.06.001. ISSN   1574-0137.
  13. Mariani, Thainá; Vergilio, Silvia Regina (1 March 2017). "A systematic review on search-based refactoring". Information and Software Technology. 83: 14–34. doi:10.1016/j.infsof.2016.11.009. ISSN   0950-5849.
  14. Alba, Enrique; Chicano, J. Francisco (1 June 2007). "Software project management with GAs". Information Sciences. 177 (11): 2380–2401. doi:10.1016/j.ins.2006.12.020. hdl: 10630/8145 . ISSN   0020-0255.
  15. Antoniol, Giuliano; Di Penta, Massimiliano; Harman, Mark (2005). "Search-based techniques applied to optimization of project planning for a massive maintenance project". Proceedings of the 21st IEEE International Conference on Software Maintenance, 2005. ICSM'05. Proceedings of the 21st IEEE International Conference on Software Maintenance, 2005. ICSM'05. pp. 240–249. CiteSeerX   10.1.1.63.8069 . doi:10.1109/ICSM.2005.79.
  16. Zhang, Yuanyuan (February 2010). Multi-Objective Search-based Requirements Selection and Optimisation (PhD). Strand, London, UK: University of London.
  17. Y. Zhang and M. Harman and S. L. Lim, "Search Based Optimization of Requirements Interaction Management," Department of Computer Science, University College London, Research Note RN/11/12, 2011.
  18. Li, Lingbo; Harman, Mark; Letier, Emmanuel; Zhang, Yuanyuan (2014). "Robust next release problem". Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. Gecco '14. pp. 1247–1254. doi:10.1145/2576768.2598334. ISBN   9781450326629. S2CID   8423690.
  19. Li, L.; Harman, M.; Wu, F.; Zhang, Y. (2017). "The Value of Exact Analysis in Requirements Selection" (PDF). IEEE Transactions on Software Engineering. 43 (6): 580–596. doi:10.1109/TSE.2016.2615100. ISSN   0098-5589. S2CID   8398275.
  20. Le Goues, Claire; Dewey-Vogt, Michael; Forrest, Stephanie; Weimer, Westley (2012). "A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each". 2012 34th International Conference on Software Engineering (ICSE). 2012 34th International Conference on Software Engineering (ICSE). pp. 3–13. doi:10.1109/ICSE.2012.6227211.
  21. Arcuri, Andrea; Yao, Xin (2008). "A novel co-evolutionary approach to automatic software bug fixing". IEEE Congress on Evolutionary Computation, 2008. CEC 2008. (IEEE World Congress on Computational Intelligence). IEEE Congress on Evolutionary Computation, 2008. CEC 2008. (IEEE World Congress on Computational Intelligence). pp. 162–168. doi:10.1109/CEC.2008.4630793.
  22. Harman, Mark; Jia, Yue; Zhang, Yuanyuan (April 2015). "Achievements, Open Problems and Challenges for Search Based Software Testing". 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST). Graz, Austria: IEEE. pp. 1–12. CiteSeerX   10.1.1.686.7418 . doi:10.1109/ICST.2015.7102580. ISBN   978-1-4799-7125-1. S2CID   15272060.
  23. Memeti, Suejb; Pllana, Sabri; Binotto, Alecio; Kolodziej, Joanna; Brandic, Ivona (2018). "Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review". Computing. 101 (8): 893–936. arXiv: 1801.09444 . Bibcode:2018arXiv180109444M. doi:10.1007/s00607-018-0614-9. S2CID   13868111.
  24. Langdon, William B.; Harman, Mark. "Optimising Existing Software with Genetic Programming" (PDF). IEEE Transactions on Evolutionary Computation.
  25. Basios, Michail; Li, Lingbo; Wu, Fan; Kanthan, Leslie; Barr, Earl T. (9 September 2017). "Optimising Darwinian Data Structures on Google Guava". Search Based Software Engineering (PDF). Lecture Notes in Computer Science. Vol. 10452. pp. 161–167. doi:10.1007/978-3-319-66299-2_14. ISBN   978-3-319-66298-5.
  26. Minku, Leandro L.; Sudholt, Dirk; Yao, Xin (2012). "Evolutionary algorithms for the project scheduling problem: runtime analysis and improved design". Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference. GECCO '12. New York, NY, USA: ACM. pp. 1221–1228. doi:10.1145/2330163.2330332. ISBN   978-1-4503-1177-9.
  27. Mayo, M.; Spacey, S. (2013). "Predicting Regression Test Failures Using Genetic Algorithm-Selected Dynamic Performance Analysis Metrics" (PDF). Search Based Software Engineering. Lecture Notes in Computer Science. Vol. 8084. pp. 158–171. doi:10.1007/978-3-642-39742-4_13. hdl: 10289/7763 . ISBN   978-3-642-39741-7.
  28. "Home". evosuite.org.
  29. others, Ned Batchelder and 100, coverage: Code coverage measurement for Python , retrieved 14 March 2018{{citation}}: CS1 maint: numeric names: authors list (link)
  30. "Open Source Profilers in Java".
  31. "Sapienz: Facebook's push to automate software testing". VentureBeat. 30 December 2018. Retrieved 29 September 2020.
  32. Jones, Derek (18 October 2013). "Programming using genetic algorithms: isn't that what humans already do ;-)". The Shape of Code. Retrieved 31 October 2013.
  33. Le Goues, Claire; Forrest, Stephanie; Weimer, Westley (1 September 2013). "Current challenges in automatic software repair". Software Quality Journal. 21 (3): 421–443. CiteSeerX   10.1.1.371.5784 . doi:10.1007/s11219-013-9208-0. ISSN   1573-1367. S2CID   16435531.
  34. Simons, Christopher L. (May 2013). Whither (away) software engineers in SBSE?. First International Workshop on Combining Modelling with Search-Based Software Engineering, First International Workshop on Combining Modelling with Search-Based Software Engineering. San Francisco, USA: IEEE Press. pp. 49–50. Retrieved 31 October 2013.