Empirical algorithmics

In computer science, empirical algorithmics (or experimental algorithmics) is the practice of using empirical methods to study the behavior of algorithms. The practice combines algorithm development and experimentation: algorithms are not just designed, but also implemented and tested in a variety of situations. In this process, an initial design of an algorithm is analyzed so that the algorithm may be developed in a stepwise manner. [1]

Overview

Methods from empirical algorithmics complement theoretical methods for the analysis of algorithms. [2] Through the principled application of empirical methods, particularly from statistics, it is often possible to obtain insights into the behavior of algorithms such as high-performance heuristic algorithms for hard combinatorial problems that are (currently) inaccessible to theoretical analysis. [3] Empirical methods can also be used to achieve substantial improvements in algorithmic efficiency. [4]
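
As a simple illustration of this kind of empirical analysis, an algorithm's growth rate can be estimated with a doubling experiment: time the algorithm on inputs of doubling size and fit the slope of log(time) against log(size). The Python sketch below is a minimal example under stated assumptions; the quadratic_pairs function and the chosen input sizes are illustrative inventions, not drawn from the cited works.

import math
import random
import timeit

def quadratic_pairs(data):
    # Deliberately O(n^2): count ordered pairs (i, j) with data[i] < data[j].
    n = len(data)
    return sum(1 for i in range(n) for j in range(n) if data[i] < data[j])

sizes = [250, 500, 1000, 2000]
times = []
for n in sizes:
    data = [random.random() for _ in range(n)]
    # Take the best of three runs to reduce scheduler and cache noise.
    times.append(min(timeit.repeat(lambda: quadratic_pairs(data), number=1, repeat=3)))

# Least-squares slope of log(time) against log(n) estimates the exponent k
# in time ~ c * n^k; for this function it should come out close to 2.
logs = [(math.log(n), math.log(t)) for n, t in zip(sizes, times)]
mean_x = sum(x for x, _ in logs) / len(logs)
mean_y = sum(y for _, y in logs) / len(logs)
num = sum((x - mean_x) * (y - mean_y) for x, y in logs)
den = sum((x - mean_x) ** 2 for x, _ in logs)
print(f"estimated growth exponent: {num / den:.2f}")

For a genuinely quadratic routine the estimated exponent settles near 2; deviations on real hardware often point to cache effects or interpreter overheads, which is precisely the kind of behavior empirical methods are designed to surface.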

American computer scientist Catherine McGeoch identifies two main branches of empirical algorithmics: the first (known as empirical analysis) deals with the analysis and characterization of the behavior of algorithms, and the second (known as algorithm design or algorithm engineering) is focused on empirical methods for improving the performance of algorithms. [5] The former often relies on techniques and tools from statistics, while the latter draws on approaches from statistics, machine learning and optimization. Dynamic analysis tools, typically performance profilers, are commonly used when applying empirical methods to select and refine algorithms of various types for use in various contexts. [6] [7] [8]

Research in empirical algorithmics is published in several journals, including the ACM Journal on Experimental Algorithmics (JEA) and the Journal of Artificial Intelligence Research (JAIR). Besides Catherine McGeoch, well-known researchers in empirical algorithmics include Bernard Moret, Giuseppe F. Italiano, Holger H. Hoos, David S. Johnson, and Roberto Battiti. [9]

Performance profiling in the design of complex algorithms

In the absence of empirical algorithmics, analyzing the complexity of an algorithm relies on theoretical methods chosen to suit the situations in which the algorithm may be used. [10] Memory and cache considerations are often significant factors in the theoretical choice of a complex algorithm, or in the approach to its optimization, for a given purpose. [11] [12] Performance profiling is a dynamic program analysis technique typically used to find and analyze bottlenecks within an application's code [13] [14] [15] or to survey an entire application for poorly performing code. [16] A profiler can reveal the code most relevant to an application's performance issues. [17]
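
As a hedged sketch of how a profiler exposes such a bottleneck, the following example uses Python's built-in cProfile and pstats modules; the application functions (parse_records, slow_dedup, main) are hypothetical stand-ins for real application code.

import cProfile
import io
import pstats

def parse_records(lines):
    # Hypothetical parsing stage.
    return [line.split(",") for line in lines]

def slow_dedup(records):
    # Quadratic membership tests on a list: a typical accidental bottleneck.
    seen = []
    for record in records:
        if record not in seen:
            seen.append(record)
    return seen

def main():
    lines = [f"user{i % 500},event{i % 20}" for i in range(5000)]
    slow_dedup(parse_records(lines))

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Rank functions by cumulative time; slow_dedup should dominate the report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())

In the ranked report, slow_dedup dominates cumulative time, pointing the developer at the quadratic membership test rather than the parsing code.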

A profiler may help to determine when to choose one algorithm over another in a particular situation. [18] When an individual algorithm is profiled, as with complexity analysis, memory and cache considerations often matter more than instruction counts or clock cycles; accordingly, the profiler's findings are best interpreted in light of how the algorithm accesses data rather than how many instructions it executes. [19]
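
The following sketch illustrates the point about data access: the two traversals below execute essentially the same instructions, but one visits memory in a cache-friendly order. The matrix size and repeat counts are arbitrary assumptions, and in CPython the effect is smaller than in languages with contiguous arrays, though usually still measurable.

import timeit

N = 1000
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def row_major():
    # Finishes one inner list before moving on: consecutive accesses stay local.
    total = 0
    for i in range(N):
        for j in range(N):
            total += matrix[i][j]
    return total

def col_major():
    # Hops to a different inner list on every access: poor locality.
    total = 0
    for j in range(N):
        for i in range(N):
            total += matrix[i][j]
    return total

print("row-major:", min(timeit.repeat(row_major, number=1, repeat=3)))
print("col-major:", min(timeit.repeat(col_major, number=1, repeat=3)))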

Profiling may provide intuitive insight into an algorithm's behavior [20] by presenting performance findings visually. [21] Performance profiling has been applied, for example, during the development of algorithms for matching wildcards. Early algorithms for matching wildcards, such as Rich Salz's wildmat algorithm, [22] typically relied on recursion, a technique criticized on grounds of performance. [23] The Krauss matching wildcards algorithm was developed from an attempt to formulate a non-recursive alternative using test cases, [24] followed by optimizations suggested via performance profiling, [25] resulting in a new algorithmic strategy conceived in light of the profiling along with other considerations. [26] Profilers that collect data at the level of basic blocks [27] or that rely on hardware assistance [28] provide results that can be accurate enough to assist software developers in optimizing algorithms for a particular computer or situation. [29] Performance profiling can aid developer understanding of the characteristics of complex algorithms applied in complex situations, such as coevolutionary algorithms applied to arbitrary test-based problems, and may lead to design improvements. [30]
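
For context on what a non-recursive matcher can look like, here is a minimal iterative wildcard matcher using the common two-pointer backtracking scheme for '*' and '?'. It is offered only as an illustrative sketch in the same spirit as the algorithms discussed above; it is not a reproduction of Salz's wildmat or Krauss's published routine.

def wildcard_match(text: str, pattern: str) -> bool:
    # '?' matches any one character; '*' matches any run of characters.
    t = p = 0                 # cursors into text and pattern
    star = -1                 # index of the most recent '*' in pattern
    backtrack = 0             # position in text to resume from after '*'
    while t < len(text):
        if p < len(pattern) and pattern[p] in ("?", text[t]):
            t += 1
            p += 1
        elif p < len(pattern) and pattern[p] == "*":
            star = p          # remember the star; first try matching zero chars
            p += 1
            backtrack = t
        elif star != -1:
            p = star + 1      # let the star absorb one more character
            backtrack += 1
            t = backtrack
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1                # trailing stars match the empty string
    return p == len(pattern)

assert wildcard_match("empirical", "emp*c?l")
assert not wildcard_match("empirical", "emp*x")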


References

  1. Fleischer, Rudolf; et al., eds. (2002). Experimental Algorithmics, From Algorithm Design to Robust and Efficient Software. Springer International Publishing AG.
  2. Moret, Bernard M. E. (1999). Towards A Discipline Of Experimental Algorithmics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Vol. 59. pp. 197–213. doi:10.1090/dimacs/059/10. ISBN 9780821828922. S2CID 2221596.
  3. Hromkovic, Juraj (2004). Algorithmics for Hard Problems. Springer International Publishing AG.
  4. Guzman, John Paul; Limoanco, Teresita (2017). "An Empirical Approach to Algorithm Analysis Resulting in Approximations to Big Theta Time Complexity" (PDF). Journal of Software. 12 (12).
  5. McGeoch, Catherine (2012). A Guide to Experimental Algorithmics. Cambridge University Press. ISBN 978-1-107-00173-2.
  6. Coppa, Emilio; Demetrescu, Camil; Finocchi, Irene (2014). "Input-Sensitive Profiling". IEEE Transactions on Software Engineering. 40 (12): 1185–1205. doi:10.1109/TSE.2014.2339825.
  7. Moret, Bernard M. E.; Bader, David A.; Warnow, Tandy (2002). "High-Performance Algorithm Engineering for Computational Phylogenetics" (PDF). The Journal of Supercomputing. 22 (1): 99–111. doi:10.1023/a:1014362705613. S2CID 614550.
  8. Zaparanuks, Dmitrijs; Hauswirth, Matthias (2012). Algorithmic Profiling. 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Digital Library. pp. 67–76. CiteSeerX 10.1.1.459.4913.
  9. "On experimental algorithmics: an interview with Catherine McGeoch and Bernard Moret". Ubiquity. ACM Digital Library. 2011 (August). 2011.
  10. Grzegorz, Mirek (2018). "Big-O Ambiguity". performant code_.
  11. Kölker, Jonas (2009). "When does Big-O notation fail?". Stack Overflow.
  12. Lemire, Daniel (2013). "Big-O notation and real-world performance". WordPress.
  13. "Finding Application Bottlenecks". dotTrace 2018.1 Help. JetBrains. 2018.
  14. Shmeltzer, Shay (2005). "Locating Bottlenecks in Your Code with the Event Profiler". Oracle Technology Network JDeveloper documentation. Oracle Corp.
  15. Shen, Du; Poshyvanyk, Denys; Luo, Qi; Grechanik, Mark (2015). "Automating performance bottleneck detection using search-based application profiling" (PDF). ISSTA 2015 Proceedings of the 2015 International Symposium on Software Testing and Analysis. ACM Digital Library: 270–281. doi:10.1145/2771783.2771816. ISBN 9781450336208. S2CID 8625903.
  16. "Performance & Memory Profiling and Code Coverage". The Profile Learning Center. SmartBear Software. 2018.
  17. Janssen, Thorben (2017). "11 Simple Java Performance Tuning Tips". Stackify Developer Tips, Tricks and Resources.
  18. O'Sullivan, Bryan; Stewart, Don; Goerzen, John (2008). "25. Profiling and optimization". Real World Haskell. O'Reilly Media.
  19. Linden, Doug (2007). "Profiling and Optimization". Second Life Wiki.
  20. Pattis, Richard E. (2007). "Analysis of Algorithms, Advanced Programming/Practicum, 15-200". School of Computer Science, Carnegie Mellon University.
  21. Wickham, Hadley (2014). "Optimising code". Advanced R. Chapman and Hall/CRC.
  22. Salz, Rich (1991). "wildmat.c". GitHub.
  23. Cantatore, Alessandro (2003). "Wildcard matching algorithms".
  24. Krauss, Kirk (2008). "Matching Wildcards: An Algorithm". Dr. Dobb's Journal.
  25. Krauss, Kirk (2014). "Matching Wildcards: An Empirical Way to Tame an Algorithm". Dr. Dobb's Journal.
  26. Krauss, Kirk (2018). "Matching Wildcards: An Improved Algorithm for Big Data". Develop for Performance.
  27. Grehan, Rick (2005). "Code Profilers: Choosing a Tool for Analyzing Performance" (PDF). Freescale Semiconductor. If, on the other hand, you need to step through your code with microscopic accuracy, fine-tuning individual machine instructions, then an active profiler with cycle-counting cannot be beat.
  28. Hough, Richard; et al. (2006). "Cycle-Accurate Microarchitecture Performance Evaluation". Proceedings of Workshop on Introspective Architecture. Georgia Institute of Technology. CiteSeerX 10.1.1.395.9306.
  29. Khamparia, Aditya; Banu, Saira (2013). Program Analysis with Dynamic Instrumentation Pin and Performance Tools. IEEE International conference on Emerging trends in Computing, Communication and Nanotechnology. IEEE Xplore Digital Library.
  30. Jaskowski, Wojciech; Liskowski, Pawel; Szubert, Marcin Grzegorz; Krawiec, Krzysztof (2016). "The performance profile: A multi-criteria performance evaluation method for test-based problems" (PDF). Applied Mathematics and Computer Science. De Gruyter. 26: 216.