Incremental computing

Incremental computing, also known as incremental computation, is a software feature which, whenever a piece of data changes, attempts to save time by only recomputing those outputs which depend on the changed data. [1] [2] [3] When incremental computing is successful, it can be significantly faster than computing new outputs naively. For example, a spreadsheet software package might use incremental computation in its recalculation feature, to update only those cells containing formulas which depend (directly or indirectly) on the changed cells.

When incremental computing is implemented by a tool that can apply it automatically to a variety of different pieces of code, that tool is an example of a program analysis tool for optimization.

Incremental computing derives a new input/output pair from one or more old input/output relationships. To do so, ΔP must relate a change in the input to a change in the output. The consumer of the result may be interested in the full updated output, or the delta output, or both.

Static versus dynamic

Incremental computing techniques can be broadly separated into two types of approaches:

Static approaches attempt to derive an incremental program from a conventional program P using, e.g., either manual design and refactoring, or automatic program transformations. These program transformations occur before any inputs or input changes are provided.

Dynamic approaches record information about executing program P on a particular input (I1) and use this information when the input changes (to I2) in order to update the output (from O1 to O2). The figure shows the relationship between program P, the change calculation function ΔP, which constitutes the core of the incremental program, and a pair of inputs and outputs, I1, O1 and I2, O2.
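
As a minimal illustration (a sketch invented for this article, not taken from the cited sources), the following Python fragment plays out the roles in the figure: P recomputes from scratch, while a hand-written ΔP derives O2 from I1, O1, and the change to the input. All names are illustrative.

    def P(xs):                       # the conventional program: recompute from scratch
        return sum(xs)

    def delta_P(old_input, old_output, index, new_value):
        """Derive the new output from the old input/output pair and the change."""
        return old_output + (new_value - old_input[index])

    I1 = [1, 2, 3, 4]
    O1 = P(I1)                                  # O1 = 10, computed from scratch
    O2 = delta_P(I1, O1, index=2, new_value=7)  # O2 = 14, without rerunning P
    assert O2 == P([1, 2, 7, 4])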

Specialized versus general-purpose approaches

Some approaches to incremental computing are specialized, while others are general purpose. Specialized approaches require the programmer to explicitly specify the algorithms and data structures that will be used to preserve unchanged sub-calculations. General-purpose approaches, on the other hand, use language, compiler, or algorithmic techniques to give incremental behavior to otherwise non-incremental programs. [4]

Static methods

Program derivatives

Given a computation f(x) and a potential change x → x′, we can insert code before the change (the pre-derivative) and after the change (the post-derivative) to update the value of f(x) faster than rerunning f. Paige has written down a list of rules for formal differentiation of programs in SUBSETL. [5]
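
For instance (an illustrative sketch, not Paige's notation), when the maintained computation is a running sum, the pre-derivative removes the old contribution of the element about to change and the post-derivative adds the new one:

    xs = [5, 1, 8]
    f_value = sum(xs)        # maintained value of f(xs); currently 14

    def update(i, v):
        """Change xs[i] to v while keeping f_value consistent with sum(xs)."""
        global f_value
        f_value -= xs[i]     # pre-derivative: runs before the change
        xs[i] = v            # the change itself
        f_value += xs[i]     # post-derivative: runs after the change

    update(1, 10)
    assert f_value == sum(xs) == 23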

View maintenance

In database systems such as DBToaster, views are defined with relational algebra. Incremental view maintenance statically analyzes relational algebra to create update rules that quickly maintain the view in the presence of small updates, such as insertion of a row. [6]
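
The following toy sketch conveys the idea without reproducing DBToaster's actual interface: a SUM view grouped by department is kept fresh by a delta rule that touches only the affected group on each insertion, rather than rescanning the base table.

    view = {}                # dept -> SUM(salary), the materialized view

    def on_insert(row):
        """Delta rule applied when one row is inserted into the base table."""
        dept, salary = row["dept"], row["salary"]
        view[dept] = view.get(dept, 0) + salary   # O(1) per insert, no rescan

    on_insert({"dept": "eng", "salary": 100})
    on_insert({"dept": "eng", "salary": 50})
    assert view["eng"] == 150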

Dynamic methods

Incremental computation can be achieved by building a dependency graph of all the data elements that may need to be recalculated, and their dependencies. The elements that need to be updated when a single element changes are given by the transitive closure of the dependency relation of the graph. In other words, if there is a path from the changed element to another element, the latter may be updated (depending on whether the change eventually reaches the element). The dependency graph may need to be updated as dependencies change, or as elements are added to, or removed from, the system. It is used internally by the implementation, and does not typically need to be displayed to the user.
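
A minimal sketch of this approach, assuming a spreadsheet-like system whose cells, formulas, and dependency edges are invented for illustration: changing one cell recomputes only the cells reachable from it in the dependency graph.

    from collections import defaultdict

    values = {"a": 1, "b": 2, "c": 0, "d": 0}
    formulas = {"c": lambda: values["a"] + values["b"],   # c depends on a and b
                "d": lambda: values["c"] * 10}            # d depends on c
    dependents = defaultdict(set, {"a": {"c"}, "b": {"c"}, "c": {"d"}})

    def set_value(cell, v):
        """Set a cell, then recompute only its transitive dependents."""
        values[cell] = v
        worklist = [cell]
        while worklist:                  # walks the transitive closure of `cell`
            for dep in dependents[worklist.pop()]:
                values[dep] = formulas[dep]()
                worklist.append(dep)

    set_value("a", 5)                    # recomputes only c (now 7) and d (now 70)
    assert values == {"a": 5, "b": 2, "c": 7, "d": 70}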

Capturing dependencies across all possible values can be avoided by identifying a subset of important values (e.g., aggregation results) across which dependencies can be tracked, and incrementally recomputing the remaining dependent variables. This balances the amount of dependency information that must be tracked against the amount of recomputation to be performed when the input changes. [7]

Partial evaluation can be seen as a method for automating the simplest possible case of incremental computing, in which an attempt is made to divide program data into two categories: that which can vary based on the program's input, and that which cannot (and the smallest unit of change is simply "all the data that can vary"). Partial evaluation can be combined with other incremental computing techniques.
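
As a small illustration, a closure-based stand-in for a real partial evaluator (a genuine specializer would instead generate residual code): the exponent n is treated as data that cannot vary, so the function is specialized ahead of time, and only the varying input x remains a runtime argument.

    def specialize_power(n):
        """Partially evaluate x**n with respect to the fixed exponent n."""
        def power_n(x):              # residual program: only x can still vary
            result = 1
            for _ in range(n):       # loop bound fixed at specialization time
                result *= x
            return result
        return power_n

    cube = specialize_power(3)       # specialized once, before any input arrives
    assert cube(2) == 8 and cube(5) == 125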

With cycles in the dependency graph, a single pass through the graph may not be sufficient to reach a fixed point. In some cases, complete reevaluation of a system is semantically equivalent to incremental evaluation, and may be more efficient in practice if not in theory. [8]
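
One common dynamic strategy, sketched below under the assumption that the cyclic system converges, is to re-evaluate all equations repeatedly until no value changes:

    values = {"x": 0.0, "y": 0.0}
    equations = {"x": lambda: 0.5 * values["y"] + 1.0,   # x depends on y ...
                 "y": lambda: 0.5 * values["x"]}         # ... and y depends on x

    changed = True
    while changed:                   # a single pass is not enough; iterate
        changed = False
        for name, eq in equations.items():
            new = eq()
            if abs(new - values[name]) > 1e-9:
                values[name] = new
                changed = True

    # fixed point of the cyclic system: x = 4/3, y = 2/3
    assert abs(values["x"] - 4/3) < 1e-6 and abs(values["y"] - 2/3) < 1e-6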

References

  1. Carlsson, Magnus (2002). "Monads for incremental computing". Proceedings of the seventh ACM SIGPLAN international conference on Functional programming. New York: ACM. pp. 26–35. doi:10.1145/581478.581482. ISBN 1-58113-487-8.
  2. Umut A. Acar (2005). Self-Adjusting Computation (PDF) (Ph.D. thesis).
  3. Camil Demetrescu; Irene Finocchi; Andrea Ribichini (2011). "Reactive Imperative Programming with Dataflow Constraints". Proceedings of the 26th ACM International Conference on Object-Oriented Programming Systems Languages and Applications (OOPSLA 2011). ACM. pp. 407–426. arXiv:1104.2293. doi:10.1145/2048066.2048100. ISBN 978-1-4503-0940-0.
  4. Yan Chen; Joshua Dunfield; Matthew A. Hammer; Umut A. Acar. "Implicit self-adjusting computation for purely functional programs". ICFP '11. pp. 129–141. Archived from the original on 2016-10-30. Retrieved 2018-03-12.
  5. Paige, Robert (1981). Formal Differentiation: A Program Synthesis Technique. UMI Research Press. ISBN 978-0-8357-1213-2.
  6. Ahmad, Yanif; Kennedy, Oliver; Koch, Christoph; Nikolic, Milos (2012-06-01). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views". Proc. VLDB Endow. 5 (10): 968–979. arXiv:1207.0137. doi:10.14778/2336664.2336670. ISSN 2150-8097.
  7. Mugilan Mariappan; Keval Vora (2019). "GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs". European Conference on Computer Systems (EuroSys '19). pp. 25:1–25:16. doi:10.1145/3302424.3303974.
  8. Kimberley Burchett; Gregory H. Cooper; Shriram Krishnamurthi (2007). "Lowering: A static optimization technique for transparent functional reactivity". ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation. pp. 71–80. CiteSeerX 10.1.1.90.5866. ISBN 978-1-59593-620-2.
  9. Hammer, Matthew A.; Acar, Umut A.; Chen, Yan (2009). "CEAL". Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09. p. 25. doi:10.1145/1542476.1542480. ISBN 9781605583921. S2CID 11058228.
  10. Reps, Thomas; Teitelbaum, Tim (1984). "The synthesizer generator". Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments - SDE 1. pp. 42–48. doi:10.1145/800020.808247. ISBN 978-0897911313.
  11. "Adapton: Programming Language Abstractions for Incremental Computation". adapton.org. Retrieved 2016-10-07.
  12. Saha, Diptikalyan; Ramakrishnan, C. R. (2005). "Incremental Evaluation of Tabled Prolog: Beyond Pure Logic Programs". Practical Aspects of Declarative Languages. Lecture Notes in Computer Science. Vol. 3819. pp. 215–229. CiteSeerX 10.1.1.111.7484. doi:10.1007/11603023_15. ISBN 978-3-540-30947-5. ISSN 0302-9743.
  13. Hammer, Matthew; Phang, Khoo; Hicks, Michael; Foster, Jeffrey (2014). ADAPTON: Composable, Demand-Driven Incremental Computation (PDF). PLDI.