E-graph

Last updated

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Contents

Definition and operations

Let be a set of uninterpreted functions, where is the subset of consisting of functions of arity . Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of to e-class IDs is denoted and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures: [1]

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants. [2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes are congruent when . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

E-graphs expose wrappers around the , , and operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

Equivalent formulations

An e-graph can also be formulated as a bipartite graph where

There is a directed edge from each e-class to each of its members, and from each e-node to each of its children. [3]

E-matching

Let be a set of variables and let be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, is the smallest set such that , , and when and , then . A term containing variables is called a pattern, a term without variables is called ground.

An e-graph represents a ground term if one of its e-classes represents . An e-class represents if some e-node does. An e-node represents a term if and each e-class represents the term ( in ).

e-matching is an operation that takes a pattern and an e-graph , and yields all pairs where is a substitution mapping the variables in to e-class IDs and is an e-class ID such that the term is represented by . There are several known algorithms for e-matching, [4] [5] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal. [6]

Extraction

Given an e-class and a cost function that maps each function symbol in to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard. [7] There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm. [8]

Complexity

Equality saturation

Equality saturation is a technique for building optimizing compilers using e-graphs. [10] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3 [11] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. [12] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. [13] E-graphs are also used in the Simplify theorem prover of ESC/Java. [14]

Equality saturation is used in specialized optimizing compilers, [15] e.g. for deep learning [16] and linear algebra. [17] Equality saturation has also been used for translation validation applied to the LLVM toolchain. [18]

E-graphs have been applied to several problems in program analysis, including fuzzing, [19] abstract interpretation, [20] and library learning. [21]

References

  1. ( Willsey et al. 2021 )
  2. ( Willsey et al. 2021 )
  3. ( Goharshady, Lam & Parreaux 2024 )
  4. ( de Moura & Bjørner 2007 )
  5. Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit". Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007). 198 (2): 19–35. doi: 10.1016/j.entcs.2008.04.078 . ISSN   1571-0661.
  6. Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages. 6 (POPL): 35:1–35:22. doi: 10.1145/3498696 . S2CID   236924583.
  7. Stepp, Michael Benjamin (2011). Equality saturation: engineering challenges and applications (PhD thesis). USA: University of California at San Diego. ISBN   978-1-267-03827-2.
  8. ( Goharshady, Lam & Parreaux 2024 )
  9. ( Flatt et al. 2022 , p. 2)
  10. ( Tate et al. 2009 )
  11. de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". In Ramakrishnan, C. R.; Rehof, Jakob (eds.). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi: 10.1007/978-3-540-78800-3_24 . ISBN   978-3-540-78800-3.
  12. Rümmer, Philipp (2012). "E-Matching with Free Variables". In Bjørner, Nikolaj; Voronkov, Andrei (eds.). Logic for Programming, Artificial Intelligence, and Reasoning. Proceedings. 18th International Conference, LPAR-18, Merida, Venezuela, March 11–15, 2012. Lecture Notes in Computer Science. Vol. 7180. Berlin, Heidelberg: Springer. pp. 359–374. doi:10.1007/978-3-642-28717-6_28. ISBN   978-3-642-28717-6.
  13. ( Flatt et al. 2022 , p. 2)
  14. Detlefs, David; Nelson, Greg; Saxe, James B. (May 2005). "Simplify: a theorem prover for program checking". Journal of the ACM. 52 (3): 365–473. doi:10.1145/1066100.1066102. ISSN   0004-5411. S2CID   9613854.
  15. Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566. ISSN   0362-1340.
  16. Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv: 2101.01332 [cs.AI].
  17. Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv: 2002.07951 [cs.DB].
  18. Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi: 10.1007/978-3-642-22110-1_59 . ISBN   978-3-642-22110-1.
  19. "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03.
  20. Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). "Abstract Interpretation on E-Graphs". arXiv: 2203.09191 [cs.LO].
    Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). "Combining E-Graphs with Abstract Interpretation". arXiv: 2205.14989 [cs.DS].
  21. Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages. 7 (POPL): 396–424. arXiv: 2212.04596 . doi:10.1145/3571207. ISSN   2475-1421. S2CID   254536022.