Partition refinement

Last updated July 30, 2024

In the design of algorithms, partition refinement is a technique for representing a partition of a set as a data structure that allows the partition to be refined by splitting its sets into a larger number of smaller sets. In that sense it is dual to the union-find data structure, which also maintains a partition into disjoint sets but in which the operations merge pairs of sets. In some applications of partition refinement, such as lexicographic breadth-first search, the data structure maintains as well an ordering on the sets in the partition.

Data structure

A partition refinement algorithm maintains a family of disjoint sets $S i$ . At the start of the algorithm, this family contains a single set of all the elements in the data structure. At each step of the algorithm, a set $X$ is presented to the algorithm, and each set $S i$ in the family that contains members of $X$ is split into two sets, the intersection $S i \cap X$ and the difference $Si \ X$ .

Such an algorithm may be implemented efficiently by maintaining data structures representing the following information:^[4]^[5]

The ordered sequence of the sets $S i$ in the family, in a form such as a doubly linked list that allows new sets to be inserted into the middle of the sequence
Associated with each set $S i$ , a collection of its elements of $S i$ , in a form such as a doubly linked list or array data structure that allows for rapid deletion of individual elements from the collection. Alternatively, this component of the data structure may be represented by storing all of the elements of all of the sets in a single array, sorted by the identity of the set they belong to, and by representing the collection of elements in any set $S i$ by its starting and ending positions in this array.
Associated with each element, the set it belongs to.

To perform a refinement operation, the algorithm loops through the elements of the given set $X$ . For each such element $x$ , it finds the set $S i$ that contains $x$ , and checks whether a second set for $S i \cap X$ has already been started. If not, it creates the second set and adds $S i$ to a list $L$ of the sets that are split by the operation. Then, regardless of whether a new set was formed, the algorithm removes $x$ from $S i$ and adds it to $S i \cap X$ . In the representation in which all elements are stored in a single array, moving $x$ from one set to another may be performed by swapping $x$ with the final element of $S i$ and then decrementing the end index of $S i$ and the start index of the new set. Finally, after all elements of $X$ have been processed in this way, the algorithm loops through $L$ , separating each current set $S i$ from the second set that has been split from it, and reports both of these sets as being newly formed by the refinement operation.

The time to perform a single refinement operation in this way is $O (| X |)$ , independent of the number of elements in the family of sets and also independent of the total number of sets in the data structure. Thus, the time for a sequence of refinements is proportional to the total size of the sets given to the algorithm in each refinement step.

Applications

An early application of partition refinement was in an algorithm by Hopcroft (1971) for DFA minimization. In this problem, one is given as input a deterministic finite automaton, and must find an equivalent automaton with as few states as possible. Hopcroft's algorithm maintains a partition of the states of the input automaton into subsets, with the property that any two states in different subsets must be mapped to different states of the output automaton. Initially, there are two subsets, one containing all the accepting states of the automaton and one containing the remaining states. At each step one of the subsets $S i$ and one of the input symbols $x$ of the automaton are chosen, and the subsets of states are refined into states for which a transition labeled $x$ would lead to $S i$ , and states for which an $x$ -transition would lead somewhere else. When a set $S i$ that has already been chosen is split by a refinement, only one of the two resulting sets (the smaller of the two) needs to be chosen again; in this way, each state participates in the sets $X$ for $O (s log n)$ refinement steps and the overall algorithm takes time $O (ns log n)$ , where $n$ is the number of initial states and $s$ is the size of the alphabet.^[6]

Partition refinement was applied by Sethi (1976) in an efficient implementation of the Coffman–Graham algorithm for parallel scheduling. Sethi showed that it could be used to construct a lexicographically ordered topological sort of a given directed acyclic graph in linear time; this lexicographic topological ordering is one of the key steps of the Coffman–Graham algorithm. In this application, the elements of the disjoint sets are vertices of the input graph and the sets $X$ used to refine the partition are sets of neighbors of vertices. Since the total number of neighbors of all vertices is just the number of edges in the graph, the algorithm takes time linear in the number of edges, its input size.^[7]

Partition refinement also forms a key step in lexicographic breadth-first search, a graph search algorithm with applications in the recognition of chordal graphs and several other important classes of graphs. Again, the disjoint set elements are vertices and the set $X$ represents sets of neighbors, so the algorithm takes linear time.^[8]^[9]

Related Research Articles

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

<span class="mw-page-title-main">Graph (abstract data type)</span> Abstract data type in computer science

In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics.

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string. Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

<span class="mw-page-title-main">Chordal graph</span> Graph where all long cycles have a chord

In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced cycle in the graph should have exactly three vertices. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings, as the graphs in which each minimal separator is a clique, and as the intersection graphs of subtrees of a tree. They are sometimes also called rigid circuit graphs or triangulated graphs: a chordal completion of a graph is typically called a triangulation of that graph.

In the mathematical theory of matroids, a graphic matroid is a matroid whose independent sets are the forests in a given finite undirected graph. The dual matroids of graphic matroids are called co-graphic matroids or bond matroids. A matroid that is both graphic and co-graphic is sometimes called a planar matroid ; these are exactly the graphic matroids formed from planar graphs.

<span class="mw-page-title-main">Weak ordering</span> Mathematical ranking of a set

In mathematics, especially order theory, a weak ordering is a mathematical formalization of the intuitive notion of a ranking of a set, some of whose members may be tied with each other. Weak orders are a generalization of totally ordered sets and are in turn generalized by (strictly) partially ordered sets and preorders.

In the mathematical subfield of graph theory a level structure of an undirected graph is a partition of the vertices into subsets that have the same distance from a given root vertex.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In the theory of computation and automata theory, the powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2ⁿ states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph $G$ is a subset $D$ of its vertices, such that any vertex of $G$ is in $D$ , or has a neighbor in $D$ . The domination number $γ(G)$ is the number of vertices in a smallest dominating set for $G$ .

Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.

In computer science, the Hopcroft–Karp algorithm is an algorithm that takes a bipartite graph as input and produces a maximum-cardinality matching as output — a set of as many edges as possible with the property that no two edges share an endpoint. It runs in $time in the worst case, where is set of edges in the graph, is set of vertices of the graph, and it is assumed that . In the case of dense graphs the time bound becomes, and for sparse random graphs it runs in time with high probability.$

<span class="mw-page-title-main">Pseudoforest</span> Graph with at most one cycle per component

In graph theory, a pseudoforest is an undirected graph in which every connected component has at most one cycle. That is, it is a system of vertices and edges connecting pairs of vertices, such that no two cycles of consecutive edges share any vertex with each other, nor can any two cycles be connected to each other by a path of consecutive edges. A pseudotree is a connected pseudoforest.

In automata theory, DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has a minimum number of states. Here, two DFAs are called equivalent if they recognize the same regular language. Several different algorithms accomplishing this task are known and described in standard textbooks on automata theory.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of ⁠ $⁠$ vertices from an $n$ -vertex graph can partition the graph into disjoint subgraphs each of which has at most ⁠ $⁠$ vertices.

In computer science, lexicographic breadth-first search or Lex-BFS is a linear time algorithm for ordering the vertices of a graph. The algorithm is different from a breadth-first search, but it produces an ordering that is consistent with breadth-first search.

In graph theory, the modular decomposition is a decomposition of a graph into subsets of vertices called modules. A module is a generalization of a connected component of a graph. Unlike connected components, however, one module can be a proper subset of another. Modules therefore lead to a recursive (hierarchical) decomposition of the graph, instead of just a partition.

The Coffman–Graham algorithm is an algorithm for arranging the elements of a partially ordered set into a sequence of levels. The algorithm chooses an arrangement such that an element that comes after another in the order is assigned to a lower level, and such that each level has a number of elements that does not exceed a fixed width bound $W$ . When $W = 2$ , it uses the minimum possible number of distinct levels, and in general it uses at most $2 - 2/ W$ times as many levels as necessary.

In the study of graph algorithms, Courcelle's theorem is the statement that every graph property definable in the monadic second-order logic of graphs can be decided in linear time on graphs of bounded treewidth. The result was first proved by Bruno Courcelle in 1990 and independently rediscovered by Borie, Parker & Tovey (1992). It is considered the archetype of algorithmic meta-theorems.

The HCS clustering algorithm is an algorithm based on graph connectivity for cluster analysis. It works by representing the similarity data in a similarity graph, and then finding all the highly connected subgraphs. It does not make any prior assumptions on the number of the clusters. This algorithm was published by Erez Hartuv and Ron Shamir in 2000.

References

↑ Paige, Robert; Tarjan, Robert E. (1987), "Three partition refinement algorithms", SIAM Journal on Computing , 16 (6): 973–989, doi:10.1137/0216062, MR 0917035 .
↑ Habib, Michel; Paul, Christophe; Viennot, Laurent (1999), "Partition refinement techniques: an interesting algorithmic tool kit", International Journal of Foundations of Computer Science , 10 (2): 147–170, doi:10.1142/S0129054199000125, MR 1759929 .
↑ Habib, Michel; Paul, Christophe; Viennot, Laurent (1998), "A synthesis on partition refinement: a useful routine for strings, graphs, Boolean matrices and automata", in Morvan, Michel; Meinel, Christoph; Krob, Daniel (eds.), STACS 98: 15th Annual Symposium on Theoretical Aspects of Computer Science Paris, France, February 25–27, 1998, Proceedings (PDF), Lecture Notes in Computer Science, vol. 1373, Springer-Verlag, pp. 25–38, doi:10.1007/BFb0028546, ISBN 978-3-540-64230-5, MR 1650757 .
↑ Valmari, Antti; Lehtinen, Petri (2008), "Efficient minimization of DFAs with partial transition functions", in Albers, Susanne; Weil, Pascal (eds.), 25th International Symposium on Theoretical Aspects of Computer Science (STACS 2008), Leibniz International Proceedings in Informatics (LIPIcs), vol. 1, Dagstuhl, Germany: Schloss Dagstuhl: Leibniz-Zentrum fuer Informatik, pp. 645–656, arXiv: 0802.2826 , doi: 10.4230/LIPIcs.STACS.2008.1328 , ISBN 978-3-939897-06-4, MR 2873773
↑ Knuutila, Timo (2001), "Re-describing an algorithm by Hopcroft", Theoretical Computer Science , 250 (1–2): 333–363, doi:10.1016/S0304-3975(99)00150-4, ISSN 0304-3975
↑ Hopcroft, John (1971), "An $n log n$ algorithm for minimizing states in a finite automaton", Theory of machines and computations (Proc. Internat. Sympos., Technion, Haifa, 1971), New York: Academic Press, pp. 189–196, MR 0403320 .
↑ Sethi, Ravi (1976), "Scheduling graphs on two processors", SIAM Journal on Computing , 5 (1): 73–82, doi:10.1137/0205005, MR 0398156 .
↑ Rose, D. J.; Tarjan, R. E.; Lueker, G. S. (1976), "Algorithmic aspects of vertex elimination on graphs", SIAM Journal on Computing , 5 (2): 266–283, doi:10.1137/0205021 .
↑ Corneil, Derek G. (2004), "Lexicographic breadth first search – a survey", in Hromkovič, Juraj; Nagl, Manfred; Westfechtel, Bernhard (eds.), Graph-Theoretic Methods in Computer Science: 30th International Workshop, WG 2004, Bad Honnef, Germany, June 21-23, 2004, Revised Papers, Lecture Notes in Computer Science, vol. 3353, Springer-Verlag, pp. 1–19, doi:10.1007/978-3-540-30559-0_1, ISBN 978-3-540-24132-4 .

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Paige, Robert; Tarjan, Robert E. (1987), "Three partition refinement algorithms", SIAM Journal on Computing , 16 (6): 973–989, doi:10.1137/0216062, MR 0917035 .

[2] Habib, Michel; Paul, Christophe; Viennot, Laurent (1999), "Partition refinement techniques: an interesting algorithmic tool kit", International Journal of Foundations of Computer Science , 10 (2): 147–170, doi:10.1142/S0129054199000125, MR 1759929 .

[3] Habib, Michel; Paul, Christophe; Viennot, Laurent (1998), "A synthesis on partition refinement: a useful routine for strings, graphs, Boolean matrices and automata", in Morvan, Michel; Meinel, Christoph; Krob, Daniel (eds.), STACS 98: 15th Annual Symposium on Theoretical Aspects of Computer Science Paris, France, February 25–27, 1998, Proceedings (PDF), Lecture Notes in Computer Science, vol. 1373, Springer-Verlag, pp. 25–38, doi:10.1007/BFb0028546, ISBN 978-3-540-64230-5, MR 1650757 .

[4] Valmari, Antti; Lehtinen, Petri (2008), "Efficient minimization of DFAs with partial transition functions", in Albers, Susanne; Weil, Pascal (eds.), 25th International Symposium on Theoretical Aspects of Computer Science (STACS 2008), Leibniz International Proceedings in Informatics (LIPIcs), vol. 1, Dagstuhl, Germany: Schloss Dagstuhl: Leibniz-Zentrum fuer Informatik, pp. 645–656, arXiv: 0802.2826 , doi: 10.4230/LIPIcs.STACS.2008.1328 , ISBN 978-3-939897-06-4, MR 2873773

[5] Knuutila, Timo (2001), "Re-describing an algorithm by Hopcroft", Theoretical Computer Science , 250 (1–2): 333–363, doi:10.1016/S0304-3975(99)00150-4, ISSN 0304-3975

[6] Hopcroft, John (1971), "An $n log n$ algorithm for minimizing states in a finite automaton", Theory of machines and computations (Proc. Internat. Sympos., Technion, Haifa, 1971), New York: Academic Press, pp. 189–196, MR 0403320 .

[7] Sethi, Ravi (1976), "Scheduling graphs on two processors", SIAM Journal on Computing , 5 (1): 73–82, doi:10.1137/0205005, MR 0398156 .

[8] Rose, D. J.; Tarjan, R. E.; Lueker, G. S. (1976), "Algorithmic aspects of vertex elimination on graphs", SIAM Journal on Computing , 5 (2): 266–283, doi:10.1137/0205021 .

[9] Corneil, Derek G. (2004), "Lexicographic breadth first search – a survey", in Hromkovič, Juraj; Nagl, Manfred; Westfechtel, Bernhard (eds.), Graph-Theoretic Methods in Computer Science: 30th International Workshop, WG 2004, Bad Honnef, Germany, June 21-23, 2004, Revised Papers, Lecture Notes in Computer Science, vol. 3353, Springer-Verlag, pp. 1–19, doi:10.1007/978-3-540-30559-0_1, ISBN 978-3-540-24132-4 .

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

Partition refinement

Contents

Data structure

Applications

Related Research Articles

References