Partition refinement

Last updated

In the design of algorithms, partition refinement is a technique for representing a partition of a set as a data structure that allows the partition to be refined by splitting its sets into a larger number of smaller sets. In that sense it is dual to the union-find data structure, which also maintains a partition into disjoint sets but in which the operations merge pairs of sets. In some applications of partition refinement, such as lexicographic breadth-first search, the data structure maintains as well an ordering on the sets in the partition.

Contents

Partition refinement forms a key component of several efficient algorithms on graphs and finite automata, including DFA minimization, the Coffman–Graham algorithm for parallel scheduling, and lexicographic breadth-first search of graphs. [1] [2] [3]

Data structure

A partition refinement algorithm maintains a family of disjoint sets Si. At the start of the algorithm, this family contains a single set of all the elements in the data structure. At each step of the algorithm, a set X is presented to the algorithm, and each set Si in the family that contains members of X is split into two sets, the intersection SiX and the difference Si \ X.

Such an algorithm may be implemented efficiently by maintaining data structures representing the following information: [4] [5]

To perform a refinement operation, the algorithm loops through the elements of the given set X. For each such element x, it finds the set Si that contains x, and checks whether a second set for SiX has already been started. If not, it creates the second set and adds Si to a list L of the sets that are split by the operation. Then, regardless of whether a new set was formed, the algorithm removes x from Si and adds it to SiX. In the representation in which all elements are stored in a single array, moving x from one set to another may be performed by swapping x with the final element of Si and then decrementing the end index of Si and the start index of the new set. Finally, after all elements of X have been processed in this way, the algorithm loops through L, separating each current set Si from the second set that has been split from it, and reports both of these sets as being newly formed by the refinement operation.

The time to perform a single refinement operations in this way is O(|X|), independent of the number of elements in the family of sets and also independent of the total number of sets in the data structure. Thus, the time for a sequence of refinements is proportional to the total size of the sets given to the algorithm in each refinement step.

Applications

An early application of partition refinement was in an algorithm by Hopcroft (1971) for DFA minimization. In this problem, one is given as input a deterministic finite automaton, and must find an equivalent automaton with as few states as possible. Hopcroft's algorithm maintains a partition of the states of the input automaton into subsets, with the property that any two states in different subsets must be mapped to different states of the output automaton. Initially, there are two subsets, one containing all the accepting states of the automaton and one containing the remaining states. At each step one of the subsets Si and one of the input symbols x of the automaton are chosen, and the subsets of states are refined into states for which a transition labeled x would lead to Si, and states for which an x-transition would lead somewhere else. When a set Si that has already been chosen is split by a refinement, only one of the two resulting sets (the smaller of the two) needs to be chosen again; in this way, each state participates in the sets X for O(s log n) refinement steps and the overall algorithm takes time O(ns log n), where n is the number of initial states and s is the size of the alphabet. [6]

Partition refinement was applied by Sethi (1976) in an efficient implementation of the Coffman–Graham algorithm for parallel scheduling. Sethi showed that it could be used to construct a lexicographically ordered topological sort of a given directed acyclic graph in linear time; this lexicographic topological ordering is one of the key steps of the Coffman–Graham algorithm. In this application, the elements of the disjoint sets are vertices of the input graph and the sets X used to refine the partition are sets of neighbors of vertices. Since the total number of neighbors of all vertices is just the number of edges in the graph, the algorithm takes time linear in the number of edges, its input size. [7]

Partition refinement also forms a key step in lexicographic breadth-first search, a graph search algorithm with applications in the recognition of chordal graphs and several other important classes of graphs. Again, the disjoint set elements are vertices and the set X represent sets of neighbors, so the algorithm takes linear time. [8] [9]

See also

Related Research Articles

<span class="mw-page-title-main">Depth-first search</span> Search algorithm

Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node and explores as far as possible along each branch before backtracking. Extra memory, usually a stack, is needed to keep track of the nodes discovered so far along a specified branch which helps in backtracking of the graph.

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

<span class="mw-page-title-main">Graph (abstract data type)</span> Abstract data type in computer science

In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics.

<span class="mw-page-title-main">Deterministic finite automaton</span> Finite-state machine

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string. Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

<span class="mw-page-title-main">Chordal graph</span> Graph where all long cycles have a chord

In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced cycle in the graph should have exactly three vertices. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings, as the graphs in which each minimal separator is a clique, and as the intersection graphs of subtrees of a tree. They are sometimes also called rigid circuit graphs or triangulated graphs.

<span class="mw-page-title-main">Weak ordering</span> Mathematical ranking of a set

In mathematics, especially order theory, a weak ordering is a mathematical formalization of the intuitive notion of a ranking of a set, some of whose members may be tied with each other. Weak orders are a generalization of totally ordered sets and are in turn generalized by (strictly) partially ordered sets and preorders.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

<span class="mw-page-title-main">Level structure</span>

In the mathematical subfield of graph theory a level structure of an undirected graph is a partition of the vertices into subsets that have the same distance from a given root vertex.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In the theory of computation and automata theory, the powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2n states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is either in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

In computer science, the Hopcroft–Karp algorithm is an algorithm that takes a bipartite graph as input and produces a maximum-cardinality matching as output — a set of as many edges as possible with the property that no two edges share an endpoint. It runs in time in the worst case, where is set of edges in the graph, is set of vertices of the graph, and it is assumed that . In the case of dense graphs the time bound becomes , and for sparse random graphs it runs in time with high probability.

<span class="mw-page-title-main">Pseudoforest</span> Graph with one cycle per component

In graph theory, a pseudoforest is an undirected graph in which every connected component has at most one cycle. That is, it is a system of vertices and edges connecting pairs of vertices, such that no two cycles of consecutive edges share any vertex with each other, nor can any two cycles be connected to each other by a path of consecutive edges. A pseudotree is a connected pseudoforest.

<span class="mw-page-title-main">DFA minimization</span>

In automata theory, DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has a minimum number of states. Here, two DFAs are called equivalent if they recognize the same regular language. Several different algorithms accomplishing this task are known and described in standard textbooks on automata theory.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

In computer science, lexicographic breadth-first search or Lex-BFS is a linear time algorithm for ordering the vertices of a graph. The algorithm is different from a breadth-first search, but it produces an ordering that is consistent with breadth-first search.

The Kernighan–Lin algorithm is a heuristic algorithm for finding partitions of graphs. The algorithm has important practical application in the layout of digital circuits and components in electronic design automation of VLSI.

In graph theory, the modular decomposition is a decomposition of a graph into subsets of vertices called modules. A module is a generalization of a connected component of a graph. Unlike connected components, however, one module can be a proper subset of another. Modules therefore lead to a recursive (hierarchical) decomposition of the graph, instead of just a partition.

The Coffman–Graham algorithm is an algorithm for arranging the elements of a partially ordered set into a sequence of levels. The algorithm chooses an arrangement such that an element that comes after another in the order is assigned to a lower level, and such that each level has a number of elements that does not exceed a fixed width bound W. When W = 2, it uses the minimum possible number of distinct levels, and in general it uses at most 2 − 2/W times as many levels as necessary.

The HCS clustering algorithm is an algorithm based on graph connectivity for cluster analysis. It works by representing the similarity data in a similarity graph, and then finding all the highly connected subgraphs. It does not make any prior assumptions on the number of the clusters. This algorithm was published by Erez Hartuv and Ron Shamir in 2000.

References

  1. Paige, Robert; Tarjan, Robert E. (1987), "Three partition refinement algorithms", SIAM Journal on Computing , 16 (6): 973–989, doi:10.1137/0216062, MR   0917035 .
  2. Habib, Michel; Paul, Christophe; Viennot, Laurent (1999), "Partition refinement techniques: an interesting algorithmic tool kit", International Journal of Foundations of Computer Science , 10 (2): 147–170, doi:10.1142/S0129054199000125, MR   1759929 .
  3. Habib, Michel; Paul, Christophe; Viennot, Laurent (1998), "A synthesis on partition refinement: a useful routine for strings, graphs, Boolean matrices and automata", in Morvan, Michel; Meinel, Christoph; Krob, Daniel (eds.), STACS 98: 15th Annual Symposium on Theoretical Aspects of Computer Science Paris, France, February 25–27, 1998, Proceedings (PDF), Lecture Notes in Computer Science, vol. 1373, Springer-Verlag, pp. 25–38, doi:10.1007/BFb0028546, ISBN   978-3-540-64230-5, MR   1650757 .
  4. Valmari, Antti; Lehtinen, Petri (2008), "Efficient minimization of DFAs with partial transition functions", in Albers, Susanne; Weil, Pascal (eds.), 25th International Symposium on Theoretical Aspects of Computer Science (STACS 2008), Leibniz International Proceedings in Informatics (LIPIcs), vol. 1, Dagstuhl, Germany: Schloss Dagstuhl: Leibniz-Zentrum fuer Informatik, pp. 645–656, arXiv: 0802.2826 , doi:10.4230/LIPIcs.STACS.2008.1328, ISBN   978-3-939897-06-4, MR   2873773
  5. Knuutila, Timo (2001), "Re-describing an algorithm by Hopcroft", Theoretical Computer Science , 250 (1–2): 333–363, doi:10.1016/S0304-3975(99)00150-4, ISSN   0304-3975
  6. Hopcroft, John (1971), "An n log n algorithm for minimizing states in a finite automaton", Theory of machines and computations (Proc. Internat. Sympos., Technion, Haifa, 1971), New York: Academic Press, pp. 189–196, MR   0403320 .
  7. Sethi, Ravi (1976), "Scheduling graphs on two processors", SIAM Journal on Computing , 5 (1): 73–82, doi:10.1137/0205005, MR   0398156 .
  8. Rose, D. J.; Tarjan, R. E.; Lueker, G. S. (1976), "Algorithmic aspects of vertex elimination on graphs", SIAM Journal on Computing , 5 (2): 266–283, doi:10.1137/0205021 .
  9. Corneil, Derek G. (2004), "Lexicographic breadth first search – a survey", in Hromkovič, Juraj; Nagl, Manfred; Westfechtel, Bernhard (eds.), Graph-Theoretic Methods in Computer Science: 30th International Workshop, WG 2004, Bad Honnef, Germany, June 21-23, 2004, Revised Papers, Lecture Notes in Computer Science, vol. 3353, Springer-Verlag, pp. 1–19, doi:10.1007/978-3-540-30559-0_1 .