Color-coding

Last updated November 18, 2024

In computer science and graph theory, the term color-coding refers to an algorithmic technique which is useful in the discovery of network motifs. For example, it can be used to detect a simple path of length $k$ in a given graph. The traditional color-coding algorithm is probabilistic, but it can be derandomized without much overhead in the running time.

Results

The following results can be obtained through the method of color-coding:

For every fixed constant k, if a graph G = (V, E) contains a simple cycle of size k, then such a cycle can be found in:
- $O(|V|^{\omega })$ expected time, or
- $O(|V|^{\omega }\log |V|)$ worst-case time, where $ω$ is the exponent of matrix multiplication.^[3]
For every fixed constant k, and every graph G = (V, E) that is in any nontrivial minor-closed graph family (e.g., a planar graph), if G contains a simple cycle of size k, then such cycle can be found in:
- $O (V)$ expected time, or
- $O (V log V)$ worst-case time.
If a graph $G = (V, E)$ contains a subgraph isomorphic to a bounded treewidth graph which has $O (log V)$ vertices, then such a subgraph can be found in polynomial time.

The method

To solve the problem of finding a subgraph $H=(V_{H},E_{H})$ in a given graph $G = (V, E)$ , where $H$ can be a path, a cycle, or any bounded treewidth graph where $|V_{H}|=O(\log |V|)$ , the method of color-coding begins by randomly coloring each vertex of $G$ with $k=|V_{H}|$ colors, and then tries to find a colorful copy of $H$ in colored $G$ . Here, a graph is colorful if every vertex in it is colored with a distinct color. This method works by repeating (1) random coloring a graph and (2) finding colorful copy of the target subgraph, and eventually the target subgraph can be found if the process is repeated a sufficient number of times.

Suppose a copy of $H$ in $G$ becomes colorful with some non-zero probability $p$ . It immediately follows that if the random coloring is repeated $.mw-parser-output .sfrac{white-space:nowrap}.mw-parser-output .sfrac.tion,.mw-parser-output .sfrac .tion{display:inline-block;vertical-align:-0.5em;font-size:85%;text-align:center}.mw-parser-output .sfrac .num{display:block;line-height:1em;margin:0.0em 0.1em;border-bottom:1px solid}.mw-parser-output .sfrac .den{display:block;line-height:1em;margin:0.1em 0.1em}.mw-parser-output .sr-only{border:0;clip:rect(0,0,0,0);clip-path:polygon(0px 0px,0px 0px,0px 0px);height:1px;margin:-1px;overflow:hidden;padding:0;position:absolute;width:1px}⁠1/p⁠$ times, then this copy is expected to become colorful once. Note that though $p$ is small, it is shown that if $|V_{H}|=O(\log |V|)$ , $p$ is only polynomially small. Suppose again there exists an algorithm such that, given a graph $G$ and a coloring which maps each vertex of $G$ to one of the $k$ colors, it finds a copy of colorful $H$ , if one exists, within some runtime $O (r)$ . Then the expected time to find a copy of $H$ in $G$ , if one exists, is $O({\tfrac {r}{p}})$ .

Sometimes it is also desirable to use a more restricted version of colorfulness. For example, in the context of finding cycles in planar graphs, it is possible to develop an algorithm that finds well-colored cycles. Here, a cycle is well-colored if its vertices are colored by consecutive colors.

Example

An example would be finding a simple cycle of length $k$ in graph $G = (V, E)$ .

By applying random coloring method, each simple cycle has a probability of $k!/k^{k}>e^{-k}$ to become colorful, since there are $k^{k}$ ways of coloring the $k$ vertices on the cycle, among which there are $k!$ colorful occurrences. Then an algorithm (described next) can be used to find colorful cycles in the randomly colored graph $G$ in time $O(V^{\omega })$ , where $\omega$ is the matrix multiplication constant. Therefore, it takes $e^{k}\cdot O(V^{\omega })$ overall time to find a simple cycle of length $k$ in $G$ .

The colorful cycle-finding algorithm works by first finding all pairs of vertices in $V$ that are connected by a simple path of length $k - 1$ , and then checking whether the two vertices in each pair are connected. Given a coloring function $c : V \to {1, ..., k}$ to color graph $G$ , enumerate all partitions of the color set ${1, ..., k}$ into two subsets $C 1, C 2$ of size $k/2$ each. Note that $V$ can be divided into $V 1$ and $V 2$ accordingly, and let $G 1$ and $G 2$ denote the subgraphs induced by $V 1$ and $V 2$ respectively. Then, recursively find colorful paths of length $k/2-1$ in each of $G 1$ and $G 2$ . Suppose the boolean matrix $A 1$ and $A 2$ represent the connectivity of each pair of vertices in $G 1$ and $G 2$ by a colorful path, respectively, and let $B$ be the matrix describing the adjacency relations between vertices of $V 1$ and those of $V 2$ , the boolean product $A_{1}BA_{2}$ gives all pairs of vertices in $V$ that are connected by a colorful path of length $k - 1$ . Thus, the recursive relation of matrix multiplications is $t(k)\leq 2^{k}\cdot t(k/2)$ , which yields a runtime of $2^{O(k)}\cdot V^{\omega }$ . Although this algorithm finds only the end points of the colorful path, another algorithm by Alon and Naor^[4] that finds colorful paths themselves can be incorporated into it.

Derandomization

The derandomization of color-coding involves enumerating possible colorings of a graph $G$ , such that the randomness of coloring $G$ is no longer required. For the target subgraph $H$ in $G$ to be discoverable, the enumeration has to include at least one instance where the $H$ is colorful. To achieve this, enumerating a $k$ -perfect family $F$ of hash functions from ${1, ..., | V |}$ to ${1, ..., k}$ is sufficient. By definition, $F$ is $k$ -perfect if for every subset $S$ of ${1, ..., | V |}$ where $|S|=k$ , there exists a hash function $h$ in $F$ such that $h : S \to {1, ..., k}$ is perfect. In other words, there must exist a hash function in $F$ that colors any given $k$ vertices with $k$ distinct colors.

There are several approaches to construct such a $k$ -perfect hash family:

The best explicit construction is by Moni Naor, Leonard J. Schulman, and Aravind Srinivasan,^[5] where a family of size $e^{k}k^{O(\log k)}\log |V|$ can be obtained. This construction does not require the target subgraph to exist in the original subgraph finding problem.
Another explicit construction by Jeanette P. Schmidt and Alan Siegel^[6] yields a family of size $2^{O(k)}\log ^{2}|V|$ .
Another construction that appears in the original paper of Noga Alon et al.^[2] can be obtained by first building a $k$ -perfect family that maps ${1, ..., | V |}$ to ${1, ..., k 2},$ followed by building another $k$ -perfect family that maps ${1, ..., k 2}$ to ${1, ..., k}.$ In the first step, it is possible to construct such a family with $2 n log k$ random bits that are almost $2log k$ -wise independent,^[7]^[8] and the sample space needed for generating those random bits can be as small as $k^{O(1)}\log |V|$ . In the second step, it has been shown by Jeanette P. Schmidt and Alan Siegel^[6] that the size of such $k$ -perfect family can be $2^{O(k)}$ . Consequently, by composing the $k$ -perfect families from both steps, a $k$ -perfect family of size $2^{O(k)}\log |V|$ that maps from ${1, ..., | V |}$ to ${1, ..., k}$ can be obtained.

In the case of derandomizing well-coloring, where each vertex on the subgraph is colored consecutively, a $k$ -perfect family of hash functions from ${1, ..., | V |}$ to ${1, ..., k!}$ is needed. A sufficient $k$ -perfect family which maps from ${1, ..., | V |}$ to ${1, ..., k k}$ can be constructed in a way similar to the approach 3 above (the first step). In particular, it is done by using $nk log k$ random bits that are almost $k log k$ independent, and the size of the resulting $k$ -perfect family will be $k^{O(k)}\log |V|$ .

The derandomization of color-coding method can be easily parallelized, yielding efficient NC algorithms.

Applications

Recently, color-coding has attracted much attention in the field of bioinformatics. One example is the detection of signaling pathways in protein-protein interaction (PPI) networks. Another example is to discover and to count the number of motifs in PPI networks. Studying both signaling pathways and motifs allows a deeper understanding of the similarities and differences of many biological functions, processes, and structures among organisms.

Due to the huge amount of gene data that can be collected, searching for pathways or motifs can be highly time consuming. However, by exploiting the color-coding method, the motifs or signaling pathways with $k=O(\log n)$ vertices in a network $G$ with $n$ vertices can be found very efficiently in polynomial time. Thus, this enables us to explore more complex or larger structures in PPI networks.

Related Research Articles

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

In mathematics, the probabilistic method is a nonconstructive method, primarily used in combinatorics and pioneered by Paul Erdős, for proving the existence of a prescribed kind of mathematical object. It works by showing that if one randomly chooses objects from a specified class, the probability that the result is of the prescribed kind is strictly greater than zero. Although the proof uses probability, the final conclusion is determined for certain, without any possible error.

In the mathematical field of graph theory, a bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets $and, that is, every edge connects a vertex in to one in . Vertex sets and are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.$

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

Extremal graph theory is a branch of combinatorics, itself an area of mathematics, that lies at the intersection of extremal combinatorics and graph theory. In essence, extremal graph theory studies how global properties of a graph influence local substructure. Results in extremal graph theory deal with quantitative connections between various graph properties, both global and local, and problems in extremal graph theory can often be formulated as optimization problems: how big or small a parameter of a graph can be, given some constraints that the graph has to satisfy? A graph that is an optimal solution to such an optimization problem is called an extremal graph, and extremal graphs are important objects of study in extremal graph theory.

In graph theory, a proper edge coloring of a graph is an assignment of "colors" to the edges of the graph so that no two incident edges have the same color. For example, the figure to the right shows an edge coloring of a graph by the colors red, blue, and green. Edge colorings are one of several different types of graph coloring. The edge-coloring problem asks whether it is possible to color the edges of a given graph using at most $k$ different colors, for a given value of $k$ , or with the fewest possible colors. The minimum required number of colors for the edges of a given graph is called the chromatic index of the graph. For example, the edges of the graph in the illustration can be colored by three colors but cannot be colored by two colors, so the graph shown has chromatic index three.

In graph theory, the Hadwiger conjecture states that if $is loopless and has no minor then its chromatic number satisfies . It is known to be true for . The conjecture is a generalization of the four-color theorem and is considered to be one of the most important and challenging open problems in the field.$

In graph theory, the Hadwiger number of an undirected graph $G$ is the size of the largest complete graph that can be obtained by contracting edges of $G$ . Equivalently, the Hadwiger number $h (G)$ is the largest number $n$ for which the complete graph $K n$ is a minor of $G$ , a smaller graph obtained from $G$ by edge contractions and vertex and edge deletions. The Hadwiger number is also known as the contraction clique number of $G$ or the homomorphism degree of $G$ . It is named after Hugo Hadwiger, who introduced it in 1943 in conjunction with the Hadwiger conjecture, which states that the Hadwiger number is always at least as large as the chromatic number of $G$ .

<span class="mw-page-title-main">Maximal independent set</span> Independent set which is not a subset of any other independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

In graph theory and graph algorithms, a feedback arc set or feedback edge set in a directed graph is a subset of the edges of the graph that contains at least one edge out of every cycle in the graph. Removing these edges from the graph breaks all of the cycles, producing an acyclic subgraph of the given graph, often called a directed acyclic graph. A feedback arc set with the fewest possible edges is a minimum feedback arc set and its removal leaves a maximum acyclic subgraph; weighted versions of these optimization problems are also used. If a feedback arc set is minimal, meaning that removing any edge from it produces a subset that is not a feedback arc set, then it has an additional property: reversing all of its edges, rather than removing them, produces a directed acyclic graph.

The five color theorem is a result from graph theory that given a plane separated into regions, such as a political map of the countries of the world, the regions may be colored using no more than five colors in such a way that no two adjacent regions receive the same color.

In the mathematical area of graph theory, a triangle-free graph is an undirected graph in which no three vertices form a triangle of edges. Triangle-free graphs may be equivalently defined as graphs with clique number ≤ 2, graphs with girth ≥ 4, graphs with no induced 3-cycle, or locally independent graphs.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of ⁠ $⁠$ vertices from an $n$ -vertex graph can partition the graph into disjoint subgraphs each of which has at most ⁠ $⁠$ vertices.

Property testing is a field of theoretical computer science, concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects.

In the study of graph coloring problems in mathematics and computer science, a greedy coloring or sequential coloring is a coloring of the vertices of a graph formed by a greedy algorithm that considers the vertices of the graph in sequence and assigns each vertex its first available color. Greedy colorings can be found in linear time, but they do not, in general, use the minimum number of colors possible.

In graph theory, the tree-depth of a connected undirected graph $is a numerical invariant of, the minimum height of a Trémaux tree for a supergraph of . This invariant and its close relatives have gone under many different names in the literature, including vertex ranking number, ordered chromatic number, and minimum elimination tree height; it is also closely related to the cycle rank of directed graphs and the star height of regular languages. Intuitively, where the treewidth of a graph measures how far it is from being a tree, this parameter measures how far a graph is from being a star.$

In graph theory, a k-degenerate graph is an undirected graph in which every subgraph has at least one vertex of degree at most k: that is, some vertex in the subgraph touches k or fewer of the subgraph's edges. The degeneracy of a graph is the smallest value of k for which it is k-degenerate. The degeneracy of a graph is a measure of how sparse it is, and is within a constant factor of other sparsity measures such as the arboricity of a graph.

In graph theory, a branch of mathematics, the Erdős–Hajnal conjecture states that families of graphs defined by forbidden induced subgraphs have either large cliques or large independent sets. It is named for Paul Erdős and András Hajnal, who first posed it as an open problem in a paper from 1977.

DSatur is a graph colouring algorithm put forward by Daniel Brélaz in 1979. Similarly to the greedy colouring algorithm, DSatur colours the vertices of a graph one after another, adding a previously unused colour when needed. Once a new vertex has been coloured, the algorithm determines which of the remaining uncoloured vertices has the highest number of colours in its neighbourhood and colours this vertex next. Brélaz defines this number as the degree of saturation of a given vertex. The contraction of the term "degree of saturation" forms the name of the algorithm. DSatur is a heuristic graph colouring algorithm, yet produces exact results for bipartite, cycle, and wheel graphs. DSatur has also been referred to as saturation LF in the literature.

References

↑ Alon, N., Yuster, R., and Zwick, U. 1994. Color-coding: a new method for finding simple paths, cycles and other small subgraphs within large graphs. In Proceedings of the Twenty-Sixth Annual ACM Symposium on theory of Computing (Montreal, Quebec, Canada, May 23–25, 1994). STOC '94. ACM, New York, NY, 326–335. DOI= http://doi.acm.org/10.1145/195058.195179
1 2 Alon, N., Yuster, R., and Zwick, U. 1995. Color-coding. J. ACM 42, 4 (Jul. 1995), 844–856. DOI= http://doi.acm.org/10.1145/210332.210337
↑ Coppersmith–Winograd Algorithm
↑ Alon, N. and Naor, M. 1994 Derandomization, Witnesses for Boolean Matrix Multiplication and Construction of Perfect Hash Functions. Technical Report. UMI Order Number: CS94-11., Weizmann Science Press of Israel.
↑ Naor, M., Schulman, L. J., and Srinivasan, A. 1995. Splitters and near-optimal derandomization. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (October 23–25, 1995). FOCS. IEEE Computer Society, Washington, DC, 182.
1 2 Schmidt, J. P.; Siegel, A. (1990). "The spatial complexity of oblivious k-probe Hash functions". SIAM J. Comput. 19 (5): 775–786. doi:10.1137/0219054.
↑ Naor, J. and Naor, M. 1990. Small-bias probability spaces: efficient constructions and applications. In Proceedings of the Twenty-Second Annual ACM Symposium on theory of Computing (Baltimore, Maryland, United States, May 13–17, 1990). H. Ortiz, Ed. STOC '90. ACM, New York, NY, 213-223. DOI= http://doi.acm.org/10.1145/100216.100244
↑ Alon, N., Goldreich, O., Hastad, J., and Peralta, R. 1990. Simple construction of almost k-wise independent random variables. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (October 22–24, 1990). SFCS. IEEE Computer Society, Washington, DC, 544-553 vol.2. doi : 10.1109/FSCS.1990.89575

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Alon, N., Yuster, R., and Zwick, U. 1994. Color-coding: a new method for finding simple paths, cycles and other small subgraphs within large graphs. In Proceedings of the Twenty-Sixth Annual ACM Symposium on theory of Computing (Montreal, Quebec, Canada, May 23–25, 1994). STOC '94. ACM, New York, NY, 326–335. DOI= http://doi.acm.org/10.1145/195058.195179

[orig-2] 1 2 Alon, N., Yuster, R., and Zwick, U. 1995. Color-coding. J. ACM 42, 4 (Jul. 1995), 844–856. DOI= http://doi.acm.org/10.1145/210332.210337

[3] Coppersmith–Winograd Algorithm

[4] Alon, N. and Naor, M. 1994 Derandomization, Witnesses for Boolean Matrix Multiplication and Construction of Perfect Hash Functions. Technical Report. UMI Order Number: CS94-11., Weizmann Science Press of Israel.

[5] Naor, M., Schulman, L. J., and Srinivasan, A. 1995. Splitters and near-optimal derandomization. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (October 23–25, 1995). FOCS. IEEE Computer Society, Washington, DC, 182.

[SS90-6] 1 2 Schmidt, J. P.; Siegel, A. (1990). "The spatial complexity of oblivious k-probe Hash functions". SIAM J. Comput. 19 (5): 775–786. doi:10.1137/0219054.

[7] Naor, J. and Naor, M. 1990. Small-bias probability spaces: efficient constructions and applications. In Proceedings of the Twenty-Second Annual ACM Symposium on theory of Computing (Baltimore, Maryland, United States, May 13–17, 1990). H. Ortiz, Ed. STOC '90. ACM, New York, NY, 213-223. DOI= http://doi.acm.org/10.1145/100216.100244

[8] Alon, N., Goldreich, O., Hastad, J., and Peralta, R. 1990. Simple construction of almost k-wise independent random variables. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (October 22–24, 1990). SFCS. IEEE Computer Society, Washington, DC, 544-553 vol.2. doi : 10.1109/FSCS.1990.89575

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]