Monotone dualization

Last updated

In theoretical computer science, monotone dualization is a computational problem of constructing the dual of a monotone Boolean function. Equivalent problems can also be formulated as constructing the transversal hypergraph of a given hypergraph, of listing all minimal hitting sets of a family of sets, or of listing all minimal set covers of a family of sets. These problems can be solved in quasi-polynomial time in the combined size of its input and output, but whether they can be solved in polynomial time is an open problem.

Contents

Definitions

A Boolean function takes as input an assignment of truth values to its arguments, and produces as output another truth value. It is monotone when changing an argument from false to true cannot change the output from true to false. Every monotone Boolean function can be expressed as a Boolean expression using only logical disjunction ("or") and logical conjunction ("and"), without using logical negation ("not"). Such an expression is called a monotone Boolean expression. Every monotone Boolean expression describes a monotone Boolean function. [1]

There may be many different expressions for the same function. Among them are two special expressions, the conjunctive normal form and disjunctive normal form. For monotone functions these two special forms can also be restricted to be monotone: [1]

The dual of a Boolean function is obtained by negating all of its variables, applying the function, and then negating the result. The dual of the dual of any Boolean function is the original function. The dual of a monotone function is monotone. If one is given a monotone Boolean expression, then replacing all conjunctions by disjunctions produces another monotone Boolean expression for the dual function, following De Morgan's laws. However, this will transform the conjunctive normal form into disjunctive normal form, and vice versa, which may be undesired. Monotone dualization is the problem of finding an expression for the dual function without changing the form of the expression, or equivalently of converting a function in one normal form into the dual form. [1]

As a functional problem, monotone dualization can be expressed in the following equivalent ways: [1] [2]

Another version of the problem can be formulated as a problem of "exact learning" in computational learning theory: given access to a subroutine for evaluating a monotone Boolean function, reconstruct both the CNF and DNF representations of the function, using a small number of function evaluations. However, it is crucial in analyzing the complexity of this problem that both the CNF and DNF representations are output. If only the CNF representation of an unknown monotone function is output, it follows from information theory that the number of function evaluations must sometimes be exponential in the combined input and output sizes. This is because (to be sure of getting the correct answer) the algorithm must evaluate the function at least once for each prime implicate and at least once for each prime implicant, but this number of evaluations can be exponentially larger than the number of prime implicates alone. [1]

It is also possible to express a variant of the monotone dualization problem as a decision problem, with a Boolean answer: [1]

Unsolved problem in computer science:

Is it possible to test whether two prime CNF expressions represent dual functions in polynomial time?

It is an open problem whether monotone dualization has a polynomial time algorithm (in any of these equivalent forms). The fastest algorithms known run in quasi-polynomial time. [1] The size of the output of the dualization and exact learning problems can be exponentially large, as a function of the number of variables or the input size. For instance, an -vertex graph consisting of disjoint triangles has hyperedges in its transversal hypergraph. [3] Therefore, what is desired for these problems is an output-sensitive algorithm, one that takes a small amount of time per output clause. The decision, dualization, and exact learning formulations of the problem are all computationally equivalent, in the following sense: any one of them can be solved using a subroutine for any other of these problems, with a number of subroutine calls that is polynomial in the combined input and output sizes of the problems. [4] Therefore, if any one of these problems could be solved in polynomial time, they all could. However, the best time bound that is known for these problems is quasi-polynomial time. It remains an open problem whether they can be solved in polynomial time. [1]

Computational complexity

Equivalence of decision, enumeration, and exact learning

The problem of finding the prime CNF expression for the dual function of a monotone function, given as a CNF formula, can be solved by finding the DNF expression for the given function and then dualizing it. Therefore, finding the dual CNF expression, and finding the DNF expression for the (primal) given function, have the same complexity. This problem can also be seen as a special case of the exact learning formulation of the problem. From a given CNF expression, it is straightforward to evaluate the function that it expresses. An exact learning algorithm will return both the starting CNF expression and the desired DNF expression. Therefore, dualization can be no harder than exact learning. [5]

It is also straightforward to solve the decision problem given an algorithm for dualization: dualize the given CNF expression and then test whether it is equal to the given DNF expression. Therefore, research in this area has focused on the other direction of this equivalence: solving the exact learning problem (or the dualization problem) given a subroutine for the decision problem. [4]

Bioch & Ibaraki (1995) outline the following algorithm for solving exact learning using a decision subroutine:

Each iteration through the outer loop of the algorithm uses a linear number of calls to the decision problem to find the unforced truth assignment, uses a linear number of function evaluations to find a minimal true or maximal false function value, and adds one clause to the output. Therefore, the total number of calls to the decision problem and the total number of function evaluations is a polynomial of the total output size. [4]

Quasi-polynomial time

A central result in the study of this problem, by Michael Fredman and Leonid Khachiyan, is that monotone dualization (in any of its equivalent forms) can be solved in quasi-polynomial time. [1] [6] Their algorithms directly solve the decision problem, but can be converted to the other forms of the monotone dualization problem as described in § Equivalence of different formulations. Alternatively, in cases where the answer to the decision problem is no, the algorithms can be modified to return a witness, that is, a truth assignment for which the input formulas fail to determine the function value. Its main idea is to first "clean" the decision problem instance, by removing redundant information and directly solving certain easy-to-solve cases of the problem. Then, in remaining cases it branches on a carefully chosen variable. This means recursively calling the same algorithm on two smaller subproblems, one for a restricted monotone function for which the variable has been set to true and the other in which the variable has been set to false. The cleaning step ensures the existence of a variable that belongs to many clauses, causing a significant reduction in the recursive subproblem size. [1]

In more detail, the first and slower of the two algorithms of Fredman and Khachiyan performs the following steps: [1] [6]

When this algorithm branches on a variable occurring in many clauses, these clauses are eliminated from one of the two recursive calls. Using this fact, the running time of the algorithm can be bounded by an exponential function of . [1] [6]

A second algorithm of Fredman and Khachiyan has a similar overall structure, but in the case where the branch variable occurs in many clauses of one set and few of the other, it chooses the first of the two recursive calls to be the one where setting the branch variable significantly reduces the number of clauses. If that recursive call fails to find an inconsistency, then, instead of performing a single recursive call for the other branch, it performs one call for each clause that contains the branch variable, on a restricted subproblem in which all the other variables of that clause have been assigned in the same way. Its running time is an exponential function of . [1] [6]

Polynomial special cases

Many special cases of the monotone dualization problem have been shown to be solvable in polynomial time through the analysis of their parameterized complexity. [2] These include:

Applications

One application of monotone dualization involves group testing for fault detection and isolation in the model-based diagnosis of complex systems. From a collection of observations of faulty behavior of a system, each with some set of active components, one can surmise that the faulty components causing this misbehavior are likely to form a minimal hitting set of this family of sets. [2] [12]

In biochemical engineering, the enumeration of hitting sets has been used to identify subsets of metabolic reactions whose removal from a system adjusts the balance of the system in a desired direction. [2] [13] [14] Analogous methods have also been applied to other biological interaction networks, for instance in the design of microarray experiments that can be used to infer protein interactions in biological systems. [2]

In recreational mathematics, in the design of sudoku puzzles, the problem of designing a system of clues that has a given grid of numbers as its unique solution can be formulated as a minimal hitting set problem. The 81 candidate clues from the given grid are the elements to be selected in the hitting set, and the sets to be hit are the sets of candidate clues that can eliminate each alternative solution. Thus, the enumeration of minimal hitting sets can be used to find all systems of clues that have a given solution. This approach has been as part of a computational proof that it is not possible to design a valid sudoku puzzle with only 16 clues. [2] [15]

Related Research Articles

In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. If this is the case, the formula is called satisfiable. On the other hand, if no such assignment exists, the function expressed by the formula is FALSE for all possible variable assignments and the formula is unsatisfiable. For example, the formula "a AND NOT b" is satisfiable because one can find the values a = TRUE and b = FALSE, which make (a AND NOT b) = TRUE. In contrast, "a AND NOT a" is unsatisfiable.

In computational complexity theory, the complexity class #P (pronounced "sharp P" or, sometimes "number P" or "hash P") is the set of the counting problems associated with the decision problems in the set NP. More formally, #P is the class of function problems of the form "compute f(x)", where f is the number of accepting paths of a nondeterministic Turing machine running in polynomial time. Unlike most well-known complexity classes, it is not a class of decision problems but a class of function problems. The most difficult, representative problems of this class are #P-complete.

In Boolean logic, a formula is in conjunctive normal form (CNF) or clausal normal form if it is a conjunction of one or more clauses, where a clause is a disjunction of literals; otherwise put, it is a product of sums or an AND of ORs. As a canonical normal form, it is useful in automated theorem proving and circuit theory.

In computer science, 2-satisfiability, 2-SAT or just 2SAT is a computational problem of assigning values to variables, each of which has two possible values, in order to satisfy a system of constraints on pairs of variables. It is a special case of the general Boolean satisfiability problem, which can involve constraints on more than two variables, and of constraint satisfaction problems, which can allow more than two choices for the value of each variable. But in contrast to those more general problems, which are NP-complete, 2-satisfiability can be solved in polynomial time.

In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any problem in NP can be reduced in polynomial time by a deterministic Turing machine to the Boolean satisfiability problem.

In computational complexity theory, a function problem is a computational problem where a single output is expected for every input, but the output is more complex than that of a decision problem. For function problems, the output is not simply 'yes' or 'no'.

<span class="mw-page-title-main">Boolean function</span> Function returning one of only two values

In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set. Alternative names are switching function, used especially in older computer science literature, and truth function, used in logic. Boolean functions are the subject of Boolean algebra and switching theory.

In formal logic, Horn-satisfiability, or HORNSAT, is the problem of deciding whether a given set of propositional Horn clauses is satisfiable or not. Horn-satisfiability and Horn clauses are named after Alfred Horn.

In computational complexity theory, the maximum satisfiability problem (MAX-SAT) is the problem of determining the maximum number of clauses, of a given Boolean formula in conjunctive normal form, that can be made true by an assignment of truth values to the variables of the formula. It is a generalization of the Boolean satisfiability problem, which asks whether there exists a truth assignment that makes all clauses true.

In computational complexity, problems that are in the complexity class NP but are neither in the class P nor NP-complete are called NP-intermediate, and the class of such problems is called NPI. Ladner's theorem, shown in 1975 by Richard E. Ladner, is a result asserting that, if P ≠ NP, then NPI is not empty; that is, NP contains problems that are neither in P nor NP-complete. Since it is also true that if NPI problems exist, then P ≠ NP, it follows that P = NP if and only if NPI is empty.

In constraint satisfaction, a decomposition method translates a constraint satisfaction problem into another constraint satisfaction problem that is binary and acyclic. Decomposition methods work by grouping variables into sets, and solving a subproblem for each set. These translations are done because solving binary acyclic problems is a tractable problem.

In computer science, a holographic algorithm is an algorithm that uses a holographic reduction. A holographic reduction is a constant-time reduction that maps solution fragments many-to-many such that the sum of the solution fragments remains unchanged. These concepts were introduced by Leslie Valiant, who called them holographic because "their effect can be viewed as that of producing interference patterns among the solution fragments". The algorithms are unrelated to laser holography, except metaphorically. Their power comes from the mutual cancellation of many contributions to a sum, analogous to the interference patterns in a hologram.

MAXEkSAT is a problem in computational complexity theory that is a maximization version of the Boolean satisfiability problem 3SAT. In MAXEkSAT, each clause has exactly k literals, each with distinct variables, and is in conjunctive normal form. These are called k-CNF formulas. The problem is to determine the maximum number of clauses that can be satisfied by a truth assignment to the variables in the clauses.

In computer science, the Sharp Satisfiability Problem is the problem of counting the number of interpretations that satisfy a given Boolean formula, introduced by Valiant in 1979. In other words, it asks in how many ways the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. For example, the formula is satisfiable by three distinct boolean value assignments of the variables, namely, for any of the assignments, , and, we have

In computational complexity theory, the exponential time hypothesis is an unproven computational hardness assumption that was formulated by Impagliazzo & Paturi (1999). It states that satisfiability of 3-CNF Boolean formulas cannot be solved in subexponential time, . More precisely, the usual form of the hypothesis asserts the existence of a number such that all algorithms that correctly solve this problem require time at least . The exponential time hypothesis, if true, would imply that P ≠ NP, but it is a stronger statement. It implies that many computational problems are equivalent in complexity, in the sense that if one of them has a subexponential time algorithm then they all do, and that many known algorithms for these problems have optimal or near-optimal time complexity.

In computational complexity theory, a branch of computer science, the Max/min CSP/Ones classification theorems state necessary and sufficient conditions that determine the complexity classes of problems about satisfying a subset S of boolean relations. They are similar to Schaefer's dichotomy theorem, which classifies the complexity of satisfying finite sets of relations; however, the Max/min CSP/Ones classification theorems give information about the complexity of approximating an optimal solution to a problem defined by S.

The Boolean satisfiability problem can be stated formally as: given a Boolean expression with variables, finding an assignment of the variables such that is true. It is seen as the canonical NP-complete problem. While no efficient algorithm is known to solve this problem in the general case, there are certain heuristics, informally called 'rules of thumb' in programming, that can usually help solve the problem reasonably efficiently.

In mathematics, a read-once function is a special type of Boolean function that can be described by a Boolean expression in which each variable appears only once.

In computational complexity, not-all-equal 3-satisfiability (NAE3SAT) is an NP-complete variant of the Boolean satisfiability problem, often used in proofs of NP-completeness.

In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms.

References

  1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Eiter, Thomas; Makino, Kazuhisa; Gottlob, Georg (2008), "Computational aspects of monotone dualization: a brief survey", Discrete Applied Mathematics , 156 (11): 2035–2049, doi: 10.1016/j.dam.2007.04.017 , MR   2437000
  2. 1 2 3 4 5 6 7 8 Gainer-Dewar, Andrew; Vera-Licona, Paola (2017), "The minimal hitting set generation problem: algorithms and computation", SIAM Journal on Discrete Mathematics , 31 (1): 63–100, arXiv: 1601.02939 , doi:10.1137/15M1055024, MR   3590650
  3. Moon, J. W.; Moser, L. (1965), "On cliques in graphs", Israel Journal of Mathematics , 3: 23–28, doi:10.1007/BF02760024, MR   0182577, S2CID   9855414
  4. 1 2 3 Bioch, Jan C.; Ibaraki, Toshihide (1995), "Complexity of identification and dualization of positive Boolean functions", Information and Computation , 123 (1): 50–63, doi: 10.1006/inco.1995.1157 , hdl: 1765/55247 , MR   1358967
  5. Gurvich, V.; Khachiyan, L. (1999), "On generating the irredundant conjunctive and disjunctive normal forms of monotone Boolean functions", Discrete Applied Mathematics , 96/97: 363–373, doi:10.1016/S0166-218X(99)00099-2, MR   1724731
  6. 1 2 3 4 Fredman, Michael L.; Khachiyan, Leonid (1996), "On the complexity of dualization of monotone disjunctive normal forms", Journal of Algorithms, 21 (3): 618–628, doi:10.1006/jagm.1996.0062, MR   1417667
  7. Domingo, Carlos; Mishra, Nina; Pitt, Leonard (1999), "Efficient Read-Restricted Monotone CNF/DNF dualization by learning with membership queries", Machine Learning, 37 (1): 89–110, doi: 10.1023/a:1007627028578
  8. Mishra, Nina; Pitt, Leonard (1997), "Generating all maximal independent sets of bounded-degree hypergraphs", in Freund, Yoav; Schapire, Robert E. (eds.), Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT 1997, Nashville, Tennessee, USA, July 6-9, 1997, Association for Computing Machinery, pp. 211–217, doi:10.1145/267460.267500
  9. Khachiyan, Leonid; Boros, Endre; Gurvich, Vladimir; Elbassioni, Khaled (2007), "Computing many maximal independent sets for hypergraphs in parallel", Parallel Processing Letters , 17 (2): 141–152, doi:10.1142/S0129626407002934, MR   2334718
  10. Eiter, Thomas; Gottlob, Georg; Makino, Kazuhisa (2003), "New results on monotone dualization and generating hypergraph transversals", SIAM Journal on Computing , 32 (2): 514–537, arXiv: cs/0204009 , doi:10.1137/S009753970240639X, MR   1969402
  11. Elbassioni, Khaled M.; Hagen, Matthias; Rauf, Imran (2008), "Some fixed-parameter tractable classes of hypergraph duality and related problems", in Grohe, Martin; Niedermeier, Rolf (eds.), Parameterized and Exact Computation, Third International Workshop, IWPEC 2008, Victoria, Canada, May 14-16, 2008. Proceedings, Lecture Notes in Computer Science, vol. 5018, Springer, pp. 91–102, doi:10.1007/978-3-540-79723-4_10
  12. Reiter, Raymond (April 1987), "A theory of diagnosis from first principles", Artificial Intelligence , 32 (1): 57–95, doi:10.1016/0004-3702(87)90062-2
  13. Klamt, Steffen; Gilles, Ernst Dieter (January 2004), "Minimal cut sets in biochemical reaction networks", Bioinformatics, 20 (2): 226–234, doi: 10.1093/bioinformatics/btg395
  14. Haus, Utz-Uwe; Klamt, Steffen; Stephen, Tamon (April 2008), "Computing knock-out strategies in metabolic networks", Journal of Computational Biology, 15 (3): 259–268, arXiv: 0801.0082 , doi:10.1089/cmb.2007.0229
  15. McGuire, Gary; Tugemann, Bastian; Civario, Gilles (2014), "There is no 16-clue Sudoku: solving the Sudoku minimum number of clues problem via hitting set enumeration", Experimental Mathematics, 23 (2): 190–217, arXiv: 1201.0749 , doi:10.1080/10586458.2013.870056, MR   3223774