Smallest grammar problem

In data compression and the theory of formal languages, the smallest grammar problem is the problem of finding the smallest context-free grammar that generates a given string of characters (but no other string). The size of a grammar is defined by some authors as the number of symbols on the right side of the production rules;[1] others also add the number of rules to that.[2] The (decision version of the) problem is NP-complete.[1] The smallest context-free grammar that generates a given string is always a straight-line grammar without useless rules.[citation needed]
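For example, the string "abcabcabcabc" is generated by the straight-line grammar S → AA, A → BB, B → abc, whose size is 2 + 2 + 3 = 7 under the first definition and 7 + 3 = 10 under the second. The sketch below (in Python; the dictionary representation and rule names are illustrative, not taken from the cited papers) expands such a grammar and computes both size measures.

```python
# A straight-line grammar for "abcabcabcabc", given as an illustrative sketch.
# Keys are nonterminals; values are right-hand sides (lists of symbols).
grammar = {
    "S": ["A", "A"],
    "A": ["B", "B"],
    "B": ["a", "b", "c"],   # lowercase symbols are terminals
}

def expand(symbol, rules):
    """Recursively expand a symbol into the terminal string it derives."""
    if symbol not in rules:            # terminal symbol
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])

def size_rhs(rules):
    """Size as the total number of symbols on the right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())

def size_rhs_plus_rules(rules):
    """Alternative measure that also counts the number of rules."""
    return size_rhs(rules) + len(rules)

assert expand("S", grammar) == "abcabcabcabc"
print(size_rhs(grammar))              # 7
print(size_rhs_plus_rules(grammar))   # 10
```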

In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information.
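As a small illustration of the lossless case, the sketch below round-trips a redundant byte string through Python's standard zlib compressor; the example data is arbitrary.

```python
import zlib

# Lossless compression: the original data is recovered exactly.
data = b"abcabcabcabc" * 100          # highly redundant input
compressed = zlib.compress(data)

assert zlib.decompress(compressed) == data   # no information is lost
print(len(data), "->", len(compressed), "bytes")
```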

Formal language: set of strings of symbols that may be constrained by rules that are specific to it

In mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

In formal language theory, a context-free grammar (CFG) is a certain type of formal grammar: a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements; for example, a rule A → α allows the nonterminal A, wherever it occurs, to be replaced by the string α.
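As an illustration, the sketch below applies the production rules of a toy grammar for balanced parentheses (S → (S)S | ε) step by step; the grammar and helper function are examples, not part of any standard library.

```python
# Context-free production rules as simple string replacements (toy example).
rules = {"S": ["(S)S", ""]}   # S -> (S)S  |  empty string

def apply_rule(sentential_form, position, replacement):
    """Replace the nonterminal at `position` with one of its right-hand sides."""
    symbol = sentential_form[position]
    assert replacement in rules[symbol]
    return sentential_form[:position] + replacement + sentential_form[position + 1:]

# One possible derivation of "()()" from the start symbol S:
s = "S"
s = apply_rule(s, 0, "(S)S")   # S      => (S)S
s = apply_rule(s, 1, "")       # (S)S   => ()S
s = apply_rule(s, 2, "(S)S")   # ()S    => ()(S)S
s = apply_rule(s, 3, "")       # ()(S)S => ()()S
s = apply_rule(s, 4, "")       # ()()S  => ()()
print(s)                       # "()()"
```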

See also

Grammar-based codes or grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence x, a grammar-based code transforms x into a context-free grammar G. The problem of finding a smallest grammar for an input sequence is known to be NP-hard, so many grammar-transform algorithms have been proposed from theoretical and practical viewpoints. Generally, the produced grammar is further compressed by statistical encoders such as arithmetic coding.
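The sketch below illustrates one simple grammar transform in the spirit of RePair: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal. It is a simplified illustration under that assumption, not the algorithm of any particular paper, and the statistical-encoding stage is omitted.

```python
from collections import Counter

def grammar_transform(text):
    """Greedy, RePair-style grammar transform (illustrative sketch):
    replace the most frequent adjacent pair with a new nonterminal
    until no pair occurs more than once."""
    sequence = list(text)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(sequence, sequence[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nonterminal = f"R{next_id}"
        next_id += 1
        rules[nonterminal] = list(pair)
        # Rewrite the sequence, replacing non-overlapping occurrences of the pair.
        new_seq, i = [], 0
        while i < len(sequence):
            if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == pair:
                new_seq.append(nonterminal)
                i += 2
            else:
                new_seq.append(sequence[i])
                i += 1
        sequence = new_seq
    rules["S"] = sequence   # the start rule derives the whole input string
    return rules

print(grammar_transform("abcabcabcabc"))
# e.g. {'R0': ['a', 'b'], 'R1': ['R0', 'c'], 'R2': ['R1', 'R1'], 'S': ['R2', 'R2']}
```

Expanding the start rule of the returned grammar reproduces the input exactly; in a full grammar-based code the rules would then be serialized and passed to a statistical encoder.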

A straight-line grammar is a formal grammar that generates exactly one string. Consequently, it does not branch nor loop.
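Under the assumption that the grammar is stored as a mapping from each nonterminal to its single right-hand side (so branching is already impossible), the illustrative check below verifies the remaining condition that no nonterminal can reach itself, i.e. that the grammar does not loop.

```python
def is_straight_line(rules, start="S"):
    """Illustrative check: `rules` maps each nonterminal to exactly one
    right-hand side, so it only remains to verify that no nonterminal
    reachable from the start symbol can derive itself (no looping)."""
    def in_cycle(symbol, seen):
        if symbol not in rules:          # terminal symbol
            return False
        if symbol in seen:               # a nonterminal reached itself
            return True
        return any(in_cycle(s, seen | {symbol}) for s in rules[symbol])
    return not in_cycle(start, set())

print(is_straight_line({"S": ["A", "A"], "A": ["a", "b"]}))   # True
print(is_straight_line({"S": ["A"], "A": ["a", "S"]}))        # False: S loops
```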

Related Research Articles

Kolmogorov complexity: the length of the shortest computer program that produces a given object as output

In algorithmic information theory, the Kolmogorov complexity of an object, such as a piece of text, is the length of the shortest computer program that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963.

In computational complexity theory, P, also known as PTIME or DTIME(n^O(1)), is a fundamental complexity class. It contains all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.

Algorithmic information theory (AIT) is a merger of information theory and computer science that concerns itself with the relationship between computation and information of computably generated objects, such as strings or any other data structure. According to Gregory Chaitin, it is "the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously."

In computational complexity theory, the unique games conjecture is a conjecture made by Subhash Khot in 2002. The conjecture postulates that the problem of determining the approximate value of a certain type of game, known as a unique game, has NP-hard algorithmic complexity. It has broad applications in the theory of hardness of approximation. If it is true, then for many important problems it is not only impossible to get an exact solution in polynomial time, but also impossible to get a good polynomial-time approximation. The problems for which such an inapproximability result would hold include constraint satisfaction problems, which crop up in a wide variety of disciplines.

Active networking is a communication pattern that allows packets flowing through a telecommunications network to dynamically modify the operation of the network.

In cryptography the standard model is the model of computation in which the adversary is only limited by the amount of time and computational power available. Other names used are bare model and plain model.

In computational complexity theory, a computational hardness assumption is the hypothesis that a particular problem cannot be solved efficiently. It is not known how to prove (unconditional) hardness for essentially any useful problem. Instead, computer scientists rely on reductions to formally relate the hardness of a new or complicated problem to a computational hardness assumption about a problem that is better-understood.

Michael John Fischer is a computer scientist who works in the fields of distributed computing, parallel computing, cryptography, algorithms and data structures, and computational complexity.

Salil Vadhan: American computer scientist

Salil Vadhan is the Vicky Joseph Professor of Computer Science and Applied Mathematics at Harvard University. After completing his undergraduate degree in Mathematics and Computer Science at Harvard in 1995, he obtained his PhD in Applied Mathematics from the Massachusetts Institute of Technology in 1999, where his advisor was Shafi Goldwasser. His research centers on the interface between computational complexity theory and cryptography, focusing on pseudorandomness and zero-knowledge proofs. His work on the zig-zag product, with Omer Reingold and Avi Wigderson, was awarded the 2009 Gödel Prize.

Dense subgraph

In computer science, the notion of highly connected subgraphs appears frequently. This notion can be formalized as follows. Let G = (V, E) be an undirected graph and let S = (V_S, E_S) be a subgraph of G. Then the density of S is defined to be d(S) = |E_S| / |V_S|.
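Under that definition (noting that some authors normalize differently, for example by the number of possible edges), the density of a subgraph can be computed as in the short sketch below; the example graphs are arbitrary.

```python
def density(vertices, edges):
    """Density d(S) = |E_S| / |V_S| of a subgraph with the given
    vertex set and edge list (illustrative helper)."""
    return len(edges) / len(vertices)

triangle = ({1, 2, 3}, [(1, 2), (2, 3), (1, 3)])       # 3 vertices, 3 edges
path     = ({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)])    # 4 vertices, 3 edges
print(density(*triangle))  # 1.0
print(density(*path))      # 0.75
```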

Normalized compression distance (NCD) is a way of measuring the similarity between two objects, be it two documents, two letters, two emails, two music scores, two languages, two programs, two pictures, two systems, two genomes, to name a few. Such a measurement should not be application dependent or arbitrary. A reasonable definition for the similarity between two objects is how difficult it is to transform them into each other.
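In practice the distance is approximated with a real compressor C, typically as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(x) is the compressed length of x. The sketch below uses Python's zlib as a stand-in compressor; the choice of compressor and the test strings are illustrative assumptions.

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length of `data` (zlib stands in for an ideal compressor)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(x + y) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog" * 20
b = b"the quick brown fox jumps over the lazy cat" * 20
c = b"completely unrelated sequence of characters 0123456789" * 20
print(ncd(a, b))   # small: the two inputs share almost all their structure
print(ncd(a, c))   # larger: the inputs have little in common
```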

Amit Sahai: American cryptographer

Amit Sahai is an American computer scientist. He is a professor of computer science at UCLA, where he directs the Center for Encrypted Functionalities.

In mathematics and theoretical computer science, entropy compression is an information theoretic method for proving that a random process terminates, originally used by Robin Moser to prove an algorithmic version of the Lovász local lemma.

In mathematics, the incompressibility method is a proof method akin to the probabilistic method, the counting method, or the pigeonhole principle. To prove that an object in a certain class satisfies a certain property, select an object of that class that is incompressible; if it did not satisfy the property, it could be compressed by a computable coding. Since it can generally be proven that almost all objects in a given class are incompressible, the argument demonstrates that almost all objects in the class have the property involved. Selecting a specific incompressible object is not effective and cannot be done by a computer program; however, a simple counting argument shows that almost all objects of a given class cannot be compressed by more than a few bits.
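The counting step can be made concrete: there are 2^n binary strings of length n, but at most 2^(n-c+1) - 1 descriptions of length at most n - c, so fewer than a 2^(1-c) fraction of the strings can be compressed by c or more bits. The arithmetic below is a minimal illustration of that bound (the parameter values are arbitrary).

```python
def fraction_compressible(n: int, c: int) -> float:
    """Upper bound on the fraction of n-bit strings that admit a description
    of at most n - c bits: at most 2**(n - c + 1) - 1 descriptions exist,
    while there are 2**n strings in total."""
    return (2 ** (n - c + 1) - 1) / 2 ** n

print(fraction_compressible(1000, 10))   # about 0.002: under 0.2% of strings
print(fraction_compressible(1000, 20))   # about 2e-6: almost all are incompressible
```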

In formal language theory, a leftist grammar is a formal grammar in which certain restrictions are placed on the left and right sides of the grammar's productions. Only two types of productions are allowed, namely insertion rules of the form a → ba and deletion rules of the form ab → b, where a and b are terminal symbols. This type of grammar was motivated by accessibility problems in the field of computer security.

In communication complexity, the gap-Hamming problem asks: if Alice and Bob are each given a string, what is the minimal number of bits that they need to exchange in order for Alice to approximately compute the Hamming distance between their strings? The solution to the problem roughly states that any communication protocol used to approximate the Hamming distance does (asymptotically) no better than Bob sending his whole string to Alice. More specifically, if Alice and Bob are each given n-bit strings, there exists no communication protocol that lets Alice compute the Hamming distance between their strings to within ±√n using fewer than Ω(n) bits.
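The trivial protocol referred to above has Bob send his entire n-bit string so that Alice can compute the distance exactly; the lower bound says that even approximating the distance to within ±√n cannot be done with asymptotically fewer bits. The sketch below only illustrates the quantities involved, using arbitrary random strings.

```python
import math, random

def hamming(x: str, y: str) -> int:
    """Exact Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

n = 1024
rng = random.Random(0)
alice = "".join(rng.choice("01") for _ in range(n))
bob = "".join(rng.choice("01") for _ in range(n))

# Trivial protocol: Bob sends all n bits and Alice computes the distance exactly.
d = hamming(alice, bob)
print(d, "+/-", int(math.sqrt(n)))   # the gap-Hamming task: approximate d to within sqrt(n)
```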

References

  1. Charikar, Moses; Lehman, Eric; Liu, Ding; Panigrahy, Rina; Prabhakaran, Manoj; Sahai, Amit; Shelat, Abhi (2005). "The Smallest Grammar Problem" (PDF). IEEE Transactions on Information Theory. 51 (7): 2554–2576. CiteSeerX 10.1.1.185.2130. doi:10.1109/TIT.2005.850116. Zbl 1296.68086.
  2. Benz, Florian; Kötzing, Timo (2013). "An Effective Heuristic for the Smallest Grammar Problem". Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation (GECCO '13). ISBN 978-1-4503-1963-8. doi:10.1145/2463372.2463441.