This article needs additional citations for verification .(May 2020) |
In computer science, more particularly in formal language theory, a cyclic language is a set of strings that is closed with respect to repetition, root, and cyclic shift.
If A is a set of symbols, and A * is the set of all strings built from symbols in A, then a string set L ⊆ A* is called a formal language over the alphabet A. The language L is called cyclic if
where wn denotes the n-fold repetition of the string w, and vw denotes the concatenation of the strings v and w. [1] : Def.1
For example, using the alphabet A = {a, b }, the language
L = | { | apbn1 | an2bn2 | ... | ankbnk | aq | : | ni ≥ 0 and p+q = n1 } | |
∪ | { | bp | an1bn1 | an2bn2 | ... | ankbq | : | ni ≥ 0 and p+q = nk } |
is cyclic, but not regular. [1] : Exm.2 However, L is context-free, since M = { an1bn1an2bn2 ... ankbnk : ni ≥ 0 } is, and context-free languages are closed under circular shift; L is obtained as circular shift of M.
In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar. Context-sensitive is known as type-1 in the Chomsky hierarchy of formal languages.
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules called a formal grammar.
In mathematical logic and computer science, the Kleene star is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid construction. The application of the Kleene star to a set is written as . It is widely used for regular expressions, which is the context in which it was introduced by Stephen Kleene to characterize certain automata, where it means "zero or more repetitions".
In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.
In mathematics, logic and computer science, a formal language is called recursively enumerable if it is a recursively enumerable subset in the set of all possible words over the alphabet of the language, i.e., if there exists a Turing machine which will enumerate all valid strings of the language.
In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings are to one another, that is measured by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.
In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A∗. The free semigroup on A is the subsemigroup of A∗ containing all elements except the empty string. It is usually denoted A+.
In combinatorial mathematics, a circular shift is the operation of rearranging the entries in a tuple, either by moving the final entry to the first position, while shifting all other entries to the next position, or by performing the inverse operation. A circular shift is a special kind of cyclic permutation, which in turn is a special kind of permutation. Formally, a circular shift is a permutation σ of the n entries in the tuple such that either
Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.
In automata theory, a deterministic pushdown automaton is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages.
In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/characters/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and, depending on its purpose, may be finite, countable, or even uncountable.
In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.
Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.
A production or production rule in computer science is a rewrite rule specifying a symbol substitution that can be recursively performed to generate new symbol sequences. A finite set of productions is the main component in the specification of a formal grammar. The other components are a finite set of nonterminal symbols, a finite set of terminal symbols that is disjoint from and a distinguished symbol that is the start symbol.
A formal grammar describes which strings from an alphabet of a formal language are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.
In mathematics, logic and computer science, a formal language is called recursive if it is a recursive subset of the set of all possible finite sequences over the alphabet of the language. Equivalently, a formal language is recursive if there exists a Turing machine that, when given a finite sequence of symbols as input, always halts and accepts it if it belongs to the language and halts and rejects it otherwise. In Theoretical computer science, such always-halting Turing machines are called total Turing machines or algorithms. Recursive languages are also called decidable.
In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.
In theoretical computer science, in particular in formal language theory, Greibach's theorem states that certain properties of formal language classes are undecidable. It is named after the computer scientist Sheila Greibach, who first proved it in 1963.
In theoretical computer science, in particular in formal language theory, the Brzozowski derivative of a set of strings and a string is the set of all strings obtainable from a string in by cutting off the prefix . Formally: