Kleene star

Last updated

In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid construction. The application of the Kleene star to a set is written as . It is widely used for regular expressions, which is the context in which it was introduced by Stephen Kleene to characterize certain automata, where it means "zero or more repetitions".

Contents

  1. If is a set of strings, then is defined as the smallest superset of that contains the empty string and is closed under the string concatenation operation.
  2. If is a set of symbols or characters, then is the set of all strings over symbols in , including the empty string .

The set can also be described as the set containing the empty string and all finite-length strings that can be generated by concatenating arbitrary elements of , allowing the use of the same element multiple times. If is either the empty set ∅ or the singleton set , then ; if is any other finite set or countably infinite set, then is a countably infinite set. [1] As a consequence, each formal language over a finite or countably infinite alphabet is countable, since it is a subset of the countably infinite set .

The operators are used in rewrite rules for generative grammars.

Definition and notation

Given a set , define

(the language consisting only of the empty string),
,

and define recursively the set

for each .

If is a formal language, then , the -th power of the set , is a shorthand for the concatenation of set with itself times. That is, can be understood to be the set of all strings that can be represented as the concatenation of strings in .

The definition of Kleene star on is [2]

This means that the Kleene star operator is an idempotent unary operator: for any set of strings or characters, as for every .

Kleene plus

In some formal language studies, (e.g. AFL theory) a variation on the Kleene star operation called the Kleene plus is used. The Kleene plus omits the term in the above union. In other words, the Kleene plus on is

or

[3]

Examples

Example of Kleene star applied to set of strings:

{"ab","c"}* = { ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", "abcc", "cabab", "cabc", "ccab", "ccc", ...}.

Example of Kleene plus applied to set of characters:

{"a", "b", "c"}+ = { "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}.

Kleene star applied to the same character set:

{"a", "b", "c"}* = { ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}.

Example of Kleene star applied to the empty set:

* = {ε}.

Example of Kleene plus applied to the empty set:

+ = ∅ ∅* = { } = ∅,

where concatenation is an associative and noncommutative product.

Example of Kleene plus and Kleene star applied to the singleton set containing the empty string:

If , then also for each , hence .

Generalization

Strings form a monoid with concatenation as the binary operation and ε the identity element. The Kleene star is defined for any monoid, not just strings. More precisely, let (M, ⋅) be a monoid, and SM. Then S* is the smallest submonoid of M containing S; that is, S* contains the neutral element of M, the set S, and is such that if x,yS*, then xyS*.

Furthermore, the Kleene star is generalized by including the *-operation (and the union) in the algebraic structure itself by the notion of complete star semiring. [4]

See also

Related Research Articles

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

<span class="mw-page-title-main">Monoid</span> Algebraic structure with an associative operation and an identity element

In abstract algebra, a branch of mathematics, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being 0.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

<span class="mw-page-title-main">Automata theory</span> Study of abstract machines and automata

Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science. The word automata comes from the Greek word αὐτόματος, which means "self-acting, self-willed, self-moving". An automaton is an abstract self-propelled computing device which follows a predetermined sequence of operations automatically. An automaton with a finite number of states is called a Finite Automaton (FA) or Finite-State Machine (FSM). The figure on the right illustrates a finite-state machine, which is a well-known type of automaton. This automaton consists of states and transitions. As the automaton sees a symbol of input, it makes a transition to another state, according to its transition function, which takes the previous state and current input symbol as its arguments.

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

In mathematics, the lexicographic or lexicographical order is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a totally ordered set.

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

In theoretical computer science, more precisely in the theory of formal languages, the star height is a measure for the structural complexity of regular expressions and regular languages. The star height of a regular expression equals the maximum nesting depth of stars appearing in that expression. The star height of a regular language is the least star height of any regular expression for that language. The concept of star height was first defined and studied by Eggan (1963).

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

In automata theory, a deterministic pushdown automaton is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages.

In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of symbols/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite, countable, or even uncountable.

A regular language is said to be star-free if it can be described by a regular expression constructed from the letters of the alphabet, the empty set symbol, all boolean operators – including complementation – and concatenation but no Kleene star. For instance, the language of words over the alphabet that do not have consecutive a's can be defined by , where denotes the complement of a subset of . The condition is equivalent to having generalized star height zero.

In computer science, a trace is a set of strings, wherein certain letters in the string are allowed to commute, but others are not. It generalizes the concept of a string, by not forcing the letters to always be in a fixed order, but allowing certain reshufflings to take place. Traces were introduced by Pierre Cartier and Dominique Foata in 1969 to give a combinatorial proof of MacMahon's master theorem. Traces are used in theories of concurrent computation, where commuting letters stand for portions of a job that can execute independently of one another, while non-commuting letters stand for locks, synchronization points or thread joins.

In mathematics and theoretical computer science, a semiautomaton is a deterministic finite automaton having inputs but no output. It consists of a set Q of states, a set Σ called the input alphabet, and a function T: Q × Σ → Q called the transition function.

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

In mathematics and computer science, a history monoid is a way of representing the histories of concurrently running computer processes as a collection of strings, each string representing the individual history of a process. The history monoid provides a set of synchronization primitives for providing rendezvous points between a set of independently executing processes or threads.

In computer science, more precisely in automata theory, a rational set of a monoid is an element of the minimal class of subsets of this monoid that contains all finite subsets and is closed under union, product and Kleene star. Rational sets are useful in automata theory, formal languages and algebra.

In theoretical computer science, in particular in formal language theory, the Brzozowski derivative of a set of strings and a string is the set of all strings obtainable from a string in by cutting off the prefix . Formally:

References

  1. Nayuki Minase (10 May 2011). "Countable sets and Kleene star". Project Nayuki. Retrieved 11 January 2012.
  2. Fletcher, Peter; Hoyle, Hughes; Patty, C. Wayne (1991). Foundations of Discrete Mathematics. Brooks/Cole. p. 656. ISBN   0534923739. The Kleene closureL* of L is defined to be .
  3. This equation holds because every element of V+ must either be composed from one element of V and finitely many non-empty terms in V or is just an element of V (where V itself is retrieved by taking V concatenated with ε).
  4. Droste, M.; Kuich, W. (2009). "Chapter 1: Semirings and Formal Power Series". Handbook of Weighted Automata . Monographs in Theoretical Computer Science. Springer. p.  9. doi:10.1007/978-3-642-01492-5_1. ISBN   978-3-642-01491-8.

Further reading