Shortlex order

Last updated

In mathematics, and particularly in the theory of formal languages, shortlex is a total ordering for finite sequences of objects that can themselves be totally ordered. In the shortlex ordering, sequences are primarily sorted by cardinality (length) with the shortest sequences first, and sequences of the same length are sorted into lexicographical order. [1] Shortlex ordering is also called radix, length-lexicographic, military, or genealogical ordering. [2]

In the context of strings on a totally ordered alphabet, the shortlex order is identical to the lexicographical order, except that shorter strings precede longer strings. For example, the shortlex order of the set of strings on the English alphabet (in its usual order) is [ε, a, b, c, ..., z, aa, ab, ac, ..., zz, aaa, aab, aac, ..., zzz, ...], where ε denotes the empty string.

The strings in this ordering over a fixed finite alphabet can be placed into one-to-one order-preserving correspondence with the natural numbers, giving the bijective numeration system for representing numbers. [3] The shortlex ordering is also important in the theory of automatic groups. [4]

See also

Related Research Articles

Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules called a formal grammar.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

In mathematics, a total order or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation on some set , which satisfies the following for all and in :

  1. (reflexive).
  2. If and then (transitive).
  3. If and then (antisymmetric).
  4. or .

In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to be normal in base b if, for every positive integer n, all possible strings n digits long have density bn.

In mathematics, a binary relation R is called well-founded on a class X if every non-empty subset SX has a minimal element with respect to R, that is, an element mS not related by sRm for any sS. In other words, a relation is well founded if

Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is the generalization of the alphabetical order to other data types, such as sequences of numbers or other ordered mathematical objects.

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

In mathematics, the lexicographic or lexicographical order is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a totally ordered set.

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

<span class="mw-page-title-main">Antimatroid</span> Mathematical system of orderings or sets

In mathematics, an antimatroid is a formal system that describes processes in which a set is built up by including elements one at a time, and in which an element, once available for inclusion, remains available until it is included. Antimatroids are commonly axiomatized in two equivalent ways, either as a set system modeling the possible states of such a process, or as a formal language modeling the different sequences in which elements may be included. Dilworth (1940) was the first to study antimatroids, using yet another axiomatization based on lattice theory, and they have been frequently rediscovered in other contexts.

In formal language theory, the empty string, or empty word, is the unique string of length zero.

Computable functions are the basic objects of study in computability theory. Computable functions are the formalized analogue of the intuitive notion of algorithms, in the sense that a function is computable if there exists an algorithm that can do the job of the function, i.e. given an input of the function domain it can return the corresponding output. Computable functions are used to discuss computability without referring to any concrete model of computation such as Turing machines or register machines. Any definition, however, must make reference to some specific model of computation but all valid definitions yield the same class of functions. Particular models of computability that give rise to the set of computable functions are the Turing-computable functions and the general recursive functions.

de Bruijn sequence Cycle through all length-k sequences

In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have at leastkn symbols. And since B(k, n) has exactlykn symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length n at least once.

Bijective numeration is any numeral system in which every non-negative integer can be represented in exactly one way using a finite string of digits. The name refers to the bijection that exists in this case between the set of non-negative integers and the set of finite strings using a finite set of symbols.

In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investigated them in 1954, calling them standard lexicographic sequences. Anatoly Shirshov introduced Lyndon words in 1953 calling them regular words. Lyndon words are a special case of Hall words; almost all properties of Lyndon words are shared by Hall words.

In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite, countable, or even uncountable.

In mathematics, the Levi-Civita field, named after Tullio Levi-Civita, is a non-Archimedean ordered field; i.e., a system of numbers containing infinite and infinitesimal quantities. Each member can be constructed as a formal series of the form

Cobham's theorem is a theorem in combinatorics on words that has important connections with number theory, notably transcendental numbers, and automata theory. Informally, the theorem gives the condition for the members of a set S of natural numbers written in bases b1 and base b2 to be recognised by finite automata. Specifically, consider bases b1 and b2 such that they are not powers of the same integer. Cobham's theorem states that S written in bases b1 and b2 is recognised by finite automata if and only if S is a finite union of arithmetic progressions. The theorem was proved by Alan Cobham in 1969 and has since given rise to many extensions and generalisations.

References

  1. Sipser, Michael (2012). Introduction to the Theory of Computation (3 ed.). Boston, MA: Cengage Learning. p.  14. ISBN   978-1133187790.
  2. Bárány, Vince (2008), "A hierarchy of automatic ω-words having a decidable MSO theory", RAIRO Theoretical Informatics and Applications, 42 (3): 417–450, doi:10.1051/ita:2008008, MR   2434027 .
  3. Smullyan, R. (1961), "9. Lexicographical ordering; n-adic representation of integers", Theory of Formal Systems, Annals of Mathematics Studies, vol. 47, Princeton University Press, pp. 34–36.
  4. Epstein, David B. A.; Cannon, James W.; Holt, Derek F.; Levy, Silvio V. F.; Paterson, Michael S.; Thurston, William P. (1992), Word processing in groups, Boston, MA: Jones and Bartlett Publishers, p. 56, ISBN   0-86720-244-0, MR   1161694 .