The Caverphone within linguistics and computing, is a phonetic matching algorithm [1] [2] invented to identify English names with their sounds, originally built to process a custom dataset compound between 1893 and 1938 in southern Dunedin, New Zealand. [3] Started from a similar concept as metaphone, it has been developed to accommodate and process general English since then. [3]
The Caverphone was created by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002, revised in 2004. It was created to assist in data matching between late 19th century and early 20th century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended to apply to those names that could not easily be matched between electoral rolls, after the exact matches were removed from the pool of potential matches. The algorithm is optimised for accents present in the study area (southern part of the city of Dunedin, New Zealand).
The rules of the algorithm are applied consecutively to any particular name, as a series of replacements.
The algorithm is as follows:
Lee -> lee lee -> l33 l33 -> L33 L33 -> L L -> L111111 L111111 -> L11111
Thompson -> thompson thompson -> th3mps3n th3mps3n -> th3mpS3n th3mpS3n -> Th3mpS3n Th3mpS3n -> Th3mPS3n Th3mPS3n -> Th3MPS3n Th3MPS3n -> Th3MPS3N Th3MPS3N -> T23MPS3N T23MPS3N -> TMPSN TMPSN111111 -> TMPSN1
Lee -> lee lee -> le le -> l3 l3 -> L3 L3 -> LA LA -> LA1111111111 LA1111111111 -> LA11111111
Thompson -> thompson thompson -> th3mps3n th3mps3n -> th3mpS3n th3mpS3n -> Th3mpS3n Th3mpS3n -> Th3mPS3n Th3mPS3n -> Th3MPS3n Th3MPS3n -> Th3MPS3N Th3MPS3N -> T23MPS3N T23MPS3N -> TMPSN TMPSN1111111111 -> TMPSN11111
Y, or y, is the 25th and penultimate letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. According to some authorities, it is the sixth vowel letter of the English alphabet. In the English writing system, it mostly represents a vowel and seldom a consonant, and in other orthographies it may represent a vowel or a consonant. Its name in English is wye, plural wyes.
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names which sound similar. As with Soundex, similar-sounding words should share the same keys. Metaphone is available as a built-in operator in a number of systems.
Revised Romanization of Korean is the official Korean language romanization system in South Korea. It was developed by the National Academy of the Korean Language from 1995 and was released to the public on 7 July 2000 by South Korea's Ministry of Culture and Tourism in Proclamation No. 2000-8.
Jyutping is a romanisation system for Cantonese developed by the Linguistic Society of Hong Kong (LSHK), an academic group, in 1993. Its formal name is the Linguistic Society of Hong Kong Cantonese Romanization Scheme. The LSHK advocates for and promotes the use of this romanisation system.
In mathematics, and more specifically in computer algebra, computational algebraic geometry, and computational commutative algebra, a Gröbner basis is a particular kind of generating set of an ideal in a polynomial ring K[x1, ..., xn] over a field K. A Gröbner basis allows many important properties of the ideal and the associated algebraic variety to be deduced easily, such as the dimension and the number of zeros when it is finite. Gröbner basis computation is one of the main practical tools for solving systems of polynomial equations and computing the images of algebraic varieties under projections or rational maps.
The alphabet for Modern English is a Latin-script alphabet consisting of 26 letters, each having an upper- and lower-case form. The word alphabet is a compound of the first two letters of the Greek alphabet, alpha and beta. The alphabet originated around the 7th century to write Old English from Latin script. Since then, letters have been added or removed to give the current letters:
Nauruan or Nauru is an Austronesian language, spoken natively in the island country of Nauru. Its relationship to the other Micronesian languages is not well understood.
IJ is a digraph of the letters i and j. Occurring in the Dutch language, it is sometimes considered a ligature, or a letter in itself. In most fonts that have a separate character for ij, the two composing parts are not connected but are separate glyphs, which are sometimes slightly kerned.
The Azerbaijani alphabet has three versions which includes the Arabic, Latin, and Cyrillic alphabets.
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms Improvements to Soundex are the basis for many modern phonetic algorithms.
The Royal Thai General System of Transcription (RTGS) is the official system for rendering Thai words in the Latin alphabet. It was published by the Royal Institute of Thailand in early 1917, when Thailand was called Siam.
Canadian syllabic writing, or simply syllabics, is a family of writing systems used in a number of Indigenous Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. These languages had no formal writing system previously. They are valued for their distinctiveness from the Latin script and for the ease with which literacy can be achieved. For instance, by the late 19th century the Cree had achieved what may have been one of the highest rates of literacy in the world.
Irish orthography is the set of conventions used to write Irish. A spelling reform in the mid-20th century led to An Caighdeán Oifigiúil, the modern standard written form used by the Government of Ireland, which regulates both spelling and grammar. The reform removed inter-dialectal silent letters, simplified some letter sequences, and modernised archaic spellings to reflect modern pronunciation, but it also removed letters pronounced in some dialects but not in others.
The New York State Identification and Intelligence System Phonetic Code, commonly known as NYSIIS, is a phonetic algorithm devised in 1970 as part of the New York State Identification and Intelligence System. It features an accuracy increase of 2.7% over the traditional Soundex algorithm.
A division algorithm is an algorithm which, given two integers N and D, computes their quotient and/or remainder, the result of Euclidean division. Some are applied by hand, while others are employed by digital circuit designs and software.
There are various systems of romanization of the Armenian alphabet.
Caversham is one of the older suburbs of the city of Dunedin, in New Zealand's South Island. It is sited at the western edge of the city's central plain at the mouth of the steep Caversham Valley, which rises to the saddle of Lookout Point. Major road and rail routes south lie nearby; the South Island Main Trunk railway runs through the suburb, and a bypass skirts its main retail area, connecting Dunedin's one-way street system with the Dunedin Southern Motorway. The suburb is linked by several bus routes to its neighbouring suburbs and central Dunedin.
The Korean alphabet, known as Hangul in South Korea and Chosŏn'gŭl (조선글) in North Korea, is the modern official writing system for the Korean language. The letters for the five basic consonants reflect the shape of the speech organs used to pronounce them, and they are systematically modified to indicate phonetic features; similarly, the vowel letters are systematically modified for related sounds, making Hangul a featural writing system. It has been described as a syllabic alphabet as it combines the features of alphabetic and syllabic writing systems, although it is not necessarily an abugida.
In computer science theory – particularly formal language theory – Glushkov's construction algorithm, invented by Victor Mikhailovich Glushkov, transforms a given regular expression into an equivalent nondeterministic finite automaton (NFA). Thus, it forms a bridge between regular expressions and nondeterministic finite automata: two abstract representations of the same class of formal languages.
Cologne phonetics is a phonetic algorithm which assigns to words a sequence of digits, the phonetic code. The aim of this procedure is that identical sounding words have the same code assigned to them. The algorithm can be used to perform a similarity search between words. For example, it is possible in a name list to find entries like "Meier" under different spellings such as "Maier", "Mayer", or "Mayr". The Cologne phonetics is related to the well known Soundex phonetic algorithm but is optimized to match the German language. The algorithm was published in 1969 by Hans Joachim Postel.