Macron below

U+0331 ◌̱ COMBINING MACRON BELOW
See also: Macron (diacritic)
Examples: A̱ a̱ Ḇ ḇ C̱ c̱

Macron below is a combining diacritical mark that is used in various orthographies. [1]

A non-combining (spacing) form is U+02CD ˍ MODIFIER LETTER LOW MACRON. The macron below should not be confused with U+0320 ◌̠ COMBINING MINUS SIGN BELOW, U+0332 ◌̲ COMBINING LOW LINE or U+005F _ LOW LINE. The difference between "macron below" and "low line" is that a run of low lines forms an unbroken underline, while each macron below stays a separate bar under its own letter: compare a̱ḇc̱ with a̲b̲c̲ (only the latter should render as one continuous line under abc). [2]
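The look-alikes above are separate code points even when a font draws them similarly. The following is a minimal Python sketch using only the standard-library `unicodedata` module to list each one's name, general category and canonical combining class; the `LOOKALIKES` list and `describe` helper are illustrative names chosen here, not part of any standard API.

```python
import unicodedata

# The combining macron below and the characters it is commonly confused with.
LOOKALIKES = ["\u0331", "\u02cd", "\u0320", "\u0332", "\u005f"]

def describe(ch: str) -> tuple[str, str, str, int]:
    """Return (code point, name, general category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

for ch in LOOKALIKES:
    print(*describe(ch), sep="\t")

# Same base letters, two different marks: the strings differ at the code point
# level, whatever a given renderer makes of them.
macron_below = "a\u0331b\u0331c\u0331"  # each letter gets its own short bar
low_line = "a\u0332b\u0332c\u0332"      # bars are meant to join into one line
print(macron_below == low_line)  # False
```

The three combining marks share combining class 220 (attached below the base), while U+02CD is a modifier letter and U+005F is connector punctuation, both spacing characters with class 0.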


Unicode

Macron below character

Unicode defines several characters for the macron below:

macron below | combining character | Unicode | HTML | spacing character | Unicode | HTML
single | ◌̱ | U+0331 | &#x331; | ˍ (modifier letter) | U+02CD | &#x2CD;
double | ◌͟◌ | U+035F | &#x35F; | (none) | |
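The Unicode and HTML columns of the table above can be derived from the characters themselves. A small Python sketch, assuming hexadecimal numeric character references (`&#x…;`) are the desired HTML form; `table_row` is a hypothetical helper, not a library function:

```python
import unicodedata

def table_row(ch: str) -> str:
    """Format one character as glyph, U+XXXX code point, hexadecimal HTML
    character reference and Unicode name, separated by tabs."""
    cp = ord(ch)
    ccc = unicodedata.combining(ch)
    if ccc in (233, 234):
        # Double diacritics (class 233 below, 234 above) span two bases,
        # so show them between two dotted circles (U+25CC).
        glyph = "\u25cc" + ch + "\u25cc"
    elif ccc:
        # Any other mark with a nonzero combining class gets one dotted circle.
        glyph = "\u25cc" + ch
    else:
        glyph = ch
    return f"{glyph}\tU+{cp:04X}\t&#x{cp:X};\t{unicodedata.name(ch)}"

for ch in ["\u0331", "\u02cd", "\u035f"]:
    print(table_row(ch))
```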

Many similar marks are covered elsewhere.

Precomposed characters

Various precomposed letters with a macron below are defined in Unicode:

Upper case | Unicode | HTML | Lower case | Unicode | HTML | Notes
Ḇ | U+1E06 | &#x1E06; | ḇ | U+1E07 | &#x1E07; | Used in the transliteration of Biblical Hebrew into the Roman alphabet to show the fricative value of the letter beth (ב), [v] or perhaps [β].
Ḏ | U+1E0E | &#x1E0E; | ḏ | U+1E0F | &#x1E0F; | Used in the transliteration of Biblical Hebrew, Syriac and Arabic into the Roman alphabet to show the fricative value of the letter dalet (ד), [ð]. Sometimes used in the romanization of Pashto to represent retroflex D. In transcriptions of Dravidian languages it represents an alveolar /d/.
(none) | | | ẖ | U+1E96 | &#x1E96; | Sometimes used for Arabic خ ẖāʼ, Hebrew heth (ח) and Egyptian 𓄡. There is no precomposed upper-case equivalent of ẖ, so the upper case uses a combining macron below instead: H̱ (U+0048 U+0331).
Ḵ | U+1E34 | &#x1E34; | ḵ | U+1E35 | &#x1E35; | Used in the transliteration of Biblical Hebrew into the Roman alphabet to show the fricative value of the letter kaph (כ), [x]. Used in Tlingit and Haida (among other Pacific Northwest languages) for the voiceless uvular stop [q]; close to Korean ㄲ kk, closest English equivalent "shocking". Used optionally in the K-dialect of Māori in the South Island of New Zealand, where an original ng has merged with k; ḵ indicates that the word corresponds to ng in other dialects. There is no difference in pronunciation between ḵ and k.
Ḻ | U+1E3A | &#x1E3A; | ḻ | U+1E3B | &#x1E3B; | One possible transliteration of the Dravidian retroflex approximant /ɻ/, as in the Tamil letter ழ. Ḻ is used in the Seri language to represent [l], like English l, while unmodified "l" represents [ɬ], like Welsh ll. It is also used in the proposed Unified Alphabet for Mapudungun.
Ṉ | U+1E48 | &#x1E48; | ṉ | U+1E49 | &#x1E49; | Used in Pitjantjatjara to represent [ɳ], and in Saanich to represent both plain and glottalized [ɴ]. Sometimes used in the romanization of Pashto to represent retroflex N. In transcriptions of Dravidian languages it represents an alveolar /n/.
Ṟ | U+1E5E | &#x1E5E; | ṟ | U+1E5F | &#x1E5F; | Used in Pitjantjatjara to represent [ɻ], and sometimes in the romanization of Pashto to represent retroflex R. In transcriptions of Dravidian languages it represents an alveolar trill /r/.
Ṯ | U+1E6E | &#x1E6E; | ṯ | U+1E6F | &#x1E6F; | Used in the proposed Unified Alphabet for Mapudungun to represent [t̪]. Sometimes used in the romanization of Pashto to represent retroflex T. In transcriptions of Dravidian languages it represents an alveolar /t/. In the romanization of Arabic it transcribes the letter ṯāʾ (ث).
Ẕ | U+1E94 | &#x1E94; | ẕ | U+1E95 | &#x1E95; | Used in the 1953 Hebrew Academy romanization of Hebrew to represent tsade (צ).
(none) | | | ₫ | U+20AB | &#x20AB; | Vietnamese đồng currency sign.
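Because the precomposed letters listed above are canonically equivalent to a base letter plus U+0331, NFC normalization recombines such sequences into the single precomposed code point where one exists, and leaves the two-code-point sequence otherwise (as with H̱ or C̱). A minimal Python sketch using the standard-library `unicodedata.normalize`; `compose` and `MACRON_BELOW` are illustrative names only:

```python
import unicodedata

MACRON_BELOW = "\u0331"  # U+0331 COMBINING MACRON BELOW

def compose(base: str) -> str:
    """Append U+0331 to a base letter and return the NFC-normalized string."""
    return unicodedata.normalize("NFC", base + MACRON_BELOW)

for base in "BbHhCc":
    s = compose(base)
    kind = "precomposed" if len(s) == 1 else "base + combining mark"
    codes = " ".join(f"U+{ord(c):04X}" for c in s)
    print(f"{base} -> {s}  {codes}  ({kind})")
```

Text that should match regardless of which form was typed is usually compared after normalizing both sides to the same form.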

Note that the Unicode character names of precomposed characters whose decompositions contain U+0331 ◌̱ COMBINING MACRON BELOW use "WITH LINE BELOW" rather than "WITH MACRON BELOW". Thus, U+1E07 ḇ LATIN SMALL LETTER B WITH LINE BELOW decomposes to U+0062 b LATIN SMALL LETTER B followed by U+0331 ◌̱ COMBINING MACRON BELOW. [3]
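The naming mismatch can be checked against the Unicode Character Database bundled with Python. This sketch scans the Latin Extended Additional block (U+1E00 to U+1EFF) for characters whose canonical decomposition ends in U+0331 and records their names; `line_below_letters` is a hypothetical helper written for illustration, and the result depends on the Unicode version of the running Python.

```python
import unicodedata

def line_below_letters() -> list[tuple[str, str, str]]:
    """Return (char, name, decomposition) for every character in U+1E00..U+1EFF
    whose canonical decomposition ends with U+0331 COMBINING MACRON BELOW."""
    found = []
    for cp in range(0x1E00, 0x1F00):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)  # "" if there is none
        if decomp.split()[-1:] == ["0331"]:
            found.append((ch, unicodedata.name(ch), decomp))
    return found

for ch, name, decomp in line_below_letters():
    print(f"U+{ord(ch):04X}", ch, name, "->", decomp)
```

Every name reported contains "WITH LINE BELOW", and the characters found are the same seventeen letters listed in the table of precomposed characters.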

The Vietnamese đồng currency sign, U+20AB ₫ DONG SIGN, resembles a lower-case d with a stroke and a macron below, but it is neither a letter nor decomposable. [4]
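Those properties can be read directly from the character database. A short Python sketch with the standard-library `unicodedata` module; `DONG` and `lookalike` are illustrative variable names:

```python
import unicodedata

DONG = "\u20ab"  # U+20AB DONG SIGN

print(unicodedata.name(DONG))                 # DONG SIGN
print(unicodedata.category(DONG))             # Sc: currency symbol, not a letter
print(repr(unicodedata.decomposition(DONG)))  # '' (no decomposition)
# Neither canonical nor compatibility normalization splits it apart.
print(unicodedata.normalize("NFKD", DONG) == DONG)  # True

# A visually similar letter sequence, d with stroke + combining macron below,
# remains distinct from the currency sign under normalization.
lookalike = "\u0111\u0331"  # U+0111 đ + U+0331
print(unicodedata.normalize("NFKC", lookalike) == DONG)  # False
```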

References

  1. "Combining Diacritical Marks Code Chart, Range: 0300–036F" (PDF). The Unicode Standard. Retrieved 2016-11-21.
  2. "6.2 General Punctuation" (PDF). The Unicode Standard, Version 11.0.0. Mountain View, CA: The Unicode Consortium. 2018. p. 273. ISBN 978-1-936213-19-1. Retrieved 2018-12-12. "Spacing Overscores and Underscores. U+203E OVERLINE is the above-the-line counterpart to U+005F low line. It is a spacing character, not to be confused with U+0305 COMBINING OVERLINE. As with all overscores and underscores, a sequence of these characters should connect in an unbroken line. The overscoring characters also must be distinguished from U+0304 COMBINING MACRON, which does not connect horizontally in this way."
  3. "Latin Extended Additional Code Chart, Range: 1E00–1EFF" (PDF). The Unicode Standard. Retrieved 2016-11-21.
  4. "Unicode Character Database". The Unicode Standard. Retrieved 2016-11-21.