Vietnamese language and computers


The Vietnamese language is written in a Latin script with diacritics (tone marks and other accents), which requires several accommodations when typing on phones or computers. Vietnamese is usually typed with input software, which may come with the device or be installed from a third party, such as UniKey. Telex is the oldest input method devised to encode the Vietnamese language with its tones. Other input methods include VNI (based on the number keys) and VIQR. The VNI input method is not to be confused with the VNI code page.


Historically, Vietnamese was also written in chữ Nôm, which in recent times has been used mainly for ceremonial and traditional purposes and otherwise remains the domain of historians and philologists. There have been attempts to type chữ Hán and chữ Nôm with existing Vietnamese input methods, but they are not widespread. [1] [2] Vietnamese is sometimes typed without tone marks, which Vietnamese speakers can usually infer from context.

Fonts and character encodings

Vietnamese alphabet

It is common for two diacritics to be placed on a single Vietnamese vowel. Some fonts stack these diacritics, while others offset the tone mark.

Character encodings

There are as many as 46 character encodings for representing the Vietnamese alphabet. [3] Unicode has become the most popular encoding for many of the world's writing systems, due to its broad compatibility and software support. Vietnamese diacritics may be encoded either as combining characters or as precomposed characters; the precomposed letters are scattered throughout the Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Latin Extended Additional blocks. The Vietnamese đồng symbol (₫) is encoded in the Currency Symbols block.
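
To make the precomposed/combining distinction concrete, here is a minimal Python sketch using only the standard-library `unicodedata` module. Normalization form NFC yields precomposed letters where they exist, while NFD yields a base letter followed by combining marks. The two forms are canonically equivalent but differ code point by code point, so comparing Vietnamese strings generally requires normalizing them first. The helper `code_points` is illustrative, not part of any library.

```python
import unicodedata

word = "Vi\u1ec7t"  # "Việt" with a precomposed ệ (U+1EC7)
nfc = unicodedata.normalize("NFC", word)  # precomposed form
nfd = unicodedata.normalize("NFD", word)  # base letters + combining marks

def code_points(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(code_points(nfc))  # U+0056 U+0069 U+1EC7 U+0074
print(code_points(nfd))  # U+0056 U+0069 U+0065 U+0323 U+0302 U+0074
print(nfc == nfd, unicodedata.normalize("NFC", nfd) == nfc)  # False True
```

Note that NFD places the dot below (U+0323) before the circumflex (U+0302): canonical ordering sorts combining marks by their combining class, not by the order in which a typist might add them.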

Unicode's coverage of Vietnamese has been subject to several changes since the 1990s. Early versions of Unicode encoded dấu huyền and dấu sắc as U+0340 COMBINING GRAVE TONE MARK and U+0341 COMBINING ACUTE TONE MARK, respectively. In 2001, these two characters were deprecated as duplicate encodings of U+0300 COMBINING GRAVE ACCENT and U+0301 COMBINING ACUTE ACCENT; [4] this change was incorporated into Unicode 3.2, released in 2002. [5] With the 2009 release of Unicode 5.2, U+0340 and U+0341 were undeprecated but discouraged. [6] [7]

Historically, the Vietnamese language used other characters beyond the modern alphabet. The Middle Vietnamese letter B with flourish (ꞗ) is included in the Latin Extended-D block. The apex is not separately encoded in Unicode, because it derives from the Portuguese tilde, whereas dấu ngã, which derives from the Greek perispomeni, has always been misencoded as a tilde. As a workaround, U+1DC4 COMBINING MACRON-ACUTE represents the apex on Wikisource and Wiktionary.
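
Because U+0340 and U+0341 are canonical duplicates, any normalization step folds them into the ordinary accents. A short check with Python's `unicodedata` module illustrates this; it is a sketch that only inspects the Unicode Character Database bundled with the interpreter.

```python
import unicodedata

# Each tone-mark variant has a singleton canonical decomposition
# to the ordinary combining accent.
print(unicodedata.decomposition("\u0340"))  # 0300
print(unicodedata.decomposition("\u0341"))  # 0301

# As a result, normalized text never retains the tone-mark variants.
legacy = "a\u0340"  # 'a' + COMBINING GRAVE TONE MARK
print(unicodedata.normalize("NFC", legacy) == "\u00e0")   # True (à)
print(unicodedata.normalize("NFD", "\u0341") == "\u0301")  # True
```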

For systems that lack support for Unicode, dozens of 8-bit Vietnamese code pages have been designed. [3] The most commonly used of them were VISCII, VSCII (TCVN 5712:1993), VNI, VPS and Windows-1258. [8] [9] Where ASCII is required, such as when ensuring readability in plain-text e-mail, Vietnamese letters are often encoded according to Vietnamese Quoted-Readable (VIQR) or VSCII Mnemonic (VSCII-MNEM), [10] though usage of either variable-width scheme has declined dramatically following the adoption of Unicode on the World Wide Web. For instance, support for all of the 8-bit encodings mentioned above except Windows-1258 was dropped from Mozilla software in 2014. [11]
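
The following is a simplified VIQR-to-Unicode decoder, offered as a sketch under stated assumptions: it uses the VIQR mark characters as documented in RFC 1456, attaches a mark only when it follows a vowel (or another mark), treats `dd`/`DD` as đ/Đ, and ignores VIQR's backslash escape for literal punctuation. The function name `viqr_to_unicode` is hypothetical.

```python
import unicodedata

# VIQR mark characters and the combining marks they stand for.
VIQR_MARKS = {
    "(": "\u0306",  # breve: a( -> ă
    "^": "\u0302",  # circumflex: e^ -> ê
    "+": "\u031b",  # horn: o+ -> ơ
    "'": "\u0301",  # acute (dấu sắc)
    "`": "\u0300",  # grave (dấu huyền)
    "?": "\u0309",  # hook above (dấu hỏi)
    "~": "\u0303",  # tilde (dấu ngã)
    ".": "\u0323",  # dot below (dấu nặng)
}
VOWELS = set("aeiouyAEIOUY")

def viqr_to_unicode(text):
    """Decode simplified VIQR: marks attach only after a vowel or another mark."""
    out = []
    attach = False  # may the next mark character attach to the previous letter?
    i = 0
    while i < len(text):
        ch = text[i]
        if attach and ch in VIQR_MARKS:
            out.append(VIQR_MARKS[ch])
        elif text[i:i + 2].lower() == "dd":
            out.append("\u0110" if ch == "D" else "\u0111")  # Đ / đ
            attach = False
            i += 1  # consume the second d as well
        else:
            out.append(ch)
            attach = ch in VOWELS
        i += 1
    # NFC merges each base letter with its combining marks.
    return unicodedata.normalize("NFC", "".join(out))

print(viqr_to_unicode("Vie^.t Nam"))  # Việt Nam
print(viqr_to_unicode("ddu+o+`ng"))   # đường
```

Because the period doubles as the dot-below mark, a sentence ending in a vowel is ambiguous under this simplified rule; full VIQR resolves such cases with an escape character, which the sketch omits.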

Many Vietnamese fonts intended for desktop publishing are encoded in VNI or TCVN3 (VSCII). [9] Such fonts are known as "ABC fonts". [12] Popular web browsers lack support for specialty Vietnamese encodings, so any webpage that uses these fonts appears as unintelligible mojibake on systems without them installed.

At right, an í that retains its tittle

Vietnamese often stacks diacritics, so typeface designers must take care to prevent stacked diacritics from colliding with adjacent letters or lines. When a tone mark is used together with another diacritic, offsetting the tone mark to the right preserves consistency and avoids slowing down saccades. [13] In advertising signage and in cursive handwriting, diacritics often take forms unfamiliar in other Latin alphabets. For example, the lowercase letter i retains its tittle in ì, ỉ, ĩ, and í. [14] These nuances are rarely accounted for in computing environments.

Approaches to character encoding

Vietnamese writing requires 134 additional letters (between both cases) besides the 52 already present in ASCII. [15] This exceeds the 128 additional characters available in a conventional extended ASCII encoding. Although this can be solved by using a variable-width encoding (as is done by UTF-8), a number of approaches have been used by other encodings to support Vietnamese without doing so:

  • Replace at least six ASCII characters, selected for being uncommon in Vietnamese and/or for being non-invariant in ISO 646 or DEC NRCS [15] (as in VNI for DOS).
  • Drop the uppercase letters which are least frequently used, [15] or all uppercase letters with tone marks (as in VSCII-3 (TCVN3)). These letters may still be supplied by means of all-capital fonts. [16]
  • Drop forms of the letter Y with tone marks, necessitating use of the letter I in those circumstances. This approach was rejected by the designers of VISCII on the basis that a character encoding should not attempt to settle a spelling reform issue. [15]
  • Replace at least six C0 control characters [15] (as in VISCII, VSCII-1 (TCVN1) and VPS).
  • Use combining characters, allowing one vowel with accents to be fully represented using a sequence of characters (as in VNI, VSCII-2 (TCVN2), Windows-1258 and ANSEL).
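
The combining-character approach of Windows-1258 can be sketched in Python. The code page stores letters such as ê, ơ and ư precomposed but keeps the five tone marks as separate combining bytes. Python's built-in `cp1258` codec maps code points one-to-one, so it rejects a precomposed letter such as ệ that the code page lacks; the hypothetical helper `to_cp1258` below therefore splits off tone marks before encoding. This is an illustrative sketch, not a reference converter.

```python
import unicodedata

# Tone marks that Windows-1258 encodes as separate combining bytes.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

def to_cp1258(text):
    """Encode Vietnamese text as Windows-1258 by splitting off tone marks."""
    pieces = []
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        tones = "".join(c for c in decomposed if c in TONE_MARKS)
        rest = "".join(c for c in decomposed if c not in TONE_MARKS)
        # Recompose the base letter (e.g. e + U+0302 -> ê), then append the tone.
        pieces.append(unicodedata.normalize("NFC", rest) + tones)
    return "".join(pieces).encode("cp1258")

def from_cp1258(data):
    # Decoding yields base letters followed by combining tone marks; NFC merges them.
    return unicodedata.normalize("NFC", data.decode("cp1258"))

print(to_cp1258("Vi\u1ec7t"))                          # b'Vi\xea\xf2t'
print(from_cp1258(to_cp1258("Ti\u1ebfng Vi\u1ec7t")))  # Tiếng Việt
```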

Unicode code points

The following table provides Unicode code points for all non-ASCII Vietnamese letters.

Unmarked | Grave ◌̀ (U+0300) | Hook ◌̉ (U+0309) | Tilde ◌̃ (U+0303) | Acute ◌́ (U+0301) | Dot ◌̣ (U+0323)

Uppercase letters
A | À (U+00C0) | Ả (U+1EA2) | Ã (U+00C3) | Á (U+00C1) | Ạ (U+1EA0)
Ă (U+0102) | Ằ (U+1EB0) | Ẳ (U+1EB2) | Ẵ (U+1EB4) | Ắ (U+1EAE) | Ặ (U+1EB6)
Â (U+00C2) | Ầ (U+1EA6) | Ẩ (U+1EA8) | Ẫ (U+1EAA) | Ấ (U+1EA4) | Ậ (U+1EAC)
Đ (U+0110) |  |  |  |  | 
E | È (U+00C8) | Ẻ (U+1EBA) | Ẽ (U+1EBC) | É (U+00C9) | Ẹ (U+1EB8)
Ê (U+00CA) | Ề (U+1EC0) | Ể (U+1EC2) | Ễ (U+1EC4) | Ế (U+1EBE) | Ệ (U+1EC6)
I | Ì (U+00CC) | Ỉ (U+1EC8) | Ĩ (U+0128) | Í (U+00CD) | Ị (U+1ECA)
O | Ò (U+00D2) | Ỏ (U+1ECE) | Õ (U+00D5) | Ó (U+00D3) | Ọ (U+1ECC)
Ô (U+00D4) | Ồ (U+1ED2) | Ổ (U+1ED4) | Ỗ (U+1ED6) | Ố (U+1ED0) | Ộ (U+1ED8)
Ơ (U+01A0) | Ờ (U+1EDC) | Ở (U+1EDE) | Ỡ (U+1EE0) | Ớ (U+1EDA) | Ợ (U+1EE2)
U | Ù (U+00D9) | Ủ (U+1EE6) | Ũ (U+0168) | Ú (U+00DA) | Ụ (U+1EE4)
Ư (U+01AF) | Ừ (U+1EEA) | Ử (U+1EEC) | Ữ (U+1EEE) | Ứ (U+1EE8) | Ự (U+1EF0)
Y | Ỳ (U+1EF2) | Ỷ (U+1EF6) | Ỹ (U+1EF8) | Ý (U+00DD) | Ỵ (U+1EF4)

Lowercase letters
a | à (U+00E0) | ả (U+1EA3) | ã (U+00E3) | á (U+00E1) | ạ (U+1EA1)
ă (U+0103) | ằ (U+1EB1) | ẳ (U+1EB3) | ẵ (U+1EB5) | ắ (U+1EAF) | ặ (U+1EB7)
â (U+00E2) | ầ (U+1EA7) | ẩ (U+1EA9) | ẫ (U+1EAB) | ấ (U+1EA5) | ậ (U+1EAD)
đ (U+0111) |  |  |  |  | 
e | è (U+00E8) | ẻ (U+1EBB) | ẽ (U+1EBD) | é (U+00E9) | ẹ (U+1EB9)
ê (U+00EA) | ề (U+1EC1) | ể (U+1EC3) | ễ (U+1EC5) | ế (U+1EBF) | ệ (U+1EC7)
i | ì (U+00EC) | ỉ (U+1EC9) | ĩ (U+0129) | í (U+00ED) | ị (U+1ECB)
o | ò (U+00F2) | ỏ (U+1ECF) | õ (U+00F5) | ó (U+00F3) | ọ (U+1ECD)
ô (U+00F4) | ồ (U+1ED3) | ổ (U+1ED5) | ỗ (U+1ED7) | ố (U+1ED1) | ộ (U+1ED9)
ơ (U+01A1) | ờ (U+1EDD) | ở (U+1EDF) | ỡ (U+1EE1) | ớ (U+1EDB) | ợ (U+1EE3)
u | ù (U+00F9) | ủ (U+1EE7) | ũ (U+0169) | ú (U+00FA) | ụ (U+1EE5)
ư (U+01B0) | ừ (U+1EEB) | ử (U+1EED) | ữ (U+1EEF) | ứ (U+1EE9) | ự (U+1EF1)
y | ỳ (U+1EF3) | ỷ (U+1EF7) | ỹ (U+1EF9) | ý (U+00FD) | ỵ (U+1EF5)
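
As a cross-check of the table above, and of the count of 134 non-ASCII letters given earlier, this sketch composes each base vowel with each tone mark under NFC, adds đ, derives the uppercase forms, and drops anything already in ASCII. The names `BASES`, `TONES` and `vietnamese_letters` are illustrative.

```python
import unicodedata

# Lowercase base vowels and the tone marks, in the table's column order.
BASES = "a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"  # a ă â e ê i o ô ơ u ư y
TONES = ["", "\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]

def vietnamese_letters():
    """Return every non-ASCII Vietnamese letter, lowercase then uppercase."""
    lower = [unicodedata.normalize("NFC", b + t) for b in BASES for t in TONES]
    lower.append("\u0111")  # đ
    letters = lower + [c.upper() for c in lower]
    # Keep only the letters that ASCII does not already provide.
    return [c for c in letters if not c.isascii()]

letters = vietnamese_letters()
print(len(letters))                       # 134
print(all(len(c) == 1 for c in letters))  # True: each composes to one code point
```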

Font substitution

Many fonts support a subset of the Latin writing system that omits much of the Vietnamese alphabet. Due to the high density of Vietnamese-specific characters in Vietnamese text, Web browsers that implement font substitution reliably produce a ransom note effect when the webpage specifies an inadequate font.
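
Why the ransom-note effect appears so readily can be estimated with a rough model: assume the browser falls back to another font for each letter whose code point the specified font lacks. The sketch below takes the font's coverage as a set of code points; the helper `needs_fallback` and the Latin-1-only font are hypothetical.

```python
import unicodedata

def needs_fallback(text, supported):
    """Return (letters missing from the font, fraction of letters missing)."""
    letters = [c for c in unicodedata.normalize("NFC", text) if c.isalpha()]
    if not letters:
        return [], 0.0
    missing = [c for c in letters if ord(c) not in supported]
    return missing, len(missing) / len(letters)

# Hypothetical font that covers only U+0000 to U+00FF (Basic Latin + Latin-1 Supplement).
LATIN_1_FONT = set(range(0x100))

missing, ratio = needs_fallback("Ti\u1ebfng Vi\u1ec7t", LATIN_1_FONT)
print(missing, round(ratio, 2))  # ['ế', 'ệ'] 0.22
```

For a real font file, the `getBestCmap()` method of fontTools' `TTFont` returns a mapping keyed by code point whose keys could serve as `supported`; actual browsers choose fallback fonts per glyph cluster, which this model simplifies.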

Chữ Nôm

The nôm character for phở

Unicode includes over 10,000 Nôm characters as part of Unicode's repertoire of CJK Unified Ideographs. Of these characters, 10,082 can be found in the CJK Unified Ideographs Extension B block, while the rest are distributed between the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension C blocks. A further 1,028 characters, including over 400 characters specific to the Tày language, are encoded in the CJK Unified Ideographs Extension E block. The characters are taken from the Vietnamese standards TCVN 5773:1993 and TCVN 6909:2001 [error for TCVN 6056:1995?], as well as from research by the Han-Nom Research Institute and other groups. [18] All the characters in TCVN 5773:1993 and about 95% of the characters in TCVN 6909:2001 [error for TCVN 6056:1995?] have corresponding codepoints in Unicode 5.1, though TCVN 5773:1993 itself mapped most of its characters to the Private Use Area of Unicode. [19] Unicode 13.0 added two diacritical characters to the Ideographic Symbols and Punctuation block that were commonly used to indicate borrowed characters in chữ Nôm. [20] [21]
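
Because most of these characters (the 10,082 in Extension B, plus those in Extensions C and E) lie outside the Basic Multilingual Plane, software handling chữ Nôm must cope with four-byte UTF-8 sequences and UTF-16 surrogate pairs. The sketch below classifies a code point by block and reports its encoded sizes; the block ranges are assumed from the Unicode block list and cover only the blocks named above.

```python
# (first, last, name) for the blocks named above; not an exhaustive list.
CJK_BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x16FE0, 0x16FFF, "Ideographic Symbols and Punctuation"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
]

def cjk_block(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None

def encoded_sizes(ch):
    """Bytes needed in UTF-8 and UTF-16 (supplementary characters use surrogates)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

ext_b = "\U00020000"  # first code point of Extension B
print(cjk_block(ext_b), encoded_sizes(ext_b))        # CJK Unified Ideographs Extension B (4, 4)
print(cjk_block("\u4e00"), encoded_sizes("\u4e00"))  # CJK Unified Ideographs (3, 2)
```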

The two most comprehensive Nôm fonts are the Vietnamese Nôm Preservation Foundation's Nôm Na Tống Light [22] and the community-developed HAN NOM A/HAN NOM B, [23] both of which place a large number of unstandardized characters in the Private Use Areas.

The Unicode Consortium's Unihan database includes Vietnamese readings of some characters but does not distinguish between Sino-Vietnamese and Nôm readings.

Like other CJKV writing systems, chữ Nôm is traditionally written vertically, from top to bottom and right to left.

Chữ Hán and chữ Nôm text may also be annotated with ruby characters; for Vietnamese, the annotations are written in chữ Quốc ngữ. [24]

Text input

Typewriter Olympia Splendid 33, AĐERTY layout (based on AZERTY), used in Vietnam in the 1960s, seen at Museum of Ho Chi Minh City

A purely physical Vietnamese keyboard would be impractical, due to the sheer number of combinations of letters with one or two diacritics in the alphabet (e.g. ị, ờ). Instead, Vietnamese input relies on formulaic software-based keyboard layouts, virtual keyboards, or input methods (also known as IMEs).

Keyboard layouts

Microsoft Windows includes a Vietnamese keyboard layout based on TCVN 6064:1995.
AZERTY-based Vietnamese typewriter keyboard layout

Vietnamese keyboard layouts rely on dead keys to compose letters with diacritics. Most desktop operating systems include a Vietnamese keyboard layout similar to TCVN 6064:1995, a Vietnamese national standard. Previously, typewriters used an AZERTY-based Vietnamese layout (AĐERTY). [25]

Input methods

xvnkb, an IME compatible with the X Input Method framework on Unix systems, supports output in six character encodings.

The three most common Vietnamese input methods are Telex, VNI, and VIQR. Telex indicates diacritics using letters that are unlikely to appear at the end of a word, while VNI repurposes the number keys or function keys and VIQR repurposes various punctuation marks. The Telex and VIQR conventions originated in an earlier era of telex machines and typewriters, respectively.

Support for these input methods is provided by input method editors (IMEs), which are known in Vietnamese as bộ gõ (literally "typing set"; also glossed as "peckers" or, more generally, "percussion"). IMEs may be provided by the operating system, installed as a third-party application, installed as a browser extension, or provided by an individual website in the form of a script. Common third-party applications include GoTiengViet, UniKey, VietKey, VPSKeys, WinVNKey, and xvnkb. On Unix-like operating systems, the IBus and SCIM frameworks both support Vietnamese. IME scripts such as AVIM, Mudim, and VietTyping can be found on most Vietnamese message boards, the Vietnamese Wikipedia, and other text-intensive websites. The Vietnamese web browser Cốc Cốc has a built-in input method.

Input methods allow words to be composed in a more flexible order than keyboard layouts do. For example, to enter the word "viết" using the TCVN 6064:1995 keyboard layout, one must type VI38T, in that order. By contrast, most IMEs permit the user to insert diacritics at the end of the word: VIEETS in Telex, VIET61 in VNI, or VIET^' in VIQR. Some IMEs even allow diacritics to be entered before their base letters. Depending on an IME's implementation, it may also be possible to edit an existing word's diacritics without retyping the word.
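
End-of-word diacritics of the kind shown for Telex can be sketched as follows. This is a simplified illustration, not the algorithm of any particular IME: it handles one lowercase word, takes a trailing s/f/r/x/j as the tone key, converts the two-key sequences aa, aw, ee, oo, ow, uw and dd, and places the tone with a deliberately simplified rule. The names `telex` and `tone_position` are hypothetical.

```python
import unicodedata

# Telex tone keys and the combining marks they add (typed at the end of a word).
TONE_KEYS = {"s": "\u0301", "f": "\u0300", "r": "\u0309", "x": "\u0303", "j": "\u0323"}
# Two-key sequences: aa=â, aw=ă, ee=ê, oo=ô, ow=ơ, uw=ư, dd=đ.
LETTER_KEYS = {"aa": "\u00e2", "aw": "\u0103", "ee": "\u00ea", "oo": "\u00f4",
               "ow": "\u01a1", "uw": "\u01b0", "dd": "\u0111"}
VOWELS = "a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"  # a ă â e ê i o ô ơ u ư y
MODIFIED = "\u0103\u00e2\u00ea\u00f4\u01a1\u01b0"       # ă â ê ô ơ ư

def tone_position(word):
    """Pick the vowel that carries the tone (a simplified rule, not full orthography)."""
    vowels = [
        i for i, c in enumerate(word)
        if c in VOWELS
        # The u of "qu" and the i of "gi" + vowel act as part of the consonant.
        and not (c == "u" and i > 0 and word[i - 1] == "q")
        and not (c == "i" and i > 0 and word[i - 1] == "g"
                 and i + 1 < len(word) and word[i + 1] in VOWELS)
    ]
    if not vowels:
        return None
    marked = [i for i in vowels if word[i] in MODIFIED]
    if marked:
        return marked[-1]  # e.g. ươ: the tone goes on ơ
    if len(vowels) >= 2 and vowels[-1] == len(word) - 1:
        return vowels[-2]  # open syllable such as "mua" -> mùa
    return vowels[-1]      # closed syllable such as "toan" -> toán

def telex(keys):
    """Convert one lowercase Telex-typed word, e.g. 'vieets' -> 'viết'."""
    tone = ""
    if keys and keys[-1] in TONE_KEYS:
        keys, tone = keys[:-1], TONE_KEYS[keys[-1]]
    letters, i = [], 0
    while i < len(keys):
        pair = keys[i:i + 2]
        if pair in LETTER_KEYS:
            letters.append(LETTER_KEYS[pair])
            i += 2
        else:
            letters.append(keys[i])
            i += 1
    word = "".join(letters)
    pos = tone_position(word) if tone else None
    if pos is not None:
        word = word[:pos + 1] + tone + word[pos + 1:]
    return unicodedata.normalize("NFC", word)

print(telex("vieets"), telex("dduwowngf"), telex("quas"))  # viết đường quá
```

Deferring the tone to the end works because, as noted above, Telex tone keys are letters that do not end Vietnamese words; the IME can therefore hold the keystroke and place the mark once the whole syllable is known. The sketch omits uppercase input, the standalone w key, and the z key for removing a tone.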

Some virtual keyboards supplement the standard dead keys with dedicated shortcut keys. For example, with the VIQR keyboard built into iOS, it is possible to add a horn to "U" either by tapping 123, #+=, and + in sequence, or by tapping the dedicated ◌̛ key, which has no analogue on a physical keyboard.

When Vietnamese input methods are unavailable, Vietnamese text is commonly printed without diacritical marks and then handwritten on.

Borrowing a feature common among Chinese input methods, some Vietnamese IMEs allow the user to skip diacritics altogether: after typing the base letters, the user selects the accented word from a candidate list. To provide this autocomplete list, the IME may need to communicate with a Web service. Some IMEs also use candidate lists to convert text from the Vietnamese alphabet to chữ Nôm, because there is no one-to-one correspondence between alphabetic words and Nôm characters.
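
A diacritic-free candidate list can be sketched by indexing a lexicon under each word's unaccented spelling. The lexicon below is a hypothetical miniature; a real IME would use a large dictionary (possibly served remotely, as described above) and rank candidates by frequency or context. The names `strip_diacritics` and `build_index` are illustrative.

```python
import unicodedata

def strip_diacritics(word):
    """Drop every combining mark and map đ/Đ to d/D (e.g. 'Đường' -> 'Duong')."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # đ/Đ have no decomposition, so they are mapped explicitly.
    return base.replace("\u0111", "d").replace("\u0110", "D")

def build_index(lexicon):
    """Group lexicon words by their unaccented, lowercased spelling."""
    index = {}
    for word in lexicon:
        index.setdefault(strip_diacritics(word).lower(), []).append(word)
    return index

# Hypothetical miniature lexicon: ma, má, mà, mả, mã, mạ, mẹ.
LEXICON = ["ma", "m\u00e1", "m\u00e0", "m\u1ea3", "m\u00e3", "m\u1ea1", "m\u1eb9"]
INDEX = build_index(LEXICON)

print(INDEX.get("ma", []))  # ['ma', 'má', 'mà', 'mả', 'mã', 'mạ']
print(INDEX.get("me", []))  # ['mẹ']
```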

Other considerations

Typical Vietnamese text contains a high proportion of compound words. Compound words are never hyphenated in contemporary usage, so spell checkers are limited to checking individual syllables unless a statistical language model is consulted.

Vietnamese has rigid spelling rules with few exceptions, so text-to-speech engines may avoid dictionary lookups except when encountering foreign loanwords. TTS engines must account for tones, which are essential to the meaning of any Vietnamese word; for example, má (mother) is a different word from mà (but).
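
Because each tone is written with exactly one of five marks (or none), a TTS front end can read a syllable's tone directly from its decomposed form while ignoring vowel-quality marks such as the circumflex. A minimal sketch, with the hypothetical helper `tone_of`:

```python
import unicodedata

# The five tone marks and the names of the tones they write; no mark = ngang.
TONE_NAMES = {
    "\u0300": "huyền",  # grave
    "\u0301": "sắc",    # acute
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}

def tone_of(syllable):
    """Return the tone name of one syllable, ignoring vowel-quality marks."""
    for c in unicodedata.normalize("NFD", syllable):
        if c in TONE_NAMES:
            return TONE_NAMES[c]
    return "ngang"

for s in ["m\u00e1", "m\u00e0", "ma", "vi\u1ec7t"]:
    print(s, tone_of(s))  # má sắc, mà huyền, ma ngang, việt nặng
```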

Internationalized user interfaces are generally unable to use the full complement of Vietnamese pronouns that would be expected in a traditional social setting, even when much is known about the user. Instead, user interfaces typically use generic pronouns such as tôi and bạn, some of which make potentially incorrect assumptions about the user's age and relationship to other users. For example, when a social media platform notifies a user about a younger user, it may refer to the latter in the third person as anh ấy instead of em ấy, leading the user to misinterpret the notification as a reference to someone else. [26]


References

  1. "How to type Hán Nôm characters?". winvnkey.sourceforge.net. Retrieved 2022-12-08.
  2. "Chu Nom Resources". chunom.org. Retrieved 2022-12-08.
  3. Ngô Đình Học; Trần Tư Bình (July 21, 2014). "Express Manual for WinVNKey". WinVNKey. Retrieved October 5, 2014.
  4. ISO/IEC JTC1/SC2/WG2 (October 10, 2001). Unicode Consortium Liaison Report (Report). International Organization for Standardization. L2/01-378. Retrieved July 5, 2024.
  5. Whistler, Ken (August 1, 2001). Analysis of Character Deprecation in the Unicode Standard (Report). Unicode Technical Committee. L2/01-301. Retrieved July 5, 2024.
  6. "Combining Diacritical Marks". Unicode 7.0 Character Code Charts. Unicode Consortium. June 16, 2014. Retrieved October 5, 2014.
  7. Buff, Charlotte (September 16, 2018). Deprecation Inconsistencies in Code Chart Annotations (PDF) (Report). Unicode Technical Committee. L2/18-301. Retrieved July 5, 2024.
  8. Ngo, Hoc Dinh; Tran, TuBinh. "5. Why Having Vietnamese Charset (Character Set – Encoding) Conversion?". Some special functions of WinVNKey.
  9. "Chọn Font chữ, bảng mã để gõ tiếng Việt" [Choosing fonts and encodings for typing Vietnamese]. Bộ gõ tiếng Việt.Com (in Vietnamese). MangVN. 2009. Archived from the original on November 20, 2010.
  10. Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly Media. pp. 47–49. ISBN 978-0-596-51447-1 via Google Books.
  11. Sivonen, Henri (2014-09-26). "Character encoding changes in m-c require c-c action". mozilla.dev.apps.thunderbird.
  12. Hoàng Tô; Nguyễn Quan Sơn; Nguyễn Sơn Tùng; Phan Quang Minh; Phạm Thúc Trương Lương; Nguyễn Quang Hiệp; Bùi Văn Kiên; Nguyễn Ích Vinh (20 July 2014). Sử ký Tinh Vân: 20 năm sẻ chia và sáng tạo [History of Tinhvan: 20 years of sharing and creating] (in Vietnamese). Vol. 1. Tinhvan Group. p. 37 via Google Books.
  13. Trương, Donny. "Design Challenges". Vietnamese Typography. Retrieved April 10, 2018.
  14. See, for example: "Viết Thư". Vietnamese reading selections (in Vietnamese) (2 ed.). Army Language School. 1956. pp. 98–100.
  15. "2. Review Of Current Conventions". Vietnamese Character Encoding Standardization Report - VISCII And VIQR 1.1 Character Encoding Specifications (Technical report). Viet-Std Group. 1992. p. 10.
  16. "Unicode & Vietnamese Legacy Character Encodings". Vietnamese Unicode FAQs. TCVN3 is not double-byte, but due to the nature of its encoding, capital letters (vowels) are mapped to a separate, capital font that is similar to the normal, lowercase one.
  17. Trần Văn Kiệm (2004). "phở". Giúp đọc Nôm và Hán Việt (in Vietnamese) (4th ed.).
  18. Nguyễn Quang Hồng. "Giới thiệu Kho chữ Hán Nôm mã hoá" [Hán Nôm Coded Character Repertoire Introduction] (in Vietnamese). Vietnamese Nôm Preservation Foundation.
  19. Lunde 2009, pp. 152–153.
  20. Collins, Lee; Ngô Thanh Nhàn (6 November 2017). "Proposal to Encode Two Vietnamese Alternate Reading Marks" (PDF).
  21. "Proposed New Characters: The Pipeline". Unicode Consortium. 8 May 2019. Retrieved 26 May 2019.
  22. "Nôm Font". Vietnamese Nôm Preservation Foundation. Retrieved October 5, 2014.
  23. Đỗ Quốc Bảo; Tô Minh Tâm; Thiền Viện Viên Chiếu (December 8, 2005). "UNICODE Han Nom Font Set". Retrieved October 5, 2014.
  24. Lunde 2009, p. 529.
  25. Duncan, John William (2005-12-22), VietNamese Typewriter, retrieved 2020-07-11
  26. Jacob, Raquel (February 2, 2022). "Language Guidelines – Vietnamese". Unbabel. Retrieved July 18, 2022.

Further reading