Vietnamese Quoted-Readable

Vietnamese Quoted-Readable (usually abbreviated VIQR), also known as Vietnet, is a convention for writing Vietnamese using only 7-bit ASCII characters, which made it possible to support Vietnamese in the computing and communication systems of the time. Because the Vietnamese alphabet contains a complex system of diacritical marks, VIQR requires the user to type a base letter, followed by one or two characters that represent the diacritical marks.

Syntax

VIQR uses the following convention: [1]

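Diacritical marks in VIQR

| Diacritical mark  | Typed character | Examples |
| trăng (breve)     | (               | a( → ă   |
| mũ (circumflex)   | ^               | a^ → â   |
| móc (horn)        | + [2]           | o+ → ơ   |
| huyền (grave)     | `               | a` → à   |
| sắc (acute)       | ' [3]           | a' → á   |
| hỏi (hook)        | ?               | a? → ả   |
| ngã (tilde)       | ~               | a~ → ã   |
| nặng (dot below)  | .               | a. → ạ   |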

VIQR uses DD or Dd for the Vietnamese letter Đ, and dd for the Vietnamese letter đ. To type certain punctuation marks (namely, the period, question mark, apostrophe, forward slash, opening parenthesis, or tilde) directly after most Vietnamese words, a backslash (\) must be typed directly before the punctuation mark, functioning as an escape character, so that it will not be interpreted as a diacritical mark. For example:

O^ng te^n gi`\? To^i te^n la` Tra^`n Va(n Hie^'u\.
Ông tên gì? Tôi tên là Trần Văn Hiếu.
What is your name [Sir]? My name is Trần Văn Hiếu.
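The convention above can be sketched as a small decoder. This is a minimal illustration rather than a reference implementation: the names `decode_viqr` and `_try_mark` are hypothetical, the alternative `*` and `/` keys are not handled, and the code assumes that a shape mark (breve, circumflex, horn) is typed before the tone mark. It also converts every `dd` to đ, so English words such as "add" would be mangled.

```python
import unicodedata

# Typed character -> (combining mark, base letters that accept it).
SHAPES = {"(": ("\u0306", "a"), "^": ("\u0302", "aeo"), "+": ("\u031b", "ou")}
# Typed character -> combining tone mark; assumed valid on any vowel.
TONES = {"`": "\u0300", "'": "\u0301", "?": "\u0309", "~": "\u0303", ".": "\u0323"}
TONE_MARKS = set(TONES.values())
D_STROKE = {"DD": "Đ", "Dd": "Đ", "dd": "đ"}


def _try_mark(prev, key):
    """Return prev with the diacritic for key applied, or None if it does not fit."""
    parts = unicodedata.normalize("NFD", prev)
    base, existing = parts[0].lower(), set(parts[1:])
    if key in SHAPES:
        mark, allowed = SHAPES[key]
        # Assumption: a shape mark must be the first mark on the letter.
        if base not in allowed or existing:
            return None
    else:
        mark = TONES[key]
        # A letter carries at most one tone mark.
        if base not in "aeiouy" or existing & TONE_MARKS:
            return None
    # NFC reorders the marks and yields the precomposed character.
    return unicodedata.normalize("NFC", prev + mark)


def decode_viqr(text):
    out = []
    can_mark = False  # True if out[-1] may still receive a diacritic
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Backslash escape: copy the next character literally.
            out.append(text[i + 1])
            can_mark = False
            i += 2
            continue
        if text[i:i + 2] in D_STROKE:
            out.append(D_STROKE[text[i:i + 2]])
            can_mark = False
            i += 2
            continue
        if can_mark and (ch in SHAPES or ch in TONES):
            marked = _try_mark(out[-1], ch)
            if marked is not None:
                out[-1] = marked
                i += 1
                continue
        out.append(ch)
        can_mark = True
        i += 1
    return "".join(out)


print(decode_viqr("O^ng te^n gi`\\? To^i te^n la` Tra^`n Va(n Hie^'u\\."))
```

Composing through Unicode normalization avoids a hand-written table of all precomposed Vietnamese letters; a production converter would more likely use an explicit lookup table.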

Software support

VIQR is primarily used as a Vietnamese input method in software that supports Unicode. Similar input methods include Telex and VNI. Input method editors such as VPSKeys convert VIQR sequences to Unicode precomposed characters as one types, typically allowing modifier keys to be input after all the base letters of each word. However, in the absence of input method software or Unicode support, VIQR can still be input using a standard keyboard and read as plain ASCII text without suffering from mojibake.
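As an illustration of the plain-ASCII fallback, the following sketch converts Unicode Vietnamese text into VIQR. The function name `encode_viqr` is hypothetical, and the escaping rule is a simplification: it escapes the six punctuation marks listed in the Syntax section whenever they directly follow a non-space character, which may escape more often than strictly necessary.

```python
import unicodedata

# Combining mark -> VIQR typed character.
SHAPES = {"\u0306": "(", "\u0302": "^", "\u031b": "+"}
TONES = {"\u0300": "`", "\u0301": "'", "\u0309": "?", "\u0303": "~", "\u0323": "."}
D_STROKE = {"Đ": "DD", "đ": "dd"}
# Punctuation that could be misread as a diacritic after a word.
NEEDS_ESCAPE = set(".?'/(~")


def encode_viqr(text):
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in D_STROKE:
            out.append(D_STROKE[ch])
            continue
        # Split a precomposed letter into its base letter and combining marks.
        base, *marks = unicodedata.normalize("NFD", ch)
        if marks and all(m in SHAPES or m in TONES for m in marks):
            # Emit the shape mark before the tone mark (stable sort).
            ordered = sorted(marks, key=lambda m: m in TONES)
            out.append(base + "".join(SHAPES.get(m) or TONES[m] for m in ordered))
        elif ch in NEEDS_ESCAPE and out and not out[-1][-1].isspace():
            # Simplified rule: escape after any non-space character.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


print(encode_viqr("Ông tên gì? Tôi tên là Trần Văn Hiếu."))
```

The output contains only ASCII characters, so it remains readable on systems that cannot display Vietnamese letters.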

Unlike the VISCII and VPS code pages, VIQR is rarely used as a character encoding. While VIQR is registered with the Internet Assigned Numbers Authority as a MIME charset, MIME-compliant software is not required to support it. [4] Nevertheless, the Mozilla Vietnamese Enabling Project once produced builds of the open source version of Netscape Communicator, as well as its successor, the Mozilla Application Suite, that were capable of decoding VIQR-encoded webpages, e-mails, and newsgroup messages. In these unofficial builds, a "VIQR" option appears in the Edit | Character Set menu, alongside the VISCII, TCVN 5712, VPS, and Windows-1258 options that remained available for several years in Mozilla Firefox and Thunderbird. [5] [6]

History

By the early 1990s, an ad-hoc system of mnemonics known as Vietnet was in use on the Viet-Net mailing list and soc.culture.vietnamese Usenet group. [7] [8]

In 1992, the Vietnamese Standardization Group (Viet-Std, Nhóm Nghiên Cứu Tiêu Chuẩn Tiếng Việt) from the TriChlor Software Group led by Christopher Cuong T. Nguyen, Cuong M. Bui, and Hoc D. Ngo in California formalized the VIQR convention. It was described the next year in RFC 1456.

Notes and references

  1. Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly Media. pp. 47–49. ISBN 978-0-596-51447-1 via Google Books.
  2. Some software also supports * for inserting a horn diacritic.
  3. Some software also supports / for inserting an acute diacritic.
  4. RFC 1456.
  5. "Mozilla Vietnamese Enabling Project". Văn Lang Vietnamese Language & Culture Education Center.
  6. Van D. Ho (29 October 2009). "Vietnamese Unicode Font". Retrieved 16 October 2013.
  7. "A Unified Framework for Vietnamese Information Processing" (in English and Vietnamese). Vietnamese Standardization Working Group. September 1992. Retrieved 16 October 2013.
  8. Trần Tư Bình (14 August 2008). "Bài 4 – Thử tìm kiểu gõ dấu chữ Việt nhanh nhất". Chim Việt Cành Nam (in Vietnamese). No. 32. Retrieved 16 October 2013.
