VSCII

Alias(es): x-viet-tcvn5712 [1]
Language(s): Vietnamese, English
Created by: TCVN/TC1
Standard: TCVN 5712:1993
Classification: 8-bit SBCS; extended ASCII (VSCII-2/-3)

VSCII (Vietnamese Standard Code for Information Interchange), also known as TCVN 5712, [2] ISO-IR-180, [3] .VN, [4] ABC [4] or simply the TCVN encodings, [4] [5] is a set of three closely related Vietnamese national standard character encodings for using the Vietnamese language with computers, developed by the TCVN Technical Committee on Information Technology (TCVN/TC1) and first adopted in 1993 (as TCVN 5712:1993). [2]

It should not be confused with the similarly-named unofficial VISCII encoding, which was sometimes used by overseas Vietnamese speakers. [4] VISCII was also intended to stand for Vietnamese Standard Code for Information Interchange, but is not related to VSCII. [6]

VSCII (TCVN) was used extensively in the north of Vietnam, while VNI was popular in the south. [4] Unicode and the Windows-1258 code page are now used for virtually all Vietnamese computer data, [citation needed] but legacy files or archived messages may need conversion.

Encodings

All three forms of VSCII keep the 95 printable characters of ASCII unmodified.

VSCII-3, also known as TCVN 5712-3, VN3 or simply TCVN3, [7] includes the fewest assignments. It is an extended ASCII, because it keeps all 128 codes of ASCII unmodified. It does not reassign any of the C0 and C1 control codes. Compared to ASCII, it adds 75 characters.

In TCVN3, tone marks on uppercase vowels are obtained by switching to an all-capital font. [8]

VSCII-2, also known as TCVN 5712-2 and VN2, is a superset of VSCII-3. It is an extended ASCII, because it keeps all 128 codes of ASCII unmodified. It does not reassign any of the C0 and C1 control codes, making it conformant with ISO 2022 as a 96-set. [2] [3] Compared to VSCII-3, it adds further characters, for a total of 96 non-ASCII characters.

VSCII-1, also known as TCVN 5712-1 and VN1, is an extension of VSCII-2. It is a modified ASCII, since it replaces 12 of the 33 control characters with precomposed characters. Compared to VSCII-2, it adds further precomposed characters, for a total of 140 non-ASCII characters.
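The layering of the three forms suggests a rough way to tell whether a byte stream can only be VSCII-1. The following is a minimal sketch in Python, not a definitive detector. It assumes that the 12 reassigned C0 positions are the ones shown in the VSCII-1 chart in the Character set section, and that every C1 byte (0x80 to 0x9F) is a VSCII-1 addition because VSCII-2 leaves C1 unassigned. The names `VSCII1_ONLY_C0` and `needs_vscii1` are hypothetical.

```python
# C0 positions that VSCII-1 reassigns to letters (assumption: read off the
# VSCII-1 code chart; VSCII-2 and VSCII-3 keep these as control codes).
VSCII1_ONLY_C0 = frozenset(
    {0x01, 0x02, 0x04, 0x05, 0x06, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17}
)


def needs_vscii1(data: bytes) -> bool:
    """Heuristic: True if any byte falls where only VSCII-1 defines a letter."""
    return any(b in VSCII1_ONLY_C0 or 0x80 <= b <= 0x9F for b in data)


# Sanity check of the counts quoted in the text:
# 96 (VSCII-2, 0xA0-0xFF) + 32 (C1) + 12 (C0) = 140 non-ASCII characters.
NON_ASCII_TOTAL = 96 + 32 + len(VSCII1_ONLY_C0)
```

A VSCII-2 file that genuinely contains one of those control codes (for example SOH, 0x01) would be misclassified, so the result is only a hint.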

Conversion from VSCII-3 to VSCII-2 or VSCII-1 and conversion from VSCII-2 to VSCII-1 are not necessary, but can result in smaller files.

Conversion from VSCII-1 to VSCII-2 or VSCII-3 and conversion from VSCII-2 to VSCII-3 require expansion of some pre-composed characters.
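The expansion step can be illustrated on Unicode text. The sketch below is an assumption about what "expansion" amounts to: a precomposed letter is split into its base letter (keeping any circumflex, breve or horn) followed by one of the five combining tone marks found at 0xB0 to 0xB4 in the code chart. It works on Unicode strings rather than VSCII bytes, it expands every toned letter rather than only those a given target form lacks, and the name `expand_tone` is hypothetical.

```python
import unicodedata

# The five combining tone marks in the chart: grave, hook above, tilde,
# acute, dot below.
TONE_MARKS = {"\u0300", "\u0309", "\u0303", "\u0301", "\u0323"}


def expand_tone(ch: str) -> str:
    """Split a precomposed letter into base letter + combining tone mark."""
    decomposed = unicodedata.normalize("NFD", ch)
    tones = [c for c in decomposed if c in TONE_MARKS]
    rest = "".join(c for c in decomposed if c not in TONE_MARKS)
    # Recompose what is left so that e.g. A + circumflex becomes the single
    # letter Â again; only the tone mark stays separate.
    base = unicodedata.normalize("NFC", rest)
    return base + "".join(tones)
```

For example, `expand_tone("\u1ea4")` (Ấ) returns `"\u00c2\u0301"`, that is Â followed by a combining acute accent, while an unaccented letter such as `"a"` is returned unchanged.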

Character set

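VSCII-1 [2]

Each non-ASCII cell shows the character followed by its Unicode code point in parentheses. Rows give the high hexadecimal digit of the byte and columns the low digit.

   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL Ú(00DA) Ụ(1EE4) ETX Ừ(1EEA) Ử(1EEC) Ữ(1EEE) BEL BS HT LF VT FF CR SO SI
1x DLE Ứ(1EE8) Ự(1EF0) Ỳ(1EF2) Ỷ(1EF6) Ỹ(1EF8) Ý(00DD) Ỵ(1EF4) CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x À(00C0) Ả(1EA2) Ã(00C3) Á(00C1) Ạ(1EA0) Ặ(1EB6) Ậ(1EAC) È(00C8) Ẻ(1EBA) Ẽ(1EBC) É(00C9) Ẹ(1EB8) Ệ(1EC6) Ì(00CC) Ỉ(1EC8) Ĩ(0128)
9x Í(00CD) Ị(1ECA) Ò(00D2) Ỏ(1ECE) Õ(00D5) Ó(00D3) Ọ(1ECC) Ộ(1ED8) Ờ(1EDC) Ở(1EDE) Ỡ(1EE0) Ớ(1EDA) Ợ(1EE2) Ù(00D9) Ủ(1EE6) Ũ(0168)
Ax NBSP Ă(0102) Â(00C2) Ê(00CA) Ô(00D4) Ơ(01A0) Ư(01AF) Đ(0110) ă(0103) â(00E2) ê(00EA) ô(00F4) ơ(01A1) ư(01B0) đ(0111) Ằ(1EB0)
Bx ◌̀(0300) ◌̉(0309) ◌̃(0303) ◌́(0301) ◌̣(0323) à(00E0) ả(1EA3) ã(00E3) á(00E1) ạ(1EA1) Ẳ(1EB2) ằ(1EB1) ẳ(1EB3) ẵ(1EB5) ắ(1EAF) Ẵ(1EB4)
Cx Ắ(1EAE) Ầ(1EA6) Ẩ(1EA8) Ẫ(1EAA) Ấ(1EA4) Ề(1EC0) ặ(1EB7) ầ(1EA7) ẩ(1EA9) ẫ(1EAB) ấ(1EA5) ậ(1EAD) è(00E8) Ể(1EC2) ẻ(1EBB) ẽ(1EBD)
Dx é(00E9) ẹ(1EB9) ề(1EC1) ể(1EC3) ễ(1EC5) ế(1EBF) ệ(1EC7) ì(00EC) ỉ(1EC9) Ễ(1EC4) Ế(1EBE) Ồ(1ED2) ĩ(0129) í(00ED) ị(1ECB) ò(00F2)
Ex Ổ(1ED4) ỏ(1ECF) õ(00F5) ó(00F3) ọ(1ECD) ồ(1ED3) ổ(1ED5) ỗ(1ED7) ố(1ED1) ộ(1ED9) ờ(1EDD) ở(1EDF) ỡ(1EE1) ớ(1EDB) ợ(1EE3) ù(00F9)
Fx Ỗ(1ED6) ủ(1EE7) ũ(0169) ú(00FA) ụ(1EE5) ừ(1EEB) ử(1EED) ữ(1EEF) ứ(1EE9) ự(1EF1) ỳ(1EF3) ỷ(1EF7) ỹ(1EF9) ý(00FD) ỵ(1EF5) Ố(1ED0)

Rows Ax to Fx contain the VSCII-3 repertoire and the additions for VSCII-2. The letters in rows 0x, 1x, 8x and 9x are the additions for VSCII-1. [9]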
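Because VSCII-1 is a single-byte encoding, converting legacy data to Unicode is a plain table lookup. The following Python sketch builds a 256-entry table from the code points in the chart above and decodes a byte string with it. It is an illustration, not a reference implementation: the names `HIGH_ROWS`, `C0_OVERRIDES`, `VSCII1_TABLE` and `decode_vscii1` are hypothetical, and byte 0xFD is taken to be U+00FD (ý).

```python
# Rows 8x..Fx of the VSCII-1 chart as Unicode code points (hex), 16 per row.
HIGH_ROWS = [
    "00C0 1EA2 00C3 00C1 1EA0 1EB6 1EAC 00C8 1EBA 1EBC 00C9 1EB8 1EC6 00CC 1EC8 0128",
    "00CD 1ECA 00D2 1ECE 00D5 00D3 1ECC 1ED8 1EDC 1EDE 1EE0 1EDA 1EE2 00D9 1EE6 0168",
    "00A0 0102 00C2 00CA 00D4 01A0 01AF 0110 0103 00E2 00EA 00F4 01A1 01B0 0111 1EB0",
    "0300 0309 0303 0301 0323 00E0 1EA3 00E3 00E1 1EA1 1EB2 1EB1 1EB3 1EB5 1EAF 1EB4",
    "1EAE 1EA6 1EA8 1EAA 1EA4 1EC0 1EB7 1EA7 1EA9 1EAB 1EA5 1EAD 00E8 1EC2 1EBB 1EBD",
    "00E9 1EB9 1EC1 1EC3 1EC5 1EBF 1EC7 00EC 1EC9 1EC4 1EBE 1ED2 0129 00ED 1ECB 00F2",
    "1ED4 1ECF 00F5 00F3 1ECD 1ED3 1ED5 1ED7 1ED1 1ED9 1EDD 1EDF 1EE1 1EDB 1EE3 00F9",
    "1ED6 1EE7 0169 00FA 1EE5 1EEB 1EED 1EEF 1EE9 1EF1 1EF3 1EF7 1EF9 00FD 1EF5 1ED0",
]

# The 12 control positions that VSCII-1 replaces with capital letters.
C0_OVERRIDES = {
    0x01: 0x00DA, 0x02: 0x1EE4, 0x04: 0x1EEA, 0x05: 0x1EEC, 0x06: 0x1EEE,
    0x11: 0x1EE8, 0x12: 0x1EF0, 0x13: 0x1EF2, 0x14: 0x1EF6, 0x15: 0x1EF8,
    0x16: 0x00DD, 0x17: 0x1EF4,
}


def build_table() -> list:
    """Return a list mapping each byte value 0..255 to a Unicode character."""
    table = [chr(i) for i in range(128)]  # ASCII half, unchanged by default
    for byte, code_point in C0_OVERRIDES.items():
        table[byte] = chr(code_point)
    for row in HIGH_ROWS:
        table.extend(chr(int(h, 16)) for h in row.split())
    return table


VSCII1_TABLE = build_table()


def decode_vscii1(data: bytes) -> str:
    """Decode VSCII-1 bytes to a Unicode string by table lookup."""
    return "".join(VSCII1_TABLE[b] for b in data)
```

For instance, `decode_vscii1(b"Vi\xd6t Nam")` returns `"Việt Nam"`, since byte 0xD6 maps to U+1EC7 (ệ) in the chart.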

References

  1. Sivonen, Henri (2014-09-26). "Character encoding changes in m-c require c-c action". mozilla.dev.apps.thunderbird.
  2. "[news] TCVN 5712:1993 (VSCII) -- Vietnamese national standard". 1993-06-02. Archived from the original on 2017-01-11.
  3. TCVN (1993). ISO-IR-180: Right-hand Part of the VSCII-2 Code Table (PDF). ITSCJ/IPSJ.
  4. Ngo, Hoc Dinh; Tran, TuBinh. "5. Why Having Vietnamese Charset (Character Set – Encoding) Conversion?". Some special functions of WinVNKey.
  5. Nguyen, Minh T. "Vietnamese Conversions (Vietnet/VIQR, VNI, VPS, VISCII, VNU, TCVN, VietWare, unicode)".
  6. Lunde, Ken (13 January 2009). "Chapter 1: CJKV Information Processing Overview (§ Are VISCII and VSCII identical? What about TCVN?)". CJKV Information Processing (2nd ed.). p. 17. ISBN 978-0-596-51447-1.
  7. "Unicode & Vietnamese Legacy Character Encodings". Vietnamese Unicode FAQs.
  8. "Unicode & Vietnamese Legacy Character Encodings". Vietnamese Unicode FAQs. TCVN3 is not double-byte, but due to the nature of its encoding, capital letters (vowels) are mapped to a separate, capital font that is similar to the normal, lowercase one.
  9. Lunde, Ken (13 January 2009). "Appendix L: Vietnamese Character Sets" (PDF). CJKV Information Processing (2nd ed.). ISBN 978-0-596-51447-1.