MSX character set

Language(s): Arabic, Portuguese, German, English, Japanese, Korean, Russian
Created by: Microsoft
Based on: code page 437

MSX character sets are a group of single- and double-byte character sets developed by Microsoft for MSX computers. They are based on code page 437.
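The relationship to code page 437 can be checked for one row. The sketch below compares row 8x of the MSX International table, as listed in this article, with what Python's standard `cp437` codec produces for the same byte values. The variable names are illustrative, and the check covers only this one row, not the whole set.

```python
import unicodedata

# Row 8x of the MSX International table, copied from this article.
# NFC normalization guards against decomposed accents in the literal.
MSX_ROW_8X = unicodedata.normalize("NFC", "ÇüéâäàåçêëèïîìÄÅ")

# Decode the same 16 byte values (0x80-0x8F) with Python's cp437 codec.
cp437_row_8x = bytes(range(0x80, 0x90)).decode("cp437")

rows_match = cp437_row_8x == MSX_ROW_8X
print(rows_match)
```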


Character sets

The following table shows the MSX character set. Each character is shown with a potential Unicode equivalent if available. Control characters and other non-printing characters are represented by their names.

The character set varies with the target market of the machine. These are the variations:

The German DIN and International character sets are identical apart from the style of the zero (0) character: the International character set has a slashed zero, while the DIN character set has a dotted zero.

The MSX terminal is compatible with VT52 escape codes, plus extra control codes shown below.
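As an illustration of what VT52 compatibility means in practice, the sketch below builds a few byte sequences following the standard VT52 conventions (ESC A for cursor up, ESC H for home, ESC K for erase to end of line, and ESC Y for direct cursor addressing with both coordinates offset by 0x20). The function and constant names are hypothetical, and this article does not confirm which individual sequences a given MSX machine honours.

```python
ESC = b"\x1b"

# Fixed two-byte VT52 sequences (standard VT52 meanings).
VT52_CURSOR_UP = ESC + b"A"
VT52_CURSOR_HOME = ESC + b"H"
VT52_ERASE_TO_END_OF_LINE = ESC + b"K"


def vt52_move_cursor(row: int, col: int) -> bytes:
    """Build ESC Y <row> <col> for zero-based coordinates.

    VT52 encodes each coordinate as a single byte equal to the
    coordinate plus 0x20 (the space character).
    """
    # Assumption: keep the encoded bytes within 7-bit range.
    if not (0 <= row <= 0x5F and 0 <= col <= 0x5F):
        raise ValueError("row and col must be in the range 0..95")
    return ESC + b"Y" + bytes([0x20 + row, 0x20 + col])


# Move to row 5, column 10, then erase the rest of that line.
sequence = vt52_move_cursor(5, 10) + VT52_ERASE_TO_END_OF_LINE
print(sequence)
```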

MSX International [1] [2]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NULL graph WB [note 1] ceol [note 2] WF [note 3] BEEP BS TAB LF home [note 4] CLS RET eol [note 5]
1x INS [note 6] DL [note 7] select [note 8] ESC [note 9] [note 10] [note 11] [note 12]
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x Ç ü é â ä à å ç ê ë è ï î ì Ä Å
9x É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥ ƒ
Ax á í ó ú ñ Ñ ª º ¿ ¬ ½ ¼ ¡ « »
Bx à ã Ĩ ĩ Õ [note 13] õ [note 14] Ű ű IJ ij ¾ §
Cx 🮂 🮅 🮇 🮊 🮙 🮘 🭭 🭯 🭬
Dx 🭮 🮚 🮛 🮖 Δ ω
Ex α ß Γ π Σ σ µ τ Φ Θ Ω δ
Fx ± ÷ ° · ² cursor
  1. moves the cursor to the previous word
  2. deletes the line to the right of the cursor
  3. moves the cursor to the next word
  4. places the cursor at top left of the screen
  5. moves the cursor to the end of the line
  6. insert key
  7. deletes the line where the cursor is located
  8. Special key. Its function can vary amongst applications
  9. moves the cursor one character to the right
  10. moves the cursor one character to the left
  11. moves the cursor up
  12. moves the cursor down
  13. could also be Ő
  14. could also be ő
MSX International [1] [2]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
4x NBSP
5x 🮯
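A table like the one above translates directly into a byte-to-Unicode lookup. The following is a minimal sketch, not a complete decoder: it covers only the ranges whose rows are fully listed in this article (the printable ASCII range 0x20 to 0x7E and row 8x) and substitutes U+FFFD for everything else, including control codes and the rows with missing cells. The function name `decode_msx_partial` is hypothetical.

```python
import unicodedata

# Row 8x of the MSX International table, copied from this article.
ROW_8X = unicodedata.normalize("NFC", "ÇüéâäàåçêëèïîìÄÅ")

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_msx_partial(data: bytes) -> str:
    """Decode MSX International bytes for the ranges covered here.

    0x20-0x7E map to the same code points as ASCII, 0x80-0x8F map to
    row 8x of the table, and every other byte becomes U+FFFD.
    """
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))
        elif 0x80 <= b <= 0x8F:
            out.append(ROW_8X[b - 0x80])
        else:
            out.append(REPLACEMENT)
    return "".join(out)


print(decode_msx_partial(b"MSX \x81\x82"))
```

Extending the sketch to the remaining rows would need a complete, verified table, such as the MSX.TXT mapping cited in the references.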

Brazilian variants

Gradiente custom charset

The Brazilian manufacturer Gradiente initially shipped a modified MSX character set on its v1.0 machines so that Portuguese could be written correctly. The table below lists rows 8x and 9x of this set, where the changed cells fall. The symbol at 0x9E (158), shown as Cz, is the currency symbol for the Brazilian cruzado, which is no longer in use.

MSX Brazilian
   0 1 2 3 4 5 6 7 8 9 A B C D E F
8x Ç ü é â Á à ¨ ç ê Í Ó Ú Â Ê Ô À
9x É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥ Cz ƒ
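Comparing row 8x of the Gradiente table with row 8x of the International table, position by position, gives the code points that Gradiente changed in that row. This is a minimal sketch with both rows copied from the tables in this article; the names are illustrative, and row 9x is left out because the International row is not fully listed here.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so each accented letter is one code point."""
    return unicodedata.normalize("NFC", text)


# Row 8x of each table, copied from this article.
INTERNATIONAL_8X = nfc("ÇüéâäàåçêëèïîìÄÅ")
GRADIENTE_8X = nfc("ÇüéâÁà¨çêÍÓÚÂÊÔÀ")

# Collect (code point, international char, Gradiente char) where they differ.
changed = [
    (0x80 + i, intl, grad)
    for i, (intl, grad) in enumerate(zip(INTERNATIONAL_8X, GRADIENTE_8X))
    if intl != grad
]

for code, intl, grad in changed:
    print(f"0x{code:02X}: {intl} -> {grad}")
```

For these two rows the comparison reports nine changed positions: 0x84, 0x86, and 0x89 through 0x8F.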

BRASCII

Later Brazilian MSX models (v1.1 or higher) included a standardized character set named BRASCII, which resolved the incompatibilities in accented characters between manufacturers.

References

  1. "MSX.TXT" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04
  2. Rderooy; Tvalenca; Gdx (2016-12-16). "MSX font". Microcomputer & Related Culture Foundation. Archived from the original on 2017-07-24. Retrieved 2017-07-24.