Box-drawing characters

Midnight Commander using box-drawing characters in a terminal emulator

Box-drawing characters, also known as line-drawing characters, are a form of semigraphics widely used in text user interfaces to draw various geometric frames and boxes. They are designed to connect horizontally and/or vertically with adjacent characters, which requires proper alignment. Box-drawing characters therefore typically work well only with monospaced fonts.

In graphical user interfaces, these characters are much less useful, as it is simpler and more appropriate to draw lines and rectangles directly with graphical APIs. However, they remain useful for command-line interfaces and plaintext comments within source code.

Some recent embedded systems also use proprietary character sets, usually extensions to ISO 8859 character sets, which include box-drawing characters or other special symbols.

Other types of box-drawing characters are block elements, shade characters, and terminal graphic characters; these can be used for filling regions of the screen and portraying drop shadows.

Unicode

Box Drawing

Unicode includes 128 such characters in the Box Drawing block. [1] In many Unicode fonts, only the subset that is also available in the IBM PC character set (see below) will exist, due to it being defined as part of the WGL4 character set.

Box Drawing [1]
Official Unicode Consortium code chart (PDF)
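        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+250x  ─ ━ │ ┃ ┄ ┅ ┆ ┇ ┈ ┉ ┊ ┋ ┌ ┍ ┎ ┏
U+251x  ┐ ┑ ┒ ┓ └ ┕ ┖ ┗ ┘ ┙ ┚ ┛ ├ ┝ ┞ ┟
U+252x  ┠ ┡ ┢ ┣ ┤ ┥ ┦ ┧ ┨ ┩ ┪ ┫ ┬ ┭ ┮ ┯
U+253x  ┰ ┱ ┲ ┳ ┴ ┵ ┶ ┷ ┸ ┹ ┺ ┻ ┼ ┽ ┾ ┿
U+254x  ╀ ╁ ╂ ╃ ╄ ╅ ╆ ╇ ╈ ╉ ╊ ╋ ╌ ╍ ╎ ╏
U+255x  ═ ║ ╒ ╓ ╔ ╕ ╖ ╗ ╘ ╙ ╚ ╛ ╜ ╝ ╞ ╟
U+256x  ╠ ╡ ╢ ╣ ╤ ╥ ╦ ╧ ╨ ╩ ╪ ╫ ╬ ╭ ╮ ╯
U+257x  ╰ ╱ ╲ ╳ ╴ ╵ ╶ ╷ ╸ ╹ ╺ ╻ ╼ ╽ ╾ ╿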
Notes
1. ^ As of Unicode version 15.1
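
Because the block occupies one contiguous range, a chart like the one above can be regenerated from the code points alone. The following is a minimal Python sketch using only the standard library; the function name `box_drawing_chart` is made up for this illustration.

```python
import unicodedata

BLOCK_START = 0x2500  # first code point of the Box Drawing block
BLOCK_END = 0x257F    # last code point of the Box Drawing block


def box_drawing_chart():
    """Return the chart rows, one string per run of 16 code points."""
    rows = []
    for base in range(BLOCK_START, BLOCK_END + 1, 16):
        cells = " ".join(chr(base + i) for i in range(16))
        rows.append(f"U+{base >> 4:03X}x {cells}")
    return rows


if __name__ == "__main__":
    print("\n".join(box_drawing_chart()))
    # Look up the formal Unicode name of a single character.
    print(unicodedata.name(chr(0x2554)))
```

Whether the glyphs actually line up depends on the font and terminal used to display the output.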

Block Elements

The Block Elements Unicode block includes shading characters. 32 characters are included in the block.

Block Elements [1]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+258x  ▀ ▁ ▂ ▃ ▄ ▅ ▆ ▇ █ ▉ ▊ ▋ ▌ ▍ ▎ ▏
U+259x  ▐ ░ ▒ ▓ ▔ ▕ ▖ ▗ ▘ ▙ ▚ ▛ ▜ ▝ ▞ ▟
Notes
1. ^ As of Unicode version 15.1
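
One common way to fill a region with these characters is a horizontal bar drawn in eighths of a cell. The sketch below is only an illustration: the function name `bar` and its parameters are assumptions, and it relies on U+2589 through U+258F being the left seven-eighths down to left one-eighth blocks.

```python
def bar(fraction, width):
    """Render `fraction` (0.0 to 1.0) as a bar that is `width` cells wide."""
    fraction = min(max(fraction, 0.0), 1.0)
    eighths = round(fraction * width * 8)
    full, rest = divmod(eighths, 8)
    # U+2588 is FULL BLOCK; U+2589..U+258F are the left 7/8 .. 1/8 blocks,
    # so a partial cell of `rest` eighths is chr(0x2590 - rest).
    partial = chr(0x2590 - rest) if rest else ""
    return ("\u2588" * full + partial).ljust(width)


if __name__ == "__main__":
    for value in (0.0, 0.3125, 0.5, 1.0):
        print(f"{value:6.4f} |{bar(value, 16)}|")
```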

Symbols for Legacy Computing

In version 13.0, Unicode was extended with another block containing many graphics characters, Symbols for Legacy Computing, which includes a few box-drawing characters and other symbols used by obsolete operating systems (mostly from the 1980s). Few fonts support these characters (one is Noto Sans Symbols 2), but the table of symbols is provided here:

Symbols for Legacy Computing [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+1FB0x🬀🬁🬂🬃🬄🬅🬆🬇🬈🬉🬊🬋🬌🬍🬎🬏
U+1FB1x🬐🬑🬒🬓🬔🬕🬖🬗🬘🬙🬚🬛🬜🬝🬞🬟
U+1FB2x🬠🬡🬢🬣🬤🬥🬦🬧🬨🬩🬪🬫🬬🬭🬮🬯
U+1FB3x🬰🬱🬲🬳🬴🬵🬶🬷🬸🬹🬺🬻🬼🬽🬾🬿
U+1FB4x🭀🭁🭂🭃🭄🭅🭆🭇🭈🭉🭊🭋🭌🭍🭎🭏
U+1FB5x🭐🭑🭒🭓🭔🭕🭖🭗🭘🭙🭚🭛🭜🭝🭞🭟
U+1FB6x🭠🭡🭢🭣🭤🭥🭦🭧🭨🭩🭪🭫🭬🭭🭮🭯
U+1FB7x🭰🭱🭲🭳🭴🭵🭶🭷🭸🭹🭺🭻🭼🭽🭾🭿
U+1FB8x🮀🮁🮂🮃🮄🮅🮆🮇🮈🮉🮊🮋🮌🮍🮎🮏
U+1FB9x🮐🮑🮒🮔🮕🮖🮗🮘🮙🮚🮛🮜🮝🮞🮟
U+1FBAx🮠🮡🮢🮣🮤🮥🮦🮧🮨🮩🮪🮫🮬🮭🮮🮯
U+1FBBx🮰🮱🮲🮳🮴🮵🮶🮷🮸🮹🮺🮻🮼🮽🮾🮿
U+1FBCx🯀🯁🯂🯃🯄🯅🯆🯇🯈🯉🯊
U+1FBDx 
U+1FBEx 
U+1FBFx🯰🯱🯲🯳🯴🯵🯶🯷🯸🯹
Notes
1. ^ As of Unicode version 15.1
2. ^ Grey areas indicate non-assigned code points

Platform-specific

Various platforms defined their own sets of box-drawing characters.

DOS

The hardware code page of the original IBM PC supplied the following box-drawing characters, in what DOS now calls code page 437. This subset of the Unicode box-drawing characters is thus included in WGL4 and is far more popular and likely to be rendered correctly:

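   0 1 2 3 4 5 6 7 8 9 A B C D E F
B  ░ ▒ ▓ │ ┤ ╡ ╢ ╖ ╕ ╣ ║ ╗ ╝ ╜ ╛ ┐
C  └ ┴ ┬ ├ ─ ┼ ╞ ╟ ╚ ╔ ╩ ╦ ╠ ═ ╬ ╧
D  ╨ ╤ ╥ ╙ ╘ ╒ ╓ ╫ ╪ ┘ ┌ █ ▄ ▌ ▐ ▀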

The integral halves are also box drawing as they are used alongside 0xB3:

   4 5
F  ⌠ ⌡

Their number is further limited to 28 on those code pages that replace the 18 characters that combine single and double lines, the left and right half blocks, as well as integral halves with other, usually alphabetic, characters (such as code page 850):

   0 1 2 3 4 5 6 7 8 9 A B C D E F
B  ░ ▒ ▓ │ ┤         ╣ ║ ╗ ╝     ┐
C  └ ┴ ┬ ├ ─ ┼     ╚ ╔ ╩ ╦ ╠ ═ ╬
D                    ┘ ┌ █ ▄     ▀

Note: The non-double characters are the thin (light) characters (U+2500, U+2502), not the bold (heavy) characters (U+2501, U+2503).
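
The difference between the two code pages can be checked by decoding their upper halves. This sketch assumes Python's built-in `cp437` and `cp850` codecs; the helper name `box_and_block_chars` is hypothetical.

```python
def box_and_block_chars(codec):
    """Return the characters of an 8-bit code page that fall in the
    Unicode Box Drawing or Block Elements blocks (U+2500..U+259F)."""
    decoded = bytes(range(0x80, 0x100)).decode(codec, errors="replace")
    return [ch for ch in decoded if 0x2500 <= ord(ch) <= 0x259F]


if __name__ == "__main__":
    # Print rows B, C and D of code page 437 as Unicode text.
    for row in (0xB0, 0xC0, 0xD0):
        print(f"{row >> 4:X} " + bytes(range(row, row + 16)).decode("cp437"))
    for codec in ("cp437", "cp850"):
        print(codec, len(box_and_block_chars(codec)))
```

The count excludes the integral halves at 0xF4 and 0xF5, which Unicode encodes outside these two blocks.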

Some OEM DOS computers supported other character sets. For example, the Hewlett-Packard HP 110 / HP Portable and HP 110 Plus / HP Portable Plus used a modified version of their normal HP Roman-8 character set in which box-drawing characters were added in reserved areas. [2] [3]

[2] [3] 0123456789ABCDEF
8
9

Unix, CP/M, BBS

On many Unix systems and early dial-up bulletin board systems the only common standard for box-drawing characters was the VT100 alternate character set (see also: DEC Special Graphics). The escape sequence Esc ( 0 switched the codes for lower-case ASCII letters to draw this set, and the sequence Esc ( B switched back:

0123456789ABCDEF
6
7

On some terminals, these characters are not available at all, and the complexity of the escape sequences discouraged their use, so often only ASCII characters that approximate box-drawing characters are used, such as - (hyphen-minus), | (vertical bar), _ (underscore), = (equal sign) and + (plus sign) in a kind of ASCII art fashion.

Modern Unix terminal emulators use Unicode and thus have access to the line-drawing characters listed above.
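
The VT100 mechanism described above can be sketched as follows. This is an illustrative Python example, not a terminal library: the names `vt100_box`, `as_vt100` and `as_unicode` are made up, and the mapping lists only the line-drawing letters of the DEC Special Graphics set.

```python
ESC = "\x1b"
ENTER_LINE_DRAWING = ESC + "(0"  # select DEC Special Graphics
EXIT_LINE_DRAWING = ESC + "(B"   # switch back to US-ASCII

# Lower-case letters that a VT100 draws as line segments in this mode,
# with the Unicode character each one corresponds to.
DEC_TO_UNICODE = {
    "j": "┘", "k": "┐", "l": "┌", "m": "└", "n": "┼",
    "q": "─", "t": "├", "u": "┤", "v": "┴", "w": "┬", "x": "│",
}


def vt100_box(width, height):
    """Return rows of ASCII letters describing a box of the given inner size."""
    top = "l" + "q" * width + "k"
    middle = "x" + " " * width + "x"
    bottom = "m" + "q" * width + "j"
    return [top] + [middle] * height + [bottom]


def as_vt100(rows):
    """Wrap each row in the escape sequences that toggle line drawing."""
    return "\n".join(ENTER_LINE_DRAWING + row + EXIT_LINE_DRAWING for row in rows)


def as_unicode(rows):
    """Translate the same rows for a terminal that uses Unicode instead."""
    table = str.maketrans(DEC_TO_UNICODE)
    return "\n".join(row.translate(table) for row in rows)


if __name__ == "__main__":
    print(as_unicode(vt100_box(10, 2)))
```

Printing the result of `as_vt100` only draws lines on a terminal or emulator that implements the alternate character set; elsewhere the raw letters appear.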

Teletext

World System Teletext (WST) uses pixel-drawing characters for some graphics. Each character cell is divided into a 2×3 grid of regions, and 2⁶ = 64 code positions are allocated for all possible combinations of pixels. [4] These characters were added to the Unicode standard in version 13.0. [5]
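
As a rough sketch of how such a 2×3 pattern maps onto Unicode, the Python function below converts a 6-bit pattern into a character. It follows the ordering of the Unicode sextant characters (starting at U+1FB00), not the Teletext code positions themselves; the name `mosaic_char` and the bit numbering are assumptions made for this example.

```python
SEXTANT_BASE = 0x1FB00  # BLOCK SEXTANT-1


def mosaic_char(pattern):
    """Map a 6-bit pattern to a Unicode character.

    Bit 0 is the top-left cell, bit 1 top-right, bit 2 middle-left,
    bit 3 middle-right, bit 4 bottom-left and bit 5 bottom-right.
    """
    if not 0 <= pattern <= 63:
        raise ValueError("pattern must fit in 6 bits")
    # Four patterns already existed elsewhere in Unicode (space, left half
    # block, right half block, full block) and have no sextant code point.
    existing = {0: " ", 21: "\u258C", 42: "\u2590", 63: "\u2588"}
    if pattern in existing:
        return existing[pattern]
    # Skip over the gaps left by patterns 0, 21 and 42.
    offset = pattern - 1 - (pattern > 21) - (pattern > 42)
    return chr(SEXTANT_BASE + offset)


if __name__ == "__main__":
    for start in range(0, 64, 16):
        print(" ".join(mosaic_char(p) for p in range(start, start + 16)))
```

Only 60 of the 64 patterns needed new code points, which is why the function special-cases the other four.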

Historical

Many microcomputers of the 1970s and 1980s had their own proprietary character sets, which also included box-drawing characters. Many of these were added to Unicode as Symbols for Legacy Computing.

Commodore

Commodore machines, such as the Commodore PET and the Commodore 64, included a set of text semigraphics with block elements and dithering patterns in the PETSCII character set.

PET 2001 keyboard layout, illustrating PETSCII graphics characters

Sinclair

ZX81 semigraphics

The Sinclair ZX80, ZX81, and ZX Spectrum included a set of text semigraphics with quadrant-based block elements. The ZX80 and ZX81 also included a set of text semigraphics with dithering patterns.

BBC and Acorn

The BBC Micro could utilize the Teletext 7-bit character set, which had 128 box-drawing characters, whose code points were shared with the regular alphanumeric and punctuation characters. Control characters were used to switch between regular text and box drawing. [6]

Teletext G1 Block Mosaics Set [7]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
2    🬀 🬁 🬂 🬃 🬄 🬅 🬆 🬇 🬈 🬉 🬊 🬋 🬌 🬍 🬎
3  🬏 🬐 🬑 🬒 🬓 ▌ 🬔 🬕 🬖 🬗 🬘 🬙 🬚 🬛 🬜 🬝
6  🬞 🬟 🬠 🬡 🬢 🬣 🬤 🬥 🬦 🬧 ▐ 🬨 🬩 🬪 🬫 🬬
7  🬭 🬮 🬯 🬰 🬱 🬲 🬳 🬴 🬵 🬶 🬷 🬸 🬹 🬺 🬻 █

On the BBC Master and later Acorn computers, the soft font is defined by default with line-drawing characters.

0123456789ABCDEF
A
B

Amstrad

The Amstrad CPC character set also has soft characters defined by default as block and line drawing characters.

0123456789ABCDEF
8
9

The CP/M Plus character set used on various Amstrad computers of the CPC, PCW and Spectrum families included a rich set of line-drawing characters as well: [8] [9] [10]

[8] 0123456789ABCDEF
8
9

Apple

MouseText is a set of display characters for the Apple IIc, IIe, and IIGS that includes box-drawing characters.

Encoding

On many platforms, the character shape is determined programmatically from the character code.

However, DOS line- and box-drawing characters are not ordered in any programmatic manner, so determining the shape of a particular character requires a look-up table.
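
A look-up table of this kind might be sketched as below. The key layout (up, right, down, left) and the names `CP437_BOX` and `cp437_box` are assumptions made for this example, and only the pure single-line and pure double-line characters of code page 437 are listed.

```python
# Keys are (up, right, down, left): 0 = no line, 1 = single, 2 = double.
# Values are code page 437 byte values.
CP437_BOX = {
    (1, 0, 1, 0): 0xB3,  # │
    (1, 0, 1, 1): 0xB4,  # ┤
    (0, 0, 1, 1): 0xBF,  # ┐
    (1, 1, 0, 0): 0xC0,  # └
    (1, 1, 0, 1): 0xC1,  # ┴
    (0, 1, 1, 1): 0xC2,  # ┬
    (1, 1, 1, 0): 0xC3,  # ├
    (0, 1, 0, 1): 0xC4,  # ─
    (1, 1, 1, 1): 0xC5,  # ┼
    (1, 0, 0, 1): 0xD9,  # ┘
    (0, 1, 1, 0): 0xDA,  # ┌
    (2, 0, 2, 2): 0xB9,  # ╣
    (2, 0, 2, 0): 0xBA,  # ║
    (0, 0, 2, 2): 0xBB,  # ╗
    (2, 0, 0, 2): 0xBC,  # ╝
    (2, 2, 0, 0): 0xC8,  # ╚
    (0, 2, 2, 0): 0xC9,  # ╔
    (2, 2, 0, 2): 0xCA,  # ╩
    (0, 2, 2, 2): 0xCB,  # ╦
    (2, 2, 2, 0): 0xCC,  # ╠
    (0, 2, 0, 2): 0xCD,  # ═
    (2, 2, 2, 2): 0xCE,  # ╬
}


def cp437_box(up, right, down, left):
    """Return the CP437 byte for the requested connections.

    Raises KeyError for combinations missing from this partial table.
    """
    return bytes([CP437_BOX[(up, right, down, left)]])


if __name__ == "__main__":
    corner = cp437_box(0, 2, 2, 0)
    print(corner.hex(), corner.decode("cp437"))
```

The byte values follow no arithmetic pattern in the connection flags, which is why a table is needed rather than a formula.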

Examples

Sample diagrams made out of the standard box-drawing characters, using a monospaced font:

┌─┬┐  ╔═╦╗  ╓─╥╖  ╒═╤╕
│ ││  ║ ║║  ║ ║║  │ ││
├─┼┤  ╠═╬╣  ╟─╫╢  ╞═╪╡
└─┴┘  ╚═╩╝  ╙─╨╜  ╘═╧╛

┌───────────────────┐
│  ╔═══╗ Some Text  │▒
│  ╚═╦═╝ in the box │▒
╞═╤══╩══╤═══════════╡▒
│ ├──┬──┤           │▒
│ └──┴──┘           │▒
└───────────────────┘▒
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
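
Frames like these are usually generated rather than typed by hand. The following Python sketch draws a light single-line frame around some text; the function name `frame` is hypothetical, and the code assumes every character occupies exactly one cell of a monospaced font.

```python
def frame(lines):
    """Surround a list of text lines with a light box-drawing frame."""
    width = max((len(line) for line in lines), default=0)
    top = "┌" + "─" * width + "┐"
    # Pad each line so the right-hand border forms a straight column.
    body = ["│" + line.ljust(width) + "│" for line in lines]
    bottom = "└" + "─" * width + "┘"
    return "\n".join([top, *body, bottom])


if __name__ == "__main__":
    print(frame(["Some Text", "in the box"]))
```

Text containing wide or combining characters would need a display-width calculation instead of `len`.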

References

  1. Box Drawing U+2500-U+257F, The Unicode Standard Code Charts
  2. Hewlett-Packard - Technical Reference Manual - Portable PLUS (1 ed.). Corvallis, OR, USA: Hewlett-Packard Company, Portable Computer Division. August 1985. 45559-90001. Retrieved 2016-11-27.
  3. Hewlett-Packard - Technical Reference Manual - Portable PLUS (PDF) (2 ed.). Corvallis, OR, USA: Hewlett-Packard Company, Portable Computer Division. December 1986 [August 1985]. 45559-90006. Archived (PDF) from the original on 2016-11-28. Retrieved 2016-11-27.
  4. Wiels. "TeleText - Het Protocol" (in Dutch). Mosaic characters. Archived from the original on 2017-12-22. Retrieved 2017-12-21.
  5. "Symbols for Legacy Computing" (PDF). Unicode Consortium. Retrieved 2020-04-19.
  6. Broadcast Teletext Specification, September 1976 (as HTML or scans of original document)
  7. Enhanced Teletext specification (PDF), European Telecommunications Standards Institute, May 1997, p. 126
  8. "Appendix II: CP/M Plus character sets / II.1 The complete character set (Language 0)". Spectrum +3 CP/M Plus manual (User Manual). Archived from the original on 2009-10-15. Retrieved 2017-07-10.
  9. Elliott, John C. (2015-04-04). "Amstrad Extended BIOS Internals". Seasip.info. Archived from the original on 2017-07-15. Retrieved 2017-07-15.
  10. "Amstrad CP/M Plus character set". Archived from the original on 2017-07-15. Retrieved 2017-07-15.