Windows Glyph List 4

Windows Glyph List 4 (WGL4), also known as the Pan-European character set, is a character repertoire on Microsoft operating systems comprising 657 Unicode characters, two of which are private-use characters. It serves as an implementation guideline for producers of fonts for European natural languages: a font that provides glyphs for the entire set can claim WGL4 compliance and can therefore expect to be compatible with a wide range of software.

As of 2004, WGL4 characters were the only ones guaranteed to display correctly on Microsoft Windows. More recent versions of Windows display far more glyphs.

Because many fonts are designed to cover the WGL4 set, its characters are likely to display as actual glyphs, rather than as replacement glyphs, on many computer systems. For example, all the non-private-use characters in the table below are likely to display properly, in contrast to the many missing characters that may be seen in other articles about Unicode.

Repertoire

The repertoire, defined by Microsoft, encompasses all the characters found in Windows code pages 1252 (Windows Western), 1250 (Windows Central European), 1251 (Windows Cyrillic), 1253 (Windows Greek), 1254 (Windows Turkish), and 1257 (Windows Baltic), as well as characters from DOS code page 437.
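The code-page part of the repertoire can be approximated as a set union. The following is a minimal sketch, assuming Python's built-in codecs (`cp1252`, `cp437`, and so on) are an acceptable stand-in for the Microsoft mapping tables; the names `CODE_PAGES`, `code_page_repertoire`, and `core` are illustrative only. Python's `cp437` codec maps bytes below 0x20 to control codes, not to the symbols shown on the IBM PC, so the sketch starts at byte 0x20 and yields only part of the full WGL4 set.

```python
import unicodedata

# Code pages named in the text, using Python's built-in codec names.
CODE_PAGES = ["cp1252", "cp1250", "cp1251", "cp1253", "cp1254", "cp1257", "cp437"]


def code_page_repertoire(code_page):
    """Return the set of non-control characters that one code page can represent."""
    chars = set()
    for byte in range(0x20, 0x100):
        try:
            ch = bytes([byte]).decode(code_page)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the code page
        if unicodedata.category(ch) != "Cc":  # skip DEL and other control codes
            chars.add(ch)
    return chars


# Union of all seven code pages: an approximation of the code-page-derived
# core of WGL4, not the complete repertoire.
core = set().union(*(code_page_repertoire(cp) for cp in CODE_PAGES))

print(len(core), "characters in the union")
print("€" in core, "Ж" in core, "╬" in core)
```

Characters that WGL4 takes from other sources, such as the two private-use code points, do not appear in this union.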

It does not cover the combining diacritics used by Vietnamese-related code page 1258, the Thai letters used in code page 874, Hebrew and Arabic letters covered by code pages 1255 and 1256, or the ideographic characters used by code pages 932, 936, 949 and 950.

It also does not cover the Romanian letters Ș, ș, Ț, and ț (U+0218–U+021B), which were added to several of Microsoft's fonts for Windows Vista, long after the WGL4 repertoire was originally defined.

In version 1.5 of the OpenType Specification (May 2008), four Cyrillic characters were added to the WGL4 character set: Ѐ (U+0400), Ѝ (U+040D), ѐ (U+0450) and ѝ (U+045D). [1] [2] [3]
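The exclusions and later additions described above can be checked against the code pages directly. This is a minimal sketch, again assuming Python's built-in codecs match the Microsoft code pages closely enough; `encodable_in` is a hypothetical helper, not part of any standard API. It shows that neither the Romanian Ș (U+0218) nor Ѝ (U+040D), one of the four characters added in version 1.5, is assigned a byte in any of the listed code pages.

```python
# Code pages from which the WGL4 repertoire draws, as Python codec names.
CODE_PAGES = ["cp1252", "cp1250", "cp1251", "cp1253", "cp1254", "cp1257", "cp437"]


def encodable_in(ch, code_pages=CODE_PAGES):
    """Return the code pages that assign a byte to the single character ch."""
    found = []
    for cp in code_pages:
        try:
            ch.encode(cp)  # strict error handling: raises if ch has no mapping
        except UnicodeEncodeError:
            continue
        found.append(cp)
    return found


for ch in ["é", "Ё", "Ș", "Ѝ", "א"]:
    pages = encodable_in(ch)
    label = ", ".join(pages) if pages else "none of the listed code pages"
    print(f"U+{ord(ch):04X} {ch}: {label}")
```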

Character table

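Each row lists the characters of one 16-code-point range, preceded by the hexadecimal code point at which the range starts. Where a row is only partly filled, the code points present are given in parentheses.

C0 Controls and Basic Latin (identical to ASCII printable characters)
0020  (space) ! " # $ % & ' ( ) * + , - . /
0030  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0040  @ A B C D E F G H I J K L M N O
0050  P Q R S T U V W X Y Z [ \ ] ^ _
0060  ` a b c d e f g h i j k l m n o
0070  p q r s t u v w x y z { | } ~ (0070–007E)

C1 Controls and Latin-1 Supplement (identical to ISO/IEC 8859-1)
00A0  (no-break space) ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ (soft hyphen) ® ¯
00B0  ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
00C0  À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
00D0  Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
00E0  à á â ã ä å æ ç è é ê ë ì í î ï
00F0  ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

Latin Extended-A
0100  Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď
0110  Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
0120  Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į
0130  İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
0140  ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ
0150  Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
0160  Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů
0170  Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ

Latin Extended-B
0190  ƒ (0192)
01F0  Ǻ ǻ Ǽ ǽ Ǿ ǿ (01FA–01FF)

Spacing Modifier Letters
02C0  ˆ ˇ (02C6–02C7), ˉ (02C9)
02D0  ˘ ˙ ˚ ˛ ˜ ˝ (02D8–02DD)

Greek
0380  ΄ ΅ Ά · Έ Ή Ί (0384–038A), Ό (038C), Ύ Ώ (038E–038F)
0390  ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο
03A0  Π Ρ (03A0–03A1), Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί (03A3–03AF)
03B0  ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο
03C0  π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ (03C0–03CE)

Cyrillic
0400  Ѐ Ё Ђ Ѓ Є Ѕ І Ї Ј Љ Њ Ћ Ќ Ѝ Ў Џ
0410  А Б В Г Д Е Ж З И Й К Л М Н О П
0420  Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
0430  а б в г д е ж з и й к л м н о п
0440  р с т у ф х ц ч ш щ ъ ы ь э ю я
0450  ѐ ё ђ ѓ є ѕ і ї ј љ њ ћ ќ ѝ ў џ
0490  Ґ ґ (0490–0491)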
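
Latin Extended Additional
1E80  Ẁ ẁ Ẃ ẃ Ẅ ẅ (1E80–1E85)
1EF0  Ỳ ỳ (1EF2–1EF3)

General Punctuation
2010  – — ― (2013–2015), ‗ ‘ ’ ‚ ‛ “ ” „ (2017–201E)
2020  † ‡ • (2020–2022), … (2026)
2030  ‰ (2030), ′ ″ (2032–2033), ‹ › (2039–203A), ‼ (203C), ‾ (203E)
2040  ⁄ (2044)

Super/Subscripts
2070  ⁿ (207F)

Currency Symbols
20A0  ₣ ₤ (20A3–20A4), ₧ (20A7), € (20AC)

Letterlike Symbols
2100  ℅ (2105)
2110  ℓ (2113), № (2116)
2120  ™ (2122), Ω (2126), ℮ (212E)

Number Forms
2150  ⅛ ⅜ ⅝ ⅞ (215B–215E)

Arrows
2190  ← ↑ → ↓ ↔ ↕ (2190–2195)
21A0  ↨ (21A8)

Mathematical Operators
2200  ∂ (2202), ∆ (2206), ∏ (220F)
2210  ∑ − (2211–2212), ∕ (2215), ∙ √ (2219–221A), ∞ ∟ (221E–221F)
2220  ∩ (2229), ∫ (222B)
2240  ≈ (2248)
2260  ≠ ≡ (2260–2261), ≤ ≥ (2264–2265)

Miscellaneous Technical
2300  ⌂ (2302)
2310  ⌐ (2310)
2320  ⌠ ⌡ (2320–2321)

Box-drawing characters
2500  ─ (2500), │ (2502), ┌ (250C)
2510  ┐ (2510), └ (2514), ┘ (2518), ├ (251C)
2520  ┤ (2524), ┬ (252C)
2530  ┴ (2534), ┼ (253C)
2550  ═ ║ ╒ ╓ ╔ ╕ ╖ ╗ ╘ ╙ ╚ ╛ ╜ ╝ ╞ ╟
2560  ╠ ╡ ╢ ╣ ╤ ╥ ╦ ╧ ╨ ╩ ╪ ╫ ╬ (2560–256C)

Block Elements
2580  ▀ (2580), ▄ (2584), █ (2588), ▌ (258C)
2590  ▐ ░ ▒ ▓ (2590–2593)

Geometric Shapes
25A0  ■ □ (25A0–25A1), ▪ ▫ ▬ (25AA–25AC)
25B0  ▲ (25B2), ► (25BA), ▼ (25BC)
25C0  ◄ (25C4), ◊ ○ (25CA–25CB), ● (25CF)
25D0  ◘ ◙ (25D8–25D9)
25E0  ◦ (25E6)

Miscellaneous Symbols
2630  ☺ ☻ ☼ (263A–263C)
2640  ♀ (2640), ♂ (2642)
2660  ♠ (2660), ♣ (2663), ♥ ♦ (2665–2666), ♪ ♫ (266A–266B)

Private Use Area
F000  private-use code points F001 and F002

Alphabetic Presentation Forms
FB00  ﬁ ﬂ (FB01–FB02)
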
Legend
  not used
  optional
  private use (F001 and F002 are supposed to be duplicates of the ligatures ﬁ, U+FB01, and ﬂ, U+FB02)
  added in OpenType Specification v1.5

References

  1. "OpenType Specification Change Log § Version 1.5". Microsoft Typography. Microsoft Learn. Retrieved 2024-04-13. wgl4d.htm: Added four Macedonian characters.
  2. "WGL4 character set U+017F to U+1EF3 (OpenType 1.4)". Microsoft Typography. Microsoft Learn. Retrieved 2024-04-13.
  3. "WGL4 character set U+017F to U+1EF3 (OpenType 1.5)". Microsoft Typography. Microsoft Learn. Retrieved 2024-04-13.