Complex text layout

The Devanagari ddhrya-ligature, as displayed in the JanaSanskritSans font, which should be invoked by the layout engine to render the sequence द + ् + ध + ् + र + ् + य = द्ध्र्य.
The word العربية al-arabiyyah, "the Arabic [language]" in Arabic, in successive stages of rendering. The first line shows the letters in left-to-right order and unjoined, as they might appear in an application without complex text layout. In the second line, bidirectional display has been applied, and in the third the glyph-shaping mechanism has rendered the letters according to context.

Complex text layout (CTL) or complex text rendering is the typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes. The term is used in the field of software internationalization, where each grapheme is a character.

Scripts which require CTL for proper display may be known as complex scripts. Examples include the Arabic alphabet and scripts of the Brahmic family, such as Devanagari, Khmer script or the Thai alphabet. Many scripts do not require CTL. For instance, the Latin alphabet or Chinese characters can be typeset by simply displaying each character one after another in straight rows or columns. However, even these scripts have alternate forms or optional features (such as cursive writing) which require CTL to produce on computers.
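To make concrete what a layout engine receives for a complex script, the following minimal sketch lists the code points behind the Devanagari ddhrya sequence from the caption above. It uses only Python's standard `unicodedata` module; the names `DDHRYA` and `describe` are illustrative and not part of any rendering library.

```python
import unicodedata

# The seven code points of the ddhrya sequence, written as escapes so that
# each one is explicit: da, virama, dha, virama, ra, virama, ya.
DDHRYA = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F"

def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

for code_point, name, category in describe(DDHRYA):
    print(code_point, category, name)
```

The stored text is a flat run of seven characters. Turning that run into a single conjunct glyph is the job of the layout engine and the font, not of the encoding.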

Characteristics requiring CTL

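The main characteristics of CTL complexity are bidirectional text, context-sensitive shaping (including ligatures), and ordering in which the displayed order of characters differs from their logical order.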
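Bidirectionality, one of the characteristics that call for CTL, is visible in the character data itself. The sketch below reads the Unicode bidirectional class of each character with the standard `unicodedata` module. The helper name `bidi_classes` is an assumption made for illustration, and the sketch only classifies characters; it does not implement the reordering step.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin letter, Arabic letter AIN, Hebrew letter ALEF, space, digit.
SAMPLE = "a\u0639\u05D0 1"

for ch, cls in zip(SAMPLE, bidi_classes(SAMPLE)):
    print(f"U+{ord(ch):04X}", cls)
```

Classes such as `L` (left-to-right), `R` and `AL` (right-to-left) are the input a bidirectional algorithm works from when it decides the display order of a mixed-direction line.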

Not all occurrences of these characteristics require CTL. For example, the Greek alphabet has context-sensitive shaping of the letter sigma, which appears as ς at the end of a word and σ elsewhere. However, these two forms are normally stored as different characters; for instance, Unicode has both U+03C2 ς GREEK SMALL LETTER FINAL SIGMA and U+03C3 σ GREEK SMALL LETTER SIGMA, and does not treat them as equivalent. For collation and comparison purposes, software should consider the string "δῖος Ἀχιλλεύς" equivalent to "δῖοσ Ἀχιλλεύσ",[1] but for typesetting purposes they are distinct and CTL is not required to choose the correct form.
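The sigma case can be checked with a short sketch using Python's standard library. It shows that the two sigmas are distinct characters that normalization leaves alone, and that a comparison key can still equate them. The helper `comparison_key` is hypothetical, and using `str.casefold` for it is an assumption of this sketch: case folding also ignores letter case, which is broader than merely equating the two sigmas.

```python
import unicodedata

FINAL_SIGMA = "\u03C2"  # ς GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03C3"        # σ GREEK SMALL LETTER SIGMA

def comparison_key(text):
    """Hypothetical helper: a key for comparing word content, not for display.

    NFC normalization makes precomposed and decomposed accents agree;
    casefold() then maps final sigma to sigma (and folds letter case).
    """
    return unicodedata.normalize("NFC", text).casefold()

# Distinct as stored characters, and normalization keeps them distinct.
print(FINAL_SIGMA == SIGMA)
print(unicodedata.normalize("NFKC", FINAL_SIGMA) == SIGMA)

# Equal once reduced to a comparison key.
print(comparison_key("δῖος Ἀχιλλεύς") == comparison_key("δῖοσ Ἀχιλλεύσ"))
```

The key is used only for matching and sorting. The stored text keeps whichever sigma the author typed, so no layout logic is needed to display it.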

Implementations

Most text-rendering software that is capable of CTL includes information about specific scripts, and so can render them correctly without font files needing to supply instructions on how to lay out characters. Such software is usually provided in a library; examples include Uniscribe on Microsoft Windows, Pango, and HarfBuzz.

However, such software is unable to properly render any script for which it lacks instructions, which can include many minority scripts. The alternative approach is to include the rendering instructions in the font file itself. Rendering software still needs to be capable of reading and following the instructions, but this is relatively simple.
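The first approach, with script knowledge built into the renderer, can be illustrated by a toy shaper. This is a minimal sketch under loud assumptions: the `JOINING` table is hand-written for only the seven letters of العربية, the function `shape` is hypothetical, and the output is the name of a Unicode Arabic presentation form rather than a glyph from a font. Real shaping libraries are far more complete and do not work this way internally.

```python
import unicodedata

# Hand-written joining data for a few Arabic letters (an assumption of this
# sketch). "D" = dual-joining, "R" = right-joining (joins only to the
# preceding letter).
JOINING = {
    "\u0627": "R",  # ALEF
    "\u0644": "D",  # LAM
    "\u0639": "D",  # AIN
    "\u0631": "R",  # REH
    "\u0628": "D",  # BEH
    "\u064A": "D",  # YEH
    "\u0629": "R",  # TEH MARBUTA
}

def shape(text):
    """Return one contextual-form name per character, in logical order."""
    forms = []
    for i, ch in enumerate(text):
        kind = JOINING.get(ch)
        if kind is None:
            # No built-in knowledge of this character: leave it unshaped.
            forms.append(unicodedata.name(ch, "UNKNOWN"))
            continue
        prev_kind = JOINING.get(text[i - 1]) if i > 0 else None
        next_kind = JOINING.get(text[i + 1]) if i + 1 < len(text) else None
        joins_prev = prev_kind == "D"  # only dual-joining letters join forward
        joins_next = kind == "D" and next_kind is not None
        if joins_prev and joins_next:
            position = "MEDIAL"
        elif joins_prev:
            position = "FINAL"
        elif joins_next:
            position = "INITIAL"
        else:
            position = "ISOLATED"
        forms.append(f"{unicodedata.name(ch)} {position} FORM")
    return forms

# The letters of al-arabiyyah in logical (typing) order.
WORD = "\u0627\u0644\u0639\u0631\u0628\u064A\u0629"

for form_name in shape(WORD):
    print(form_name)
```

Characters missing from the table pass through unshaped, which mirrors the limitation described above: a renderer of this kind handles only the scripts it was given data for.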

Examples of this latter approach include Apple Advanced Typography (AAT) and Graphite. Each name covers both the instruction format and the software that supports it; AAT is included on Apple operating systems, while Graphite is available for Microsoft Windows and Linux-based systems.

The OpenType format is primarily intended for systems using the first approach (layout knowledge in the renderer, not the font), but it has a few features that assist with CTL, such as contextual ligatures. AAT and Graphite instructions can be embedded in OpenType font files.

Related Research Articles

Glyph: Element of writing

A glyph is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character". It is a particular graphical representation, in a particular typeface, of an element of written language. A grapheme, or part of a grapheme, or sometimes several graphemes in combination can be represented by a glyph.

Unicode: Character encoding standard

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. As of version 15.0, the standard, which is maintained by the Unicode Consortium, defines 149,186 characters covering 161 modern and historic scripts, as well as symbols, thousands of emoji, and non-visual control and formatting codes.

Sinhala script: Abugida writing system

The Sinhala script, also known as Sinhalese script, is a writing system used by the Sinhalese people and most Sri Lankans in Sri Lanka and elsewhere to write the Sinhala language as well as the liturgical languages Pali and Sanskrit. The Sinhalese Akṣara Mālāva, one of the Brahmic scripts, is a descendant of the Ancient Indian Brahmi script. It is also related to the Grantha script.

OpenType is a format for scalable computer fonts. Derived from TrueType, it retains TrueType's basic structure but adds many intricate data structures for describing typographic behavior. OpenType is a registered trademark of Microsoft Corporation.

Pango: Library for text rendering

Pango is a text layout engine library which works with the HarfBuzz shaping engine for displaying multi-language text.

Scribus: Desktop publishing application

Scribus is free and open-source desktop publishing (DTP) software available for most desktop operating systems. It is designed for layout, typesetting, and preparation of files for professional-quality image-setting equipment. Scribus can also create animated and interactive PDF presentations and forms. Example uses include writing newspapers, brochures, newsletters, posters, and books.

Ligature (writing): Glyph combining two or more letterforms in a single typeset or handwritten character

In writing and typography, a ligature occurs where two or more graphemes or letters are joined to form a single glyph. Examples are the characters æ and œ used in English and French, in which the letters 'a' and 'e' are joined for the first ligature and the letters 'o' and 'e' are joined for the second ligature. For stylistic and legibility reasons, 'f' and 'i' are often merged to create 'fi'; the same is true of 's' and 't' to create 'st'. The common ampersand (&) developed from a ligature in which the handwritten Latin letters 'E' and 't' were combined.

The Greek alphabet has been used to write the Greek language since the late 9th or early 8th century BC. It is derived from the earlier Phoenician alphabet, and was the earliest known alphabetic script to have distinct letters for vowels as well as consonants. In Archaic and early Classical times, the Greek alphabet existed in many local variants, but, by the end of the 4th century BC, the Euclidean alphabet, with 24 letters, ordered from alpha to omega, had become standard and it is this version that is still used for Greek writing today.

Uniscribe is the Microsoft Windows set of services for rendering Unicode-encoded text, supporting complex text layout. It is implemented in the dynamic link library USP10.DLL. Uniscribe was released with Windows 2000 and Internet Explorer 5.0. In addition, the Windows CE platform has supported Uniscribe since version 5.0.

XeTeX: TeX typesetting engine

XeTeX is a TeX typesetting engine using Unicode and supporting modern font technologies such as OpenType, Graphite and Apple Advanced Typography (AAT). It was originally written by Jonathan Kew and is distributed under the X11 free software license.

Apple Advanced Typography (AAT) is Apple Inc.'s computer technology for advanced font rendering, supporting internationalization and complex features for typographers, a successor to Apple's little-used QuickDraw GX font technology of the mid-1990s. It is a set of extensions to the TrueType outline font standard, with smartfont features similar to the OpenType font format that was developed by Adobe and Microsoft, and to Graphite. It also incorporates concepts from Adobe's "multiple master" font format, allowing for axes of traits to be defined and morphing of a glyph independently along each of these axes. AAT font features do not alter the underlying typed text; they only affect the characters' representation during glyph conversion.

Graphite is a programmable Unicode-compliant smart font technology and rendering system developed by SIL International as free software, distributed under the terms of the GNU Lesser General Public License and the Common Public License.

Unicode has a certain amount of duplication of characters. These are pairs of single Unicode code points that are canonically equivalent. The reason for this is compatibility with legacy systems.

Zero-width joiner: Non-printing character used in computerized typesetting

The zero-width joiner (ZWJ, U+200D) is a non-printing character used in the computerized typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes, such as the Arabic script or any Indic script. Sometimes the Roman script is to be counted as complex, e.g. when using a Fraktur typeface. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.

A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet. Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters. This article lists some widely used Unicode fonts that support a comparatively large number and broad range of Unicode characters.

Universal Character Set characters: Complete list of the characters available on most computers

The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.

Panorama (typesetting software)

Panorama is a line layout and text composition engine made by Bitstream Inc. to render text in various worldwide languages. Panorama uses Font Fusion as the base to support rendering of the text. The engine allows the user to manage different text formatting aspects such as spacing, alignment, and style effects.

HarfBuzz is a software development library for text shaping, which is the process of converting Unicode text to glyph indices and positions. The newer version, New HarfBuzz (2012–), targets various font technologies while the first version, Old HarfBuzz (2006–2012), targeted only OpenType fonts.

Mon–Burmese script: South-East Asian writing system

The Mon–Burmese script (မွန်မြန်မာအက္ခရာ) is an abugida that derives from the Pallava Grantha script of southern India and later of Southeast Asia. It is the basis of the alphabets used for modern Burmese, Mon, Shan, Rakhine, Jingpho and Karen.

Tamil All Character Encoding (TACE16) is a 16-bit Unicode-based character encoding scheme for the Tamil language.

References

  1. "FAQ - Greek Language & Script". Unicode Consortium. 2012-12-03. Retrieved 2013-09-13. It is easier to simply equate the two sigma codes for operations which are concerned with word content, for example.