Transformation of text

Transformations of text are strategies to perform geometric transformations on text (reversal, rotations, etc.), particularly in systems that do not natively support transformation, such as HTML, seven-segment displays and plain text.

Implementation

Many systems, such as HTML, seven-segment displays and plain text, do not support transformation of text. In the case of HTML, this limitation has since been largely addressed through standard cascading style sheets (CSS): the CSS3 specifications include rotation for block elements. [1] Before such support became widespread, several ways of producing the visual effects of text transformations came into use.

The most common of these transformations are rotation and reflection.

Unicode includes a variety of characters that resemble transformed letters, mostly intended for various forms of phonetic transcription. The name of each such character usually indicates the transformation it represents, for example LATIN SMALL LETTER TURNED A (ɐ) or LATIN SMALL LETTER REVERSED E (ɘ).
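Because the names follow this pattern, candidate characters can be found programmatically. The following Python sketch uses the standard unicodedata module to list code points in a range whose names contain TURNED, REVERSED, or INVERTED. The keyword list, the helper name transformed_chars, and the choice of the IPA Extensions range (U+0250 to U+02AF) are assumptions for illustration, and the results depend on the Unicode version bundled with Python.

```python
import unicodedata

# Words that Unicode character names use for a transformed letterform (assumed list).
TRANSFORM_WORDS = ("TURNED", "REVERSED", "INVERTED")

def transformed_chars(start, end):
    """Yield (character, name) for code points start..end whose name mentions a transformation."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unassigned code points
        if any(word in name.split() for word in TRANSFORM_WORDS):
            yield ch, name

# Scan the IPA Extensions block, U+0250 to U+02AF.
for ch, name in transformed_chars(0x0250, 0x02AF):
    print(f"U+{ord(ch):04X} {ch} {name}")
```

Name matching is only a heuristic: ɔ, used in flip text for c, is named LATIN SMALL LETTER OPEN O and is not found this way.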

Upside-down text

Strategies can be used to render words upside down in languages such as HTML that do not permit rotation of text. Using Unicode characters (especially those in the IPA), a very close approximation of upside-down text (also called flip text) can be achieved. The letters s, x, z, and o are rotationally symmetric, while pairs such as b/q, d/p, and n/u are rotations of each other. The remaining letters have turned or look-alike forms in Unicode, largely among its phonetic (IPA) characters, giving a complete set of upside-down lowercase letters. With the addition of the Fraser (Lisu) alphabet to the Unicode standard in version 5.2, full or near-full support for upside-down capital letters is also available.

Digit support is incomplete. Four digits are universally strobogrammatic (0, 8, and the pair 6/9), and turned forms of 2 and 3 were encoded in Unicode 8.0 (U+218A and U+218B) for use in dozenal notation; the other digits have no dedicated upside-down characters. Punctuation is mostly covered, using characters such as the interpunct and the inverted question and exclamation marks. Several Internet utilities convert regular text to (and sometimes from) upside-down text; each uses a slightly different algorithm for letters that Unicode does not cover precisely or well.
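The digit rules above can be checked mechanically. A minimal Python sketch, assuming only the four digits named in the text (0, 8, and the 6/9 pair) and ignoring font-specific or dozenal forms: turning a string 180 degrees reverses its order and turns each digit, and a number is strobogrammatic when the result equals the original. The names rotate_digits and is_strobogrammatic are illustrative.

```python
# Only the digits the text lists as universally strobogrammatic:
# 0 and 8 read as themselves, 6 and 9 read as each other.
ROTATED_DIGIT = {"0": "0", "8": "8", "6": "9", "9": "6"}

def rotate_digits(s):
    """Return s as it reads after a 180-degree turn, or None if a digit has no turned form."""
    turned = []
    for ch in reversed(s):  # a half turn also reverses the reading order
        if ch not in ROTATED_DIGIT:
            return None
        turned.append(ROTATED_DIGIT[ch])
    return "".join(turned)

def is_strobogrammatic(s):
    """True if s reads the same after a 180-degree turn."""
    return rotate_digits(s) == s

print(is_strobogrammatic("69"), is_strobogrammatic("609"), is_strobogrammatic("123"))  # True True False
```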

A similar process is USD encoding, which uses only characters from the ASCII character set. Because it is almost entirely alphanumeric, it is far more compatible with programs that do not support Unicode, and it is easier to type by hand. However, text produced with USD encoding is far less legible and more closely resembles Leet. Another problem is that, because not every letter has a good ASCII counterpart, a USD algorithm cannot both be a complete involution (i.e., convert back and forth losslessly) and cover the whole alphabet. For instance, the Albartus USD algorithm example in the "Examples" section below leaves k, T, t, and R in their upright positions. Italic type causes a further issue: in most italic typefaces the letter "a" is drawn in its single-story form, resembling a Latin alpha, so it no longer reads as an upside-down lowercase "e". Oblique type does not have this problem.
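The involution trade-off can be tested directly. In this Python sketch, is_involution and collisions are hypothetical helper names, and USD_SAMPLE is a made-up partial ASCII table for illustration only (it is not the Albartus table). It maps both g and 9 to 6, so a decoder cannot tell which of the two a 6 came from.

```python
def is_involution(table):
    """True if applying the table twice returns every mapped character to itself."""
    return all(table.get(out) == ch for ch, out in table.items())

def collisions(table):
    """Return {output: [inputs]} for outputs produced by more than one input."""
    seen = {}
    for ch, out in table.items():
        seen.setdefault(out, []).append(ch)
    return {out: chs for out, chs in seen.items() if len(chs) > 1}

# Hypothetical USD-style table: upside-down g is approximated by 6,
# but 9 also turns into 6, so the mapping cannot be reversed unambiguously.
USD_SAMPLE = {
    "a": "e", "e": "a",
    "n": "u", "u": "n",
    "h": "y", "y": "h",
    "g": "6", "9": "6", "6": "9",
    "o": "o", "s": "s", "x": "x",
}

print(is_involution(USD_SAMPLE), collisions(USD_SAMPLE))  # False {'6': ['g', '9']}
```

Dropping g from the table makes it an involution again, but g is then left without an upside-down form, which is the trade-off described above.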

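Below is a conversion table that can be used to transform lowercase, uppercase, numeric, and punctuation characters. Each entry gives the upright character, its upside-down substitute, and the substitute's code point. These characters require Unicode 8.0 or later (in particular ↊ and ↋, the turned digits two and three at U+218A and U+218B).

Lowercase letters: a → ɐ (U+0250), b → q (U+0071), c → ɔ (U+0254), d → p (U+0070), e → ǝ (U+01DD), f → ɟ (U+025F), g → ɓ (U+0253), h → ɥ (U+0265), i → ᴉ (U+1D09), j → ſ̣ (U+017F U+0323), k → ʞ (U+029E), l → ꞁ (U+A781), m → ɯ (U+026F), n → u (U+0075), o → o (U+006F), p → d (U+0064), q → b (U+0062), r → ɹ (U+0279), s → s (U+0073), t → ʇ (U+0287), u → n (U+006E), v → ʌ (U+028C), w → ʍ (U+028D), x → x (U+0078), y → ʎ (U+028E), z → z (U+007A)

Capital letters: A → Ɐ (U+2C6F), B → ꓭ (U+A4ED), C → Ɔ (U+0186), D → ꓷ (U+A4F7), E → Ǝ (U+018E), F → Ⅎ (U+2132), G → ⅁ (U+2141), H → H (U+0048), I → I (U+0049), J → ꓩ (U+A4E9), K → Ʞ (U+A7B0), L → Ꞁ (U+A780), M → ꟽ (U+A7FD), N → N (U+004E), O → O (U+004F), P → Ԁ (U+0500), Q → Ꝺ (U+A779), R → ꓤ (U+A4E4), S → S (U+0053), T → Ʇ (U+A7B1), U → Ո (U+0548), V → Ʌ (U+0245), W → 𐤵 (U+10935), X → X (U+0058), Y → ⅄ (U+2144), Z → Z (U+005A)

Numbers: 1 → ⇂ (U+21C2), 2 → ↊ (U+218A), 3 → ↋ (U+218B), 4 → ߈ (U+07C8), 5 → U+100C, 6 → 9 (U+0039), 7 → 𝘓 (U+1D613), 8 → 8 (U+0038), 9 → 6 (U+0036), 0 → 0 (U+0030)

Punctuation: & → ⅋ (U+214B), _ → ‾ (U+203E), ? → ¿ (U+00BF), ! → ¡ (U+00A1), " → „ (U+201E), ' → , (U+002C), . → ˙ (U+02D9), , → ' (U+0027), ; → ؛ (U+061B)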
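A minimal flip-text converter can be built from the lowercase row of the conversion table: reverse the string, then replace each letter with its turned form. This Python sketch assumes lowercase-only output (the input is lowercased first) and passes digits, punctuation, and spaces through unchanged; flip_text, FLIP, and TURNED_LOWER are illustrative names, not those of any existing utility.

```python
# Upside-down forms of a..z, taken from the code points in the lowercase row of the table.
TURNED_LOWER = [
    "\u0250", "q", "\u0254", "p", "\u01DD", "\u025F", "\u0253", "\u0265",
    "\u1D09", "\u017F\u0323", "\u029E", "\uA781", "\u026F", "u", "o", "d",
    "b", "\u0279", "s", "\u0287", "n", "\u028C", "\u028D", "x", "\u028E", "z",
]
FLIP = dict(zip("abcdefghijklmnopqrstuvwxyz", TURNED_LOWER))

def flip_text(text):
    """Approximate a 180-degree rotation of a line of text."""
    # A half turn reverses the reading order and turns each glyph;
    # characters without a table entry are kept as they are.
    return "".join(FLIP.get(ch, ch) for ch in reversed(text.lower()))

print(flip_text("hello world"))  # pꞁɹoʍ oꞁꞁǝɥ
```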

Sideways text

Sideways text presents a unique problem. Unlike the 180-degree case, far too few sideways characters exist for most purposes. Because text is rendered horizontally, it would also be very difficult to render more than one line of vertical text in a well-aligned manner without columns, especially in proportional fonts; furthermore, each character would need a line break after it. Simulating sideways text with substitute characters is further complicated because most fonts space lines further apart vertically (to accommodate underlining and overlining) than they space letters horizontally, and because most glyphs are taller than they are wide, which makes simulated sideways text look noticeably awkward.
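The alignment problem shows up even in a simple plain-text sketch that stacks characters vertically. This Python function (to_vertical is a hypothetical name) turns each input line into a column, with one character per output row and columns placed left to right. It only lines up in a monospaced font, and it does not rotate any glyphs.

```python
from itertools import zip_longest

def to_vertical(lines):
    """Write each line top to bottom as a column; columns run left to right."""
    # Pad short lines with spaces so every column has the same height.
    rows = zip_longest(*lines, fillvalue=" ")
    # Trailing padding is trimmed; alignment still assumes equal-width glyphs.
    return "\n".join("".join(row).rstrip() for row in rows)

print(to_vertical(["abc", "de"]))
```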

Until CSS3 introduced rotation for block elements, [2] there was no direct way to rotate text in any direction other than with the manual 180-degree method described above. Internet Explorer offered a proprietary CSS property that rotated text 90 degrees clockwise; it has since been revised and incorporated into standard CSS as the writing-mode property, for example <div style="writing-mode:vertical-rl;">. Implementations of writing-mode remain somewhat inconsistent, and rotation can also cause problems with an element's width, height, and word wrapping.

The most common workaround was to use images of text, which can be rotated and transformed freely in an image editor, with the text repeated in the alt attribute so that search engines and text-only browsers can read it. Drawing sideways text by hand with ANSI art and box-drawing characters has the advantage that the result can be copied and pasted (which images usually cannot be in plain-text contexts), but it produces large characters and is generally not readable by search engines. With the adoption of CSS3 by all major browsers, these methods are now mostly obsolete for Web media.

Reversed text

Though less widespread, text can also be reversed into a mirror image of itself. The letters A, H, I, M, O/o, T, U, V/v, W/w, X/x, Y, and in some fonts i and l are symmetric about the vertical (y) axis, and the pairs b/d and p/q mirror each other. The Cyrillic letters И, Я, and г are among the numerous characters from other sources that can be used to extend this effect. Unlike upside-down text, which is restricted to lowercase, reversed text can mix capital and lowercase letters: mirrored letters keep their upright alignment, whereas upside-down capital and lowercase letters generally do not line up as they would upright.
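A mirrored rendering needs two steps: reversing the order of the characters and replacing each letter with a reflected look-alike. The Python sketch below uses a small illustrative table built from characters mentioned in this article (b/d, p/q, И, Я, and ɒ, ɘ, ƨ from the reversed-text example); it is far from complete, mirror_text and MIRROR are hypothetical names, and unmapped characters (including symmetric letters such as A, H, or x) are kept as they are.

```python
# Illustrative, incomplete mirror table assembled from characters used in this article.
MIRROR = {
    "b": "d", "d": "b", "p": "q", "q": "p",
    "a": "\u0252",  # ɒ
    "e": "\u0258",  # ɘ
    "s": "\u01A8",  # ƨ
    "N": "\u0418",  # И
    "R": "\u042F",  # Я
}

def mirror_text(text):
    """Reverse the character order and reflect each letter that has a table entry."""
    return "".join(MIRROR.get(ch, ch) for ch in reversed(text))

print(mirror_text("bad"))  # bɒd
```

Returning only text[::-1] without the table lookup gives the order-only reversal that many "reverse text" sites perform.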

Symmetry about the horizontal (x) axis is visible in the letters B, C, D, E, H, I, K, O, and X, and in some fonts a and l, as well as in the pairs a/g, b/p, d/q, e/G, and f/t. Expanding to Cyrillic and Greek produces more symmetries, such as Λ/V and Γ/L.
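Reflecting across the horizontal axis differs from mirroring in that the reading order stays the same; only the letters change. A Python sketch using the self-symmetric capitals and the pairs listed above (reflect_x is a hypothetical name; letters described as symmetric only in some fonts are not added as self-symmetric):

```python
# Letters and pairs listed in this article as symmetric across the horizontal axis.
X_SELF = "BCDEHIKOX"
X_PAIRS = [("a", "g"), ("b", "p"), ("d", "q"), ("e", "G"), ("f", "t")]

REFLECT_X = {c: c for c in X_SELF}
for left, right in X_PAIRS:
    REFLECT_X[left] = right
    REFLECT_X[right] = left

def reflect_x(text):
    """Flip letters top to bottom; the left-to-right order is not reversed."""
    return "".join(REFLECT_X.get(ch, ch) for ch in text)

print(reflect_x("bad"))  # pgq
```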

The Fixedsys Excelsior typeface includes a complete set of reversed characters like this in its Private Use Area. However, online utilities to create mirrored text are not readily available, and most sites that claim to "mirror text" or "reverse text" in fact only change the order of the letters and do not actually flip the letters themselves.

Dilated text

Using Unicode's small capitals, small-form punctuation, and subscript and superscript phonetic modifiers, text can be created that appears smaller than the surrounding inline text. This is generally needed only in applications that support plain text in a single size, since HTML and CSS support different text sizes.
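For digits, Unicode has complete superscript and subscript sets, so the substitution is a simple character mapping. This Python sketch uses str.maketrans and str.translate; to_superscript and to_subscript are illustrative names, and letters, for which superscript and small-capital coverage is only partial, are left unchanged.

```python
# Superscript 1, 2, 3 are U+00B9, U+00B2, U+00B3; the others are U+2070 and U+2074..U+2079.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)
# Subscript digits are contiguous: U+2080..U+2089.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "".join(chr(0x2080 + i) for i in range(10)))

def to_superscript(text):
    return text.translate(SUPERSCRIPT_DIGITS)

def to_subscript(text):
    return text.translate(SUBSCRIPT_DIGITS)

print(to_superscript("E=mc2"), to_subscript("H2O"))  # E=mc² H₂O
```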

Examples

[Example table with sideways text using Unicode characters]
Question: How can you tell an introvert from an extrovert?
Answer: ˙sǝoɥs s,ʎnᵷ ɹǝɥʇo ǝɥʇ ʇɐ sʞooꞁ ʇɹǝʌoɹʇxǝ ǝɥʇ 'sɹoʇɐʌǝlǝ ǝɥʇ uı (using the Revfad algorithm)
Or: 'saoys s.hn6 R3HTO ayt te skool tJa^oJtxa ayt 'sJote^ala ayt uI (using the Albartus USD algorithm)

Example of reversed text reflected along a y-axis:

Example:...иiɒəɒ иɘqo x иoiƨиɘмib oɟ lɒɟɿoq ɘнɟ ɟʇɘl γbodɘмoƧ (Somebody left the portal to Dimension X open again...)

Poet Darius Bacon has written two examples of palindromic poetry that read the same upside down as right-side up. [3]

Russian

Question: How do flamingos get their color?
Answer: ¿ɯǝʚǹ и̯oʚɔ ɯoıɐhʎvou oɹниꟺɐvф 𝼐ɐ𝼐

References

  1. Bos, Bert, ed. (August 9, 2007). "CSS basic box model". W3C. Retrieved 2012-11-19.
  2. "CSS3 transforms". Can I use...
  3. Bacon, Darius. a poem and deus am. The Palindromist #4.