Sentence spacing in digital media


Sentence spacing in digital media concerns the horizontal width of the space between sentences in computer- and web-based media. Digital media allow sentence spacing variations that were not possible with the typewriter. Most digital fonts permit the use of a variable space or a no-break space. [1] Some modern font specifications, such as OpenType, have the ability to automatically add or reduce space after punctuation,[citation needed] and users may be able to choose sentence spacing variations.


Modern fonts allow spacing variations that the average user can easily manipulate, such as non-breaking short spaces (thin spaces), non-breaking normal spaces (thick spaces), breaking normal spaces (thick spaces), and breaking long spaces (em spaces).

Word processors and text input programs

The typesetting software TeX treats any horizontal run of whitespace as a single space, but uses a heuristic to recognize sentence endings and typesets the space after them slightly wider than a normal word space. This is TeX's default behavior; the \frenchspacing macro disables it, so that the same amount of space is used between sentences as between words. [2]
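TeX's heuristic is driven by a per-character "space factor". The Python sketch below models plain TeX's default codes in simplified form: . ? ! have a space factor code of 3000, uppercase letters have 999 (so a period after a capital, as in "U.S.", does not count as a sentence end), closing ) ' ] leave the factor unchanged, and a gap gets extra width when the factor is at least 2000. The function names are hypothetical, each word is assumed to start at the neutral factor 1000, and real TeX's scaling of stretch and shrink by the factor is omitted.

```python
# Plain TeX's default \sfcode values that differ from 1000 (simplified model).
PLAIN_SFCODES = {".": 3000, "?": 3000, "!": 3000, ":": 2000, ";": 1500,
                 ",": 1250, ")": 0, "'": 0, "]": 0}
# \frenchspacing resets these punctuation codes to 1000.
FRENCH_RESET = set(".?!:;,")


def sfcode(ch: str, french: bool = False) -> int:
    """Space factor code of one character under plain TeX's defaults."""
    if french and ch in FRENCH_RESET:
        return 1000
    if "A" <= ch <= "Z":
        return 999  # uppercase letter: keeps "U.S." from ending a sentence
    return PLAIN_SFCODES.get(ch, 1000)


def classify_gaps(text: str, french: bool = False) -> list[str]:
    """Label each inter-word gap 'normal' or 'wide' (extra sentence space)."""
    words = text.split()  # any run of whitespace becomes a single gap
    labels = []
    for word in words[:-1]:
        f = 1000  # simplification: each word starts at the neutral factor
        for ch in word:
            code = sfcode(ch, french)
            if code == 0:
                continue  # code 0 leaves the space factor unchanged
            # TeX's rule: a factor below 1000 cannot jump straight above 1000.
            f = 1000 if (f < 1000 and code > 1000) else code
        # TeX adds extra interword space when the space factor is >= 2000.
        labels.append("wide" if f >= 2000 else "normal")
    return labels


print(classify_gaps("It works.   I agree."))            # ['normal', 'wide', 'normal']
print(classify_gaps("The U.S. Army won."))               # ['normal', 'normal', 'normal']
print(classify_gaps("It works. I agree.", french=True))  # ['normal', 'normal', 'normal']
```

The uppercase rule explains why "Dr. Smith" still gets a wide gap in TeX (the period follows a lowercase "r"), which is why authors write a control space after such abbreviations.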

Word processors allow the user to type as many spaces as desired. Although the default setting of many applications' grammar checkers (e.g., Microsoft Word) is single sentence spacing, they can be adjusted to accept double sentence spacing as correct as well. A program called PerfectIt is an "MS Word add-in that helps professionals to proofread faster". Its producer states that a feature was added to the most recent version of the program (as of August 2009) "to convert two spaces at the end of a sentence into one", but that they have "never had any requests to convert one space into two". [3]
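A conversion of the kind described above can be sketched with a regular expression. This is not PerfectIt's implementation; the function names are hypothetical, and the pattern assumes sentences end in . ! or ? optionally followed by straight ASCII quotes or closing brackets.

```python
import re

# A sentence-ending mark, optionally followed by closing quotes/brackets,
# then two or more spaces before the next non-space character.
DOUBLE_AFTER_SENTENCE = re.compile(r"([.!?][\"')\]]*) {2,}(?=\S)")


def single_space_sentences(text: str) -> str:
    """Collapse runs of spaces after terminal punctuation to one space."""
    return DOUBLE_AFTER_SENTENCE.sub(r"\1 ", text)


def count_double_spaced(text: str) -> int:
    """Count sentence ends followed by two or more spaces (a checker's view)."""
    return len(DOUBLE_AFTER_SENTENCE.findall(text))


sample = 'It rained.  "Stop!"  She left.   The end.'
print(count_double_spaced(sample))     # 3
print(single_space_sentences(sample))  # It rained. "Stop!" She left. The end.
```

Runs of spaces that do not follow terminal punctuation are left untouched, so the rule only affects sentence spacing.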

Some text editors, such as Emacs and vi, originally relied on double spacing to recognize sentence boundaries. By default, Emacs will not break a line at a single space preceded by a period, but this behavior is configurable (with the option sentence-end-double-space). There are also functions to move the cursor forward or backward to the next double space in the text. In Vim, the joinspaces setting indicates whether extra spaces are inserted when joining lines together, and the J flag in cpoptions indicates whether a sentence must be followed by two spaces. The GNU Coding Standards recommend putting two spaces after the end of each sentence in code comments. [4] The optional Emacs mode LaTeX provides a toggling option, French-LaTeX-mode, which, if set to French, creates single sentence spacing after terminal punctuation.
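The two editor behaviors above can be approximated in a few lines. This is a rough model, not Emacs's or Vim's code: sentence_ends and join_lines are hypothetical names, the double_space flag stands in for sentence-end-double-space, and Emacs's real sentence-end pattern and Vim's J command handle more cases (for example, Vim inserts no space before a leading ")").

```python
import re

CLOSERS = r"[\"')\]]*"  # closing quotes/brackets allowed after the mark


def sentence_ends(text: str, double_space: bool = True) -> list[int]:
    """Offsets just past each sentence end (rough model of Emacs's rule).

    With double_space=True a mark only ends a sentence when followed by two
    spaces, a tab, a newline, or the end of the text; otherwise any
    whitespace will do.
    """
    follow = r"(?=  |\t|\n|$)" if double_space else r"(?=\s|$)"
    return [m.end() for m in re.finditer(r"[.?!]" + CLOSERS + follow, text)]


def join_lines(first: str, second: str, joinspaces: bool = False) -> str:
    """Roughly what Vim's J does to two lines (simplified model)."""
    first, second = first.rstrip(), second.lstrip()
    two = joinspaces and first.endswith((".", "?", "!"))
    return first + ("  " if two else " ") + second


text = "See Dr. Smith.  He left."
print(sentence_ends(text))                      # [14, 24]
print(sentence_ends(text, double_space=False))  # [7, 14, 24]
print(join_lines("It works.", "  Next line", joinspaces=True))  # It works.  Next line
```

With double spacing required, the single space after "Dr." is not mistaken for a sentence boundary, which is the ambiguity the convention resolves.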

Web browsers

Web browsers follow the HTML display specification, which for programmers' convenience collapses runs of whitespace into a single space when they are displayed. [5] This convention originally comes from the underlying SGML standard, which collapses multiple spaces because of its clear division between content and layout information. [6] To force a web browser to display multiple spaces, a special character sequence must be used (such as &ensp;&thinsp; for an en space followed by a thin space, &emsp; for an em space, or &nbsp;&nbsp; for two successive full spaces). [7] However, using non-breaking spaces can lead to unevenly justified text and to additional unwanted spaces or line breaks in certain programs. [8] Alternatively, sentence spacing can be controlled in HTML by placing every sentence in a separate element (e.g., a span) and using CSS to finely control sentence spacing. [9] This is seldom done in practice.
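The collapsing rule and the entity workaround can be sketched in Python. This is an approximation, not browser code: the names are hypothetical, only the ASCII whitespace characters from HTML 4.01 are handled (the spec also lists U+200B ZERO WIDTH SPACE), and preserve_spaces assumes its input is already HTML-escaped.

```python
import re

# ASCII characters HTML 4.01 treats as white space (U+200B omitted here).
HTML_WS = re.compile(r"[ \t\n\r\f]+")


def collapse_like_browser(text: str) -> str:
    """Approximate normal rendering: each whitespace run becomes one space."""
    return HTML_WS.sub(" ", text)


def preserve_spaces(text: str) -> str:
    """Turn all but the last space of each run into &nbsp;, keeping one
    ordinary, breakable space so lines can still wrap there."""
    return re.sub(r" {2,}", lambda m: "&nbsp;" * (len(m.group()) - 1) + " ", text)


print(collapse_like_browser("One.  Two.\n\tThree."))  # One. Two. Three.
print(preserve_spaces("One.  Two."))                  # One.&nbsp; Two.
```

Leaving the final space breakable is one way to limit the line-breaking problems that runs made entirely of non-breaking spaces can cause.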

To allow multiple spaces to be rendered without collapsing in a web browser, the HTML <pre> element or the CSS white-space property can be used.
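A minimal sketch of generating such markup from Python, assuming the text should keep every space exactly as typed. render_preserved is a hypothetical helper; the pre-wrap value of white-space preserves runs of spaces like <pre> does but still lets long lines wrap, while <pre> is usually shown in a monospaced font.

```python
import html


def render_preserved(text: str, use_pre: bool = False) -> str:
    """Wrap escaped text in markup whose spaces will not be collapsed."""
    body = html.escape(text)
    if use_pre:
        return f"<pre>{body}</pre>"
    return f'<p style="white-space: pre-wrap">{body}</p>'


print(render_preserved("One.  Two. <b>"))
# <p style="white-space: pre-wrap">One.  Two. &lt;b&gt;</p>
```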

Character encodings

ASCII and similar early character encodings provide only a single space, which is breaking and fixed-width (the particular width is specified by the respective font). EBCDIC, although earlier than ASCII, provided a breaking fixed-width space (SP), a non-breaking fixed-width space (RSP: "Required SPace"), and an alternate-width non-breaking fixed-width space intended for use in numeric lists with fixed-width (but not necessarily em-width) digits (NSP: "Numeric SPace").

HTML and Unicode can both record runs of consecutive spaces, including spaces of multiple widths and both breaking and non-breaking spaces. HTML provides four variations on space width and one fixed-width non-breaking space: <space>, &emsp;, &ensp;, and &thinsp; (all breaking); and &nbsp; (non-breaking). In a typewriter font, <space> will equal &emsp;, but it will vary according to the font designer's specification in all other fonts, whether proportional or monospace. The HTML standard also specifies display behavior, not just character encoding, so web browsers following the HTML standard will collapse multiple <space>s to a single <space>. Non-browser applications that use HTML encoding, such as later versions of Microsoft Word, will not necessarily behave this way at display time.

Unicode provides 15 variations on space width and breakability, including THIN SPACE (&#8201;) and NARROW NO-BREAK SPACE (&#8239;). [10] The following examples demonstrate the effect of these variations in a web browser, using space before punctuation to illustrate the same spacing variations that are possible after terminal punctuation. These spacing variations, combined with a standard word space, enable users to create custom sentence spacing as an alternative to a single or double standard word space.
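The correspondence between the HTML entities and Unicode characters can be checked with Python's standard library, and a custom sentence gap can be built from a Unicode space plus a standard word space. The helper names are hypothetical, and choosing NARROW NO-BREAK SPACE as the extra space is only an illustrative assumption: because it does not allow a break, a renderer following the Unicode line breaking algorithm (UAX #14) should break at the ordinary space that follows it, keeping the extra width at the end of the line with the punctuation.

```python
import html.entities
import unicodedata


def space_entities() -> list[tuple[str, int, str]]:
    """(entity name, code point, Unicode name) for the HTML space entities."""
    rows = []
    for name in ("nbsp", "ensp", "emsp", "thinsp"):
        cp = html.entities.name2codepoint[name]
        rows.append((name, cp, unicodedata.name(chr(cp))))
    return rows


def join_sentences(sentences: list[str], extra: str = "\u202f") -> str:
    """Join sentences with `extra` glued to the punctuation, then one
    ordinary (breakable) word space."""
    return (extra + " ").join(sentences)


for name, cp, uname in space_entities():
    print(f"&{name};\tU+{cp:04X}\t{uname}")
print(ascii(join_sentences(["One.", "Two."])))  # 'One.\u202f Two.'
```

Swapping `extra` for "\u2009" (THIN SPACE) or "\u2002" (EN SPACE) gives other custom widths, at the cost of allowing a break at the extra space as well.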


References

Citations

  1. Microsoft 2010.
  2. Eijkhout 2008. pp. 185–188. The default TeX spacing can be explicitly enabled with \nonfrenchspacing.
  3. Lloyd and Hallahan 2009. See "iEditor" entry: 11 August 2009.
  4. Free Software Foundation 2010. Main work: GNU Coding Standards
  5. W3C 1999. 9.1 White Space; Lupton 2004. p. 165.
  6. How many spaces at the end of a sentence? One or two?
  7. W3C 1999. 24.4 Character Entity References for Markup-Significant and Internationalization Characters.
  8. University of Chicago Press Chicago Manual of Style Online 2007.
  9. How many spaces at the end of a sentence? One or two?
  10. Korpela 2005; Unicode 2009; Sheerin 2001.

Sources

  • Eijkhout, Victor (2008). TeX by Topic, A TeXnician's Reference (PDF). Lulu. pp. 185–188. ISBN 978-0-201-56882-0. Retrieved 15 March 2010.
  • Free Software Foundation (12 April 2010). "5.2 Commenting Your Work". GNU Coding Standards. Free Software Foundation. Retrieved 17 May 2010.
  • Korpela, Jukka (3 May 2005). "Guide to the Unicode Standard". Characters and Encodings. IT and Communication. Retrieved 17 May 2010.
  • Lloyd, John Wills; Hallahan, Dan (10 November 2009). "Where's the Evidence to Justify Two Spaces?". Spacewaste. Wordpress.com. Retrieved 4 April 2010.
  • Lupton, Ellen (2004). Thinking with Type. New York: Princeton Architectural Press. ISBN 978-1-56898-448-3.
  • Microsoft. "Character design standards (5 of 10): Space Characters for Latin 1". Microsoft Typography. Microsoft. Retrieved 16 May 2010.
  • Sheerin, Peter K. (19 October 2001). "The Trouble With EM 'n EN (and Other Shady Characters)". A List Apart. A List Apart Magazine. Retrieved 17 May 2010.
  • University of Chicago Press (2007). One Space or Two?. Chicago Manual of Style Online. University of Chicago Press. p. 984. Retrieved 8 February 2010.
  • Unicode (2009). "Unicode Standard Annex #14: Unicode Line Breaking Algorithm". Unicode Technical Reports. Unicode. Retrieved 17 May 2010.
  • W3C (24 December 1999). "9.1 White Space". HTML 4.01 Specification. W3C. Retrieved 17 May 2010.
  • W3C (24 December 1999). "24.4 Character Entity References for Markup-Significant and Internationalization Characters". HTML 4.01 Specification. W3C. Retrieved 17 May 2010.