Sentence spacing in digital media concerns the horizontal width of the space between sentences in computer- and web-based media. Digital media allow sentence spacing variations not possible with the typewriter. Most digital fonts permit the use of a variable space or a no-break space. [1] Some modern font specifications, such as OpenType, have the ability to automatically add or reduce space after punctuation,[citation needed] and users may be able to choose sentence spacing variations.
Modern fonts allow spacing variations that the average user can easily manipulate, such as: non-breaking short spaces (thin spaces), non-breaking normal spaces (thick spaces), breaking normal spaces (thick spaces), and breaking long spaces (em spaces).
The typesetting software TeX treats horizontal runs of whitespace as a single space, but uses a heuristic to recognize sentence endings and typesets the spaces after them slightly wider than a normal space. This is the default for TeX, although the "\frenchspacing" TeX macro disables this feature in favor of using the same amount of space between sentences as between words. [2]
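The heuristic can be sketched in Python as a simplified model of TeX's "space factor" bookkeeping. The function names and table layout below are illustrative assumptions, not TeX's implementation. The numeric codes follow plain TeX's defaults as commonly documented, and the sketch ignores ligatures, boxes, and math. It also assumes that runs of whitespace have already been collapsed to single spaces.

```python
# Simplified model of TeX's space-factor heuristic (illustrative only).
PUNCT_SFCODE = {".": 3000, "?": 3000, "!": 3000, ":": 2000, ";": 1500, ",": 1250}
TRANSPARENT = {")", "'", "]"}  # code 0: leaves the current space factor unchanged


def sfcode(ch, french=False):
    """Return the space-factor code assumed for a single character."""
    if ch in TRANSPARENT:
        return 0
    if ch in PUNCT_SFCODE:
        # Under \frenchspacing, punctuation behaves like an ordinary character.
        return 1000 if french else PUNCT_SFCODE[ch]
    if ch.isupper():
        return 999  # a period after a capital letter is treated as an abbreviation
    return 1000


def wide_space_positions(text, french=False):
    """Return indices of spaces that would receive extra width."""
    factor = 1000
    positions = []
    for i, ch in enumerate(text):
        if ch == " ":
            if factor >= 2000:
                positions.append(i)
            continue
        code = sfcode(ch, french)
        if code == 0:
            continue
        # A jump from below 1000 to above 1000 is capped at exactly 1000.
        factor = 1000 if factor < 1000 < code else code
    return positions


print(wide_space_positions("It works. Dr. X. Smith agrees."))
```

In this model the space after "works." and the space after "Dr." are widened, while the space after "X." is not. This shows both the strength of the heuristic (initials are skipped) and a known weakness (abbreviations ending in a lowercase letter look like sentence endings).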
Word processors allow the user to enter as many spaces as desired. Although the default setting of many applications' grammar checkers (e.g., Microsoft Word) is single sentence spacing, they can be adjusted to accept double sentence spacing as well. PerfectIt is described by its producer as an "MS Word add-in that helps professionals to proofread faster". The producer states that a feature was added to the most recent version of the program (as of August 2009) "to convert two spaces at the end of a sentence into one", but that they have "never had any requests to convert one space into two". [3]
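A conversion of this kind can be approximated with a regular expression. The sketch below is not how any particular product works; the function name single_space_sentences is hypothetical, and only ".", "?" and "!" are treated as sentence-ending punctuation.

```python
import re

# Two or more spaces that directly follow terminal punctuation.
_SENTENCE_GAP = re.compile(r"(?<=[.?!]) {2,}")


def single_space_sentences(text: str) -> str:
    """Collapse multi-space sentence gaps to one space; leave other runs alone."""
    return _SENTENCE_GAP.sub(" ", text)


print(single_space_sentences("One.  Two!   Three?"))
```

Runs of spaces that do not follow terminal punctuation (for example, manual column alignment) are left untouched, which is usually the safer choice for an automatic cleanup.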
Some plaintext editors, such as Emacs and vi, originally relied on double-spacing to recognize sentence boundaries. By default, Emacs will not break a line at a single space preceded by a period, but this behavior is configurable (with the option sentence-end-double-space). More than one space will be preserved, but no additional space is added automatically where it is missing. There are also functions to move the cursor forward or backward to the next double-space in the text. In Vim, the joinspaces setting indicates whether extra spaces are inserted when joining lines together, and the J flag in cpoptions indicates whether a sentence must be followed by two spaces. The GNU coding standards recommend using two spaces after sentences in code comments. [4] This also applies to software documentation in the GNU project. The optional Emacs mode LaTeX provides a toggling option French-LaTeX-mode which, if set to French, creates single sentence spacing after terminal punctuation.
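Why an editor might rely on double-spacing can be illustrated with a small Python sketch. It is not the logic used by Emacs or Vim; the function split_sentences and its double_space parameter are hypothetical names chosen to mirror the idea behind sentence-end-double-space.

```python
import re


def split_sentences(text, double_space=True):
    """Split text at terminal punctuation followed by the required gap."""
    gap = r" {2,}" if double_space else r" +"
    boundary = re.compile(r"(?<=[.?!])" + gap)
    return [part for part in boundary.split(text) if part]


sample = "See Dr. Smith.  He is in."
print(split_sentences(sample, double_space=True))
print(split_sentences(sample, double_space=False))
```

With the double-space convention, the abbreviation "Dr." is not mistaken for a sentence ending. With single spacing, the same text is split after "Dr." as well, because punctuation alone cannot distinguish the two cases.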
Web browsers follow the HTML display specification and, for programmers' convenience, collapse runs of white space when displaying them. [5] This convention originally comes from the underlying SGML standard, which collapses multiple spaces because of the clear division between content and layout information. [6] In order to force a web browser to display multiple spaces, a special character sequence must be used (such as "&ensp;&thinsp;" for an en-space followed by a thin space, "&emsp;" for an em-space, or "&nbsp;&nbsp;" for two successive full spaces). [7] However, using a non-breaking space can lead to uneven justified text and additional unwanted spaces or line breaks in the text in certain programs. [8] Alternatively, sentence spacing can be controlled in HTML by separating every sentence into a separate element (e.g., a span), and using CSS to finely control sentence spacing. [9] This is seldom done in practice.
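The difference between ordinary spaces and space entities can be modeled in a few lines of Python. This is a rough approximation of browser behavior, not an HTML renderer. The function name render is hypothetical, and the sketch assumes that only ASCII whitespace (space, tab, line feed, carriage return, form feed) is collapsible.

```python
import html
import re

# ASCII whitespace only; NBSP, EN SPACE, EM SPACE, THIN SPACE are not matched.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")


def render(source: str) -> str:
    """Decode character references, then collapse runs of ASCII whitespace."""
    return _COLLAPSIBLE.sub(" ", html.unescape(source))


print(repr(render("One.  Two.")))
print(repr(render("One.&nbsp;&nbsp;Two.")))
print(repr(render("One.&ensp;&thinsp;Two.")))
```

Two typed spaces come out as one, while the entity-based gaps survive because the decoded characters (U+00A0, U+2002, U+2009) are outside the collapsible set.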
To allow multiple spaces to be rendered without collapsing in a web browser, the HTML <pre> tag or the CSS white-space property can be employed.
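The effect of the white-space property on runs of spaces can be summarized with a simplified Python model. The function apply_white_space is a hypothetical name, and the model covers only space and line-break handling (it ignores wrapping and the trimming of spaces around line breaks that real browsers perform).

```python
import re

# Values that keep runs of spaces exactly as written.
PRESERVE_SPACES = {"pre", "pre-wrap", "break-spaces"}


def apply_white_space(text: str, value: str = "normal") -> str:
    """Approximate how a white-space value treats spaces and newlines."""
    if value in PRESERVE_SPACES:
        return text
    if value == "pre-line":
        # Line breaks are kept, but spaces within each line collapse.
        return "\n".join(re.sub(r"[ \t]+", " ", line) for line in text.split("\n"))
    # "normal" and "nowrap": all whitespace runs collapse to one space.
    return re.sub(r"[ \t\n]+", " ", text)


print(repr(apply_white_space("One.  Two.", "normal")))
print(repr(apply_white_space("One.  Two.", "pre-wrap")))
```

In this model, double sentence spacing typed with ordinary spaces is visible only under the values that preserve spaces, which matches the behavior of a <pre> element.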
ASCII and similar early character encodings provide only a single space, which is breaking and fixed-width (the particular width being specified by the respective font). EBCDIC, although earlier than ASCII, provided a breaking fixed-width space (SP), a non-breaking fixed-width space (RSP: "Required SPace"), and an alternate-width non-breaking fixed-width space intended for use in numeric lists with fixed-width (but not necessarily em-width) digits (NSP: "Numeric SPace"). HTML and Unicode can both record runs of consecutive spaces, including multiple-width spaces and breaking and non-breaking spaces. HTML provides four variations on space width and one fixed-width non-breaking space: <space>, &ensp;, &emsp;, and &thinsp; (all breaking); and &nbsp; (non-breaking). In a typewriter font, <space> will equal &ensp;, but will vary according to the font designer's specification in all other fonts, whether proportional or monospace. The HTML standard also specifies display behavior, not just character encoding, so web browsers following the HTML standard will collapse multiple <space>s to a single <space>. Non-browser applications that use HTML encoding will not necessarily behave this way at display time, e.g., later versions of Microsoft Word. Unicode provides 15 variations on space width and breakability, including THIN SPACE (&#8201;) and NARROW NO-BREAK SPACE (&#8239;). [10] The following examples demonstrate the effect of these variations in a web browser, using space before punctuation to illustrate the identical spacing variations possible following terminal punctuation. These spacing variations, combined with a standard word space, enable users to create custom sentence spacing as alternatives to a single or double standard word space.
- No space before the exclamation mark!
- A no-break space before the exclamation mark !
- A THIN SPACE (&#8201;) before the exclamation mark !
- A NARROW NO-BREAK SPACE (&#8239;) before the exclamation mark !
- A small-formatted no-break space before the exclamation mark !
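Custom sentence spacing of the kind described above can be built programmatically by pairing a standard word space with one of the Unicode space characters. The following Python sketch uses the standard unicodedata module; the helper names describe and join_sentences are hypothetical, and the choice of a thin space as the default extra character is only an example.

```python
import unicodedata

THIN_SPACE = "\u2009"
NARROW_NO_BREAK_SPACE = "\u202f"
NO_BREAK_SPACE = "\u00a0"


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def join_sentences(sentences, extra=THIN_SPACE):
    """Join sentences with a word space plus one extra space character."""
    return (" " + extra).join(sentences)


for ch in (THIN_SPACE, NARROW_NO_BREAK_SPACE, NO_BREAK_SPACE):
    print(describe(ch))

print(repr(join_sentences(["First sentence.", "Second sentence."])))
```

Placing the ordinary space first keeps a normal line-break opportunity between sentences. How wide the combined gap appears still depends on the font and on the application displaying the text.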