The zero-width space(), abbreviated ZWSP, is a non-printing character used in computerized typesetting to indicate where the word boundaries are, without actually displaying a visible space in the rendered text. This enables text-processing systems for scripts that do not use explicit spacing to recognize where word boundaries are for the purpose of handling line breaks appropriately.
The zero-width space is Unicode character U+200B
, and is located in the Unicode General Punctuation block. In HTML, it can be represented by the character entity reference ​
.
The zero-width space marks a potential line break without hyphenation. Its semantics and HTML implementation are similar to the soft hyphen, but soft hyphens display a hyphen character at the point where the line is broken.
The zero-width space can be used to mark word breaks in languages without visible space between words, such as Thai, Myanmar, Khmer, and Japanese. [1]
In justified text, the rendering engine may add inter-character spacing, also known as letter spacing, between letters separated by a zero-width space, unlike around fixed-width spaces. [1]
To show the effect of the zero-width space in text, the following words have been separated with zero-width spaces:
LoremIpsumDolorSitAmetConsecteturAdipiscingElitSedDoEiusmodTemporIncididuntUtLaboreEtDoloreMagnaAliquaUtEnimAdMinimVeniamQuisNostrudExercitationUllamcoLaborisNisiUtAliquipExEaCommodoConsequatDuisAuteIrureDolorInReprehenderitInVoluptateVelitEsseCillumDoloreEuFugiatNullaPariaturExcepteurSintOccaecatCupidatatNonProidentSuntInCulpaQuiOfficiaDeseruntMollitAnimIdEstLaborum
By contrast, the following words have not been separated:
LoremIpsumDolorSitAmetConsecteturAdipiscingElitSedDoEiusmodTemporIncididuntUtLaboreEtDoloreMagnaAliquaUtEnimAdMinimVeniamQuisNostrudExercitationUllamcoLaborisNisiUtAliquipExEaCommodoConsequatDuisAuteIrureDolorInReprehenderitInVoluptateVelitEsseCillumDoloreEuFugiatNullaPariaturExcepteurSintOccaecatCupidatatNonProidentSuntInCulpaQuiOfficiaDeseruntMollitAnimIdEstLaborum
The first text is broken into lines but only at word boundaries, and resizing the browser window will re-break the text accordingly, while the second text is not broken at all.
In HTML pages, the HTML element <wbr>
functions as a zero-width space. In Internet Explorer 6, the zero-width space was not supported in some fonts. [2]
ICANN rules prohibit domain names from containing non-displayed characters, including the zero-width space, and most browsers prohibit their use within domain names because they can be used to create a homograph attack, where a malicious URL is visually indistinguishable from a legitimate one. [3] [4]
The zero-width space character is encoded in Unicode as U+200BZERO WIDTH SPACE. [5]
In HTML, it can be referenced as ​
, ​
or ​
. Additionally, the character entities ​
, ​
, ​
, and ​
all also refer to the zero-width space, contrary to what their names suggest. [6]
The TeX representation is \hskip0pt
; the LaTeX representation is \hspace{0pt}
; [7] and the groff representation is \:
. [8]
The hyphen‐ is a punctuation mark used to join words and to separate syllables of a single word. The use of hyphens is called hyphenation.
In punctuation, a word divider is a form of glyph which separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or whitespace. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation.
Lorem ipsum is a dummy or placeholder text commonly used in graphic design, publishing, and web development to fill empty spaces in a layout that does not yet have content.
The zero-width non-joiner is a non-printing character used in the computerization of writing systems that make use of ligatures. For example, in writing systems that feature initial, medial and final letter-forms, such as the Persian alphabet, when a ZWNJ is placed between two characters that would otherwise be joined into a ligature, it instead prevents the ligature and causes them to be printed in their final and initial forms, respectively. This is also an effect of a space character, but a ZWNJ is used when it is desirable to keep the characters closer together or to connect a word with its morpheme.
In the written form of many languages, indentation describes empty space, a.k.a. white space, used around text to signify an important aspect of the text such as:
In computing and typesetting, a soft hyphen or syllable hyphen, is a code point reserved in some coded character sets for the purpose of breaking words across lines by inserting visible hyphens if they fall on the line end but remain invisible within the line.
In word processing and digital typesetting, a non-breaking space, also called NBSP, required space, hard space, or fixed space, is a space character that prevents an automatic line break at its position. In some formats, including HTML, it also prevents consecutive whitespace characters from collapsing into a single space. Non-breaking space characters with other widths also exist.
Line breaking, also known as word wrapping, is breaking a section of text into lines so that it will fit into the available width of a page, window or other display area. In text display, line wrap is continuing on a new line when a line is full, so that each line fits into the viewable window, allowing text to be read from top to bottom without any horizontal scrolling. Word wrap is the additional feature of most text editors, word processors, and web browsers, of breaking lines between words rather than within words, where possible. Word wrap makes it unnecessary to hard-code newline delimiters within paragraphs, and allows the display of text to adapt flexibly and dynamically to displays of varying sizes.
The zero-width joiner is a non-printing character used in the computerized typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes, such as the Arabic script or any Indic script. Sometimes the Roman script is to be counted as complex, e.g. when using a Fraktur typeface. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
A whitespace character is a character data element that represents white space when text is rendered for display by a computer.
In a written or published work, an initial is a letter at the beginning of a word, a chapter, or a paragraph that is larger than the rest of the text. The word is ultimately derived from the Latin initiālis, which means of the beginning. An initial is often several lines in height, and, in older books or manuscripts, may take the form of an inhabited or historiated initial. Certain important initials, such as the Beatus initial, or B, of Beatus vir... at the opening of Psalm 1 at the start of a vulgate Latin. These specific initials in an illuminated manuscript were also called initia.
The fmt command in Unix, Plan 9, Inferno, and Unix-like operating systems is used to format natural language text for humans to read.
The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.
The word joiner (WJ) is a Unicode format character which is used to indicate that line breaking should not occur at its position. It does not affect the formation of ligatures or cursive joining and is ignored for the purpose of text segmentation. It is encoded since Unicode version 3.2 as U+2060WORD JOINER.
The dash is a punctuation mark consisting of a long horizontal line. It is similar in appearance to the hyphen but is longer and sometimes higher from the baseline. The most common versions are the en dash–, generally longer than the hyphen but shorter than the minus sign; the em dash—, longer than either the en dash or the minus sign; and the horizontal bar―, whose length varies across typefaces but tends to be between those of the en and em dashes.
fold is a Unix command used for making a file with long lines more readable on a limited width computer terminal by performing a line wrap.
The Unicode Standard assigns various properties to each Unicode character and code point.
Sentence spacing in digital media concerns the horizontal width of the space between sentences in computer- and web-based media. Digital media allow sentence spacing variations not possible with the typewriter. Most digital fonts permit the use of a variable space or a no-break space. Some modern font specifications, such as OpenType, have the ability to automatically add or reduce space after punctuation, and users may be able to choose sentence spacing variations.
A figure space or numeric space is a typographic unit equal to the size of a single numerical digit. Its size can fluctuate somewhat depending on which font is being used. This is the preferred space to use in numbers. It has the same width as a digit and keeps the number together for the purpose of line breaking.
General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic and novel punctuation such as the interrobang, and invisible mathematical operators.