The zero-width space(), abbreviated ZWSP, is a non-printing character used in computerized typesetting to indicate where the word boundaries are, without actually displaying a visible space in the rendered text. This enables text-processing systems for scripts that do not use explicit spacing to recognize where word boundaries are for the purpose of handling line breaks appropriately. Zero-width space is unicode character U+200B
, and is located in the unicode General Punctuation block, and can be represented by numeric character references ​
or ​
.
The zero-width space marks a potential line break without hyphenation. Its semantics and HTML implementation are similar to the soft hyphen, but soft hyphens display a hyphen character at the point where the line is broken.
The zero-width space can be used to mark word breaks in languages without visible space between words, such as Thai, Myanmar, Khmer, and Japanese. [1]
In justified text, the rendering engine may add inter-character spacing, also known as letter spacing, between letters separated by a zero-width space, unlike around fixed-width spaces. [1]
To show the effect of the zero-width space in text, the following words have been separated with zero-width spaces:
LoremIpsumDolorSitAmetConsecteturAdipiscingElitSedDoEiusmodTemporIncididuntUtLaboreEtDoloreMagnaAliquaUtEnimAdMinimVeniamQuisNostrudExercitationUllamcoLaborisNisiUtAliquipExEaCommodoConsequatDuisAuteIrureDolorInReprehenderitInVoluptateVelitEsseCillumDoloreEuFugiatNullaPariaturExcepteurSintOccaecatCupidatatNonProidentSuntInCulpaQuiOfficiaDeseruntMollitAnimIdEstLaborum
By contrast, the following words have not been separated:
LoremIpsumDolorSitAmetConsecteturAdipiscingElitSedDoEiusmodTemporIncididuntUtLaboreEtDoloreMagnaAliquaUtEnimAdMinimVeniamQuisNostrudExercitationUllamcoLaborisNisiUtAliquipExEaCommodoConsequatDuisAuteIrureDolorInReprehenderitInVoluptateVelitEsseCillumDoloreEuFugiatNullaPariaturExcepteurSintOccaecatCupidatatNonProidentSuntInCulpaQuiOfficiaDeseruntMollitAnimIdEstLaborum
The first text is broken into lines but only at word boundaries, and resizing the browser window will re-break the text accordingly, while the second text is not broken at all.
In HTML pages, the HTML element <wbr>
functions as a zero-width space. In Internet Explorer 6, the zero-width space was not supported in some fonts. [2]
ICANN rules prohibit domain names from containing non-displayed characters, including the zero-width space, and most browsers prohibit their use within domain names because they can be used to create a homograph attack, where a malicious URL is visually indistinguishable from a legitimate one. [3] [4]
The zero-width space character is encoded in Unicode as U+200BZERO WIDTH SPACE, [5] and input in HTML as ​
, ​
or ​
. Contrary to what their names suggest, the character entities ​
, ​
, ​
, and ​
also refer to the zero-width space. [6]
The TeX representation is \hskip0pt
; the LaTeX representation is \hspace{0pt}
; [7] and the groff representation is \:
. [8]
The hyphen‐ is a punctuation mark used to join words and to separate syllables of a single word. The use of hyphens is called hyphenation.
In punctuation, a word divider is a form of glyph which separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or whitespace. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation.
In writing, a space is a blank area that separates words, sentences, syllables and other written or printed glyphs (characters). Conventions for spacing vary among languages, and in some languages the spacing rules are complex. Inter-word spaces ease the reader's task of identifying words, and avoid outright ambiguities such as "now here" vs. "nowhere". They also provide convenient guides for where a human or program may start new lines.
In publishing and graphic design, Lorem ipsum is a placeholder text commonly used to demonstrate the visual form of a document or a typeface without relying on meaningful content. Lorem ipsum may be used as a placeholder before the final copy is available. It is also used to temporarily replace text in a process called greeking, which allows designers to consider the form of a webpage or publication, without the meaning of the text influencing the design.
The zero-width non-joiner is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively. This is also an effect of a space character, but a ZWNJ is used when it is desirable to keep the characters closer together or to connect a word with its morpheme.
In the written form of many languages, indentation describes empty space, a.k.a. white space, used around text to signify an important aspect of the text such as:
In computing and typesetting, a soft hyphen or syllable hyphen, is a code point reserved in some coded character sets for the purpose of breaking words across lines by inserting visible hyphens if they fall on the line end but remain invisible within the line.
In word processing and digital typesetting, a non-breaking space, also called NBSP, required space, hard space, or fixed space, is a space character that prevents an automatic line break at its position. In some formats, including HTML, it also prevents consecutive whitespace characters from collapsing into a single space. Non-breaking space characters with other widths also exist.
Line breaking, also known as word wrapping, is breaking a section of text into lines so that it will fit into the available width of a page, window or other display area. In text display, line wrap is continuing on a new line when a line is full, so that each line fits into the viewable window, allowing text to be read from top to bottom without any horizontal scrolling. Word wrap is the additional feature of most text editors, word processors, and web browsers, of breaking lines between words rather than within words, where possible. Word wrap makes it unnecessary to hard-code newline delimiters within paragraphs, and allows the display of text to adapt flexibly and dynamically to displays of varying sizes.
The zero-width joiner is a non-printing character used in the computerized typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes, such as the Arabic script or any Indic script. Sometimes the Roman script is to be counted as complex, e.g. when using a Fraktur typeface. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
A whitespace character is a character data element that represents white space when text is rendered for display by a computer.
In a written or published work, an initial is a letter at the beginning of a word, a chapter, or a paragraph that is larger than the rest of the text. The word is ultimately derived from the Latin initiālis, which means of the beginning. An initial is often several lines in height, and, in older books or manuscripts, may take the form of an inhabited or historiated initial. Certain important initials, such as the Beatus initial, or B, of Beatus vir... at the opening of Psalm 1 at the start of a vulgate Latin. These specific initials in an illuminated manuscript were also called initia.
The fmt command in Unix, Plan 9, Inferno, and Unix-like operating systems is used to format natural language text for humans to read.
The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.
The word joiner (WJ) is a Unicode format character which is used to indicate that line breaking should not occur at its position. It does not affect the formation of ligatures or cursive joining and is ignored for the purpose of text segmentation. It is encoded since Unicode version 3.2 as U+2060WORD JOINER.
The dash is a punctuation mark consisting of a long horizontal line. It is similar in appearance to the hyphen but is longer and sometimes higher from the baseline. The most common versions are the en dash–, generally longer than the hyphen but shorter than the minus sign; the em dash—, longer than either the en dash or the minus sign; and the horizontal bar―, whose length varies across typefaces but tends to be between those of the en and em dashes.
The tie is a symbol in the shape of an arc similar to a large breve, used in Greek, phonetic alphabets, and Z notation. It can be used between two characters with spacing as punctuation, non-spacing as a diacritic, or (underneath) as a proofreading mark. It can be above or below, and reversed. Its forms are called tie, double breve, enotikon or papyrological hyphen, ligature tie, and undertie.
fold is a Unix command used for making a file with long lines more readable on a limited width computer terminal by performing a line wrap.
The Unicode Standard assigns various properties to each Unicode character and code point.
General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic and novel punctuation such as the interrobang, and invisible mathematical operators.