Hyphen-minus

-
In Unicode: U+002D - HYPHEN-MINUS
Graphical variants:
U+FE63 SMALL HYPHEN-MINUS
U+FF0D FULLWIDTH HYPHEN-MINUS
Different from:
U+2010 HYPHEN
U+2011 NON-BREAKING HYPHEN
U+2212 MINUS SIGN
U+2013 EN DASH
U+2014 EM DASH

The hyphen-minus symbol - is the form of hyphen most commonly used in digital documents. On most keyboards, it is the only character that resembles a minus sign or a dash, so it is also used for these. [1] The name hyphen-minus derives from the original ASCII standard, [2] where it was called hyphen (minus). [3] The character is referred to as a hyphen, a minus sign, or a dash according to the context where it is being used.

Description

[Figure: hyphen-minus, plus, and minus signs in proportional and monospaced fonts]

In early typewriters and character encodings, a single key/code was almost always used for hyphen, minus, various dashes, and strikethrough, since they all have a roughly similar appearance. The current Unicode Standard specifies distinct characters for several different dashes, an unambiguous minus sign (sometimes called the Unicode minus) at code point U+2212, an unambiguous hyphen (sometimes called the Unicode hyphen) at U+2010, the hyphen-minus at U+002D and a variety of other hyphen symbols for various uses. When a hyphen is called for, the hyphen-minus is a common choice as it is well known, easy to enter on keyboards, and still the only form recognized by many data formats and computer languages. Though the Unicode Standard states that the U+2010 hyphen is "preferred" over the hyphen-minus, [4] the standard itself uses the hyphen-minus as its hyphen character. [5]

In most modern computer fonts, the hyphen-minus is either identical or very similar to the Unicode hyphen. [6] [lower-alpha 1]

In mathematical texts that include the plus sign, the Unicode minus is preferred to the hyphen-minus, because its metrics match the plus sign in level and length. [lower-alpha 2]
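The separate code points described above can be inspected with the Unicode character database. The following is a minimal Python sketch using only the standard `unicodedata` module; the names `LOOKALIKES` and `describe` are chosen here for illustration and do not come from any standard.

```python
import unicodedata

# Code points mentioned in this article (the list name is illustrative).
LOOKALIKES = [0x002D, 0x2010, 0x2011, 0x2212, 0x2013, 0x2014, 0xFE63, 0xFF0D]


def describe(code_point):
    """Return a 'U+XXXX NAME' label for a single code point."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


def fold_compatibility_variant(ch):
    """Apply NFKC normalization, which maps the small and fullwidth
    graphical variants back to the plain hyphen-minus."""
    return unicodedata.normalize("NFKC", ch)
```

Calling `describe(cp)` for each entry of `LOOKALIKES` prints one label per character, which makes it easy to see that characters with near-identical glyphs are nonetheless distinct to software.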

Uses

Typing

This character is typed when a hyphen or a minus sign is wanted. Based on old typewriter conventions, it is common to use a pair (--) to represent an em dash, [7] and a single hyphen-minus with a space on each side ( - ) to represent a spaced en dash; this practice is deprecated in professional typography. [8] Some word processors automatically convert these to the correct dash. The character can also be typed multiple times to simulate a horizontal line (though in most cases, repeated entry of the underscore will produce a solid line). Alternating the hyphen-minus with spaces produces a "dashed" line, often to indicate where paper is to be cut. On a typewriter, over-striking a section of text with this character is used for strikethrough.
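The automatic conversion performed by some word processors can be approximated with two substitutions. This is only a rough Python sketch under simple assumptions (exactly two hyphen-minuses mean an em dash; a lone hyphen-minus between spaces means an en dash); real autocorrect rules differ between products, and the function name is hypothetical.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def convert_typewriter_dashes(text):
    """Replace typewriter-style dash substitutes with dash characters."""
    # Exactly two hyphen-minuses (not part of a longer run) become an em dash.
    text = re.sub(r"(?<!-)--(?!-)", EM_DASH, text)
    # A single hyphen-minus with a space on each side becomes an en dash.
    text = re.sub(r"(?<= )-(?= )", EN_DASH, text)
    return text
```

Ordinary hyphenated words and longer runs used as rules (for example `-----`) are left untouched, since neither pattern matches them.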

Programming languages

Most programming languages use the hyphen-minus for denoting subtraction and negation. [9] [10] It is rarely used to indicate a range, due to ambiguity with subtraction. Generally, other characters, such as the Unicode U+2212 MINUS SIGN, are not recognized as operators.
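Python is one language that behaves this way, and it can serve as a concrete illustration. The sketch below checks whether an expression compiles and shows one way to accept numeric input that contains U+2212; the helper names are hypothetical, and mapping U+2212 to the hyphen-minus before parsing is an assumption about what an application might want, not a standard behavior.

```python
ASCII_MINUS = "\u002D"    # HYPHEN-MINUS
UNICODE_MINUS = "\u2212"  # MINUS SIGN


def is_valid_expression(source):
    """Return True if Python can compile `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


def parse_int_lenient(text):
    """Parse an integer after mapping U+2212 to the hyphen-minus."""
    return int(text.replace(UNICODE_MINUS, ASCII_MINUS))
```

Here `is_valid_expression("3 - 1")` succeeds, while the same expression written with U+2212 is rejected by the compiler. The built-in `int` likewise rejects a leading U+2212, which is why `parse_int_lenient` substitutes the character first.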

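In some languages (for example SQL as implemented by MySQL), -- (two hyphen-minuses) marks the beginning of a comment. Two hyphen-minuses followed by a space, on a line of their own, conventionally start the signature block in the Usenet news system. YAML uses --- (three hyphen-minuses) to mark the start of a document, which also separates consecutive documents in a stream.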

Command line

The hyphen-minus character is often used when specifying command-line options, a convention popularized by Unix. Examples of the "short" form are -R or -q. A user can specify both by using -Rq. Some implementations allow two hyphen-minuses to specify "long" option names such as --recursive or --quiet, which are easier to understand when reading commands. Some software does not care about the number of hyphen-minuses; such software either does not allow combinations of single-letter options or requires the user to arrange them so that they do not match a long option. A double hyphen-minus by itself (followed by a space) indicates that there are no more options, which is useful when one needs to specify a filename that starts with a hyphen-minus. An option of just a hyphen-minus (followed by a space) may be recognized in lieu of a filename and indicates that stdin is to be read.
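These conventions can be tried out with Python's standard `argparse` module, which is one implementation among many. The sketch below assumes a hypothetical tool called `demo` with the options -R/--recursive and -q/--quiet; the tool, its options, and the helper functions are illustrative only.

```python
import argparse
import sys


def build_parser():
    """Build a parser for a hypothetical tool with two boolean options."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-R", "--recursive", action="store_true")
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("files", nargs="*")
    return parser


def parse(argv):
    """Parse a list of argument strings instead of sys.argv."""
    return build_parser().parse_args(argv)


def open_input(name):
    """Treat a lone hyphen-minus as a request to read standard input."""
    return sys.stdin if name == "-" else open(name)
```

With this parser, `parse(["-Rq"])` and `parse(["--recursive", "--quiet"])` set the same two flags, `parse(["-q", "--", "-R"])` treats `-R` as a filename because it follows the bare double hyphen-minus, and `parse(["-"])` passes the lone hyphen-minus through as a positional argument that `open_input` maps to stdin.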

diff output

A hyphen-minus at the start of a line denotes a deleted line in diff output in the context format or the unified format.
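The unified format can be reproduced with Python's standard `difflib` module. The sketch below collects the lines that the diff marks as deleted; the function name and the file labels "old" and "new" are hypothetical. Note that the file header of a unified diff also begins with hyphen-minuses (`--- old`), so the header is skipped by position rather than by pattern.

```python
import difflib


def removed_lines(old, new):
    """Return the lines a unified diff marks as deleted from `old`."""
    diff = list(
        difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm="")
    )
    # The first two lines are the '---' and '+++' file headers; every later
    # line starting with a hyphen-minus is a deleted line.
    return [line[1:] for line in diff[2:] if line.startswith("-")]
```

For example, comparing `["a", "b", "c"]` with `["a", "c"]` reports `b` as the only deleted line.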

Encoding

The character has the Unicode code point U+002D - HYPHEN-MINUS. It has the same value in ASCII (2D hexadecimal, 45 decimal).
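The shared value can be confirmed in Python. The sketch below is illustrative only (the helper name is hypothetical); it also shows that, because the character lies in the ASCII range, its UTF-8 encoding is the same single byte, unlike U+2212 MINUS SIGN.

```python
HYPHEN_MINUS = "-"


def utf8_hex(ch):
    """Return the UTF-8 encoding of a character as a hex string."""
    return ch.encode("utf-8").hex()


code_point = ord(HYPHEN_MINUS)              # 0x2D, i.e. 45
ascii_bytes = HYPHEN_MINUS.encode("ascii")  # a single byte, 0x2D
```

`utf8_hex("-")` gives `2d`, whereas `utf8_hex("\u2212")` gives the three-byte sequence `e28892`.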

Explanatory notes

  1. In Lucida Sans Unicode, the hyphen-minus is drawn identically to the en dash.
  2. The precise relationships depend on typeface design choices.

References

  1. Korpela, Jukka K. (2006). Unicode explained. O'Reilly. p. 382. ISBN 978-0-596-10121-3.
  2. "3.1 General scripts" (PDF). Unicode Version 1.0 · Character Blocks. p. 30. Archived (PDF) from the original on 21 November 2021. Retrieved 10 December 2021. Loose vs. Precise Semantics. Some ASCII characters have multiple uses, either through ambiguity in the original standards or through accumulated reinterpretations of a limited codeset. For example, 27 hex is defined in ANSI X3.4 as apostrophe (closing single quotation mark; acute accent), and 2D hex as hyphen minus. In general, the Unicode standard provides the same interpretation for the equivalent code values, without adding to or subtracting from their semantics. The Unicode standard supplies unambiguous codes elsewhere for the most useful particular interpretations of these ASCII values; the corresponding unambiguous characters are cross-referenced in the character names list for this block. In a few cases, the Unicode standard indicates the generic interpretation of an ASCII code in the name of the corresponding Unicode character, for example U+0027 is APOSTROPHE-QUOTE'.
  3. "American National Standard X3.4-1977: American Standard Code for Information Interchange" (PDF). National Institute of Standards and Technology. p. 10 (4.2 Graphic characters). Archived (PDF) from the original on 9 October 2022. Retrieved 10 December 2021.
  4. "The Unicode Standard, Version 13.0, Chapter 6.2" (PDF). 2020. General Punctuation § Dashes and Hyphens. Archived (PDF) from the original on 22 January 2021. Retrieved 30 December 2020.
  5. Korpela, Jukka. "Dashes and Hyphens § Typographic Usage". Archived from the original on 26 January 2021. Retrieved 30 December 2020.
  6. Marian, Jakub. "Hyphen, minus, en-dash, and em-dash: difference and usage in English". Archived from the original on 25 December 2020. Retrieved 23 December 2020. A hyphen is usually very short (it has its own Unicode character, but you can use the hyphen-minus instead because it looks the same) ...
  7. French, Nigel (2006). InDesign Type: Professional Typography with Adobe InDesign CS2. Adobe Press. p. 72. ISBN 9780321385444. Retrieved 4 July 2020.
  8. Bringhurst, Robert (2004). The elements of typographic style (third ed.). Hartley & Marks, Publishers. p. 80. ISBN 978-0-88179-206-5. Retrieved 10 November 2020. In typescript, a double hyphen (--) is often used for a long dash. Double hyphens in a typeset document are a sure sign that the type was set by a typist, not a typographer. A typographer will use an em dash, three-quarter em, or en dash, depending on context or personal style. The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for use with the best text faces. Like the oversized space between sentences, it belongs to the padded and corseted aesthetic of Victorian typography.
  9. Ritchie, Dennis (c. 1975). "C Reference Manual" (PDF). Bell Labs. Archived (PDF) from the original on 3 April 2017. Retrieved 7 December 2016.
  10. Marlow, Simon (ed.). Haskell 2010 Language Report (PDF). Archived (PDF) from the original on 4 December 2016. Retrieved 7 December 2016.