Setext

Filename extension: .etx
Developed by: Ian Feldman
Initial release: January 6, 1992
Type of format: Lightweight markup language

Setext (Structure Enhanced Text) [2] is a lightweight markup language used to format plain text documents such as e-newsletters, Usenet postings, and e-mails. In contrast to some other markup languages (such as HTML), the markup is easily readable without any parsing or special software.

Setext was first introduced in 1991 by Ian Feldman for use in the TidBITS electronic newsletter.

Purpose

Setext allows viewing of marked-up documents without special viewing software. When appropriate software is used, however, a rich text-style experience is available to the user.

Smaller documents are trivial to create in any text editor.

To prevent errors, most large setext publications are written in a markup language such as HTML or SGML and then converted to setext. The resulting setext document can be distributed without requiring the recipient to use an HTML-capable e-mail client or web browser.

Multiple setext documents in the same file

Multiple setext documents can be stored in the same file, similarly to how the mbox format can store multiple e-mail messages together.

It was initially announced [1] that multiple documents could be included in a single stream, separated by a special <end> tag serving as a document delimiter [lower-alpha 2]. After several months, it was clarified [3] that this tag was not an official part of setext, and that multiple documents should instead be delimited by $$ appearing at the end of a line of text.
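The $$ delimiter rule can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name split_setext_stream is hypothetical, and the sketch assumes that a line ending in $$ closes the current document and that any text before the $$ on that line still belongs to it.

```python
def split_setext_stream(text: str) -> list[str]:
    """Split a stream into documents at lines that end with '$$'.

    Assumption: '$$' at the end of a line closes the current document;
    text before the '$$' on the same line is kept as part of it.
    """
    documents = []
    current = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("$$"):
            # Keep whatever precedes the delimiter, then close the document.
            current.append(stripped[:-2].rstrip())
            documents.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    # Text after the last delimiter is treated as one final document.
    tail = "\n".join(current).strip("\n")
    if tail:
        documents.append(tail)
    return documents


stream = "First\n=====\nbody one $$\nSecond\n======\nbody two\n$$\n"
docs = split_setext_stream(stream)
```

Here docs holds two strings, one per document, with the delimiters removed.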

Regardless of the number of documents stored in the same file, basic metadata can be stored about any or all of them by using the subject-tt tag syntax.

Setext tags

The following are the ten most common of the 16 different setext tags. [4] [5] [lower-alpha 3]

setext tag usage and examples [lower-alpha 4] [lower-alpha 5]

Each entry below gives the tag name [lower-alpha 6], the setext pattern, an example [lower-alpha 7], and comments.

title-tt
  Pattern:
    Title
    =====
  Example:
    This is a long title
    ====================
  Comments: A distinct title identified by the text, maximum one per setext. Must start at the beginning of the line.

subhead-tt
  Pattern:
    Subhead
    -------
  Example:
    Subheading One
    --------------
  Comments: A distinct subheading identified by the text, zero or more per text. Must start at beginning of line. See note in title-tt about handling.

indent-tt
  Pattern: 66-char lines indented by 2 spaces
  Example:
      First paragraph…
      …more of paragraph.
    [blank line]
      Next paragraph…
  Comments: Lines undented and unfolded (longer lines are generally tolerated by most parsers). This is primary body text, generally plain undented in emails, etc. currently.

bold-tt
  Pattern: **[multi ]word**
  Example: This is **very important**...
  Comments: One or more bold words, generally *word* or **word** in emails.

italic-tt
  Pattern: ~word~
  Example: This is an ~italic~ word.
  Comments: A single, italicized word; multi-word form was not officially specified due to “visual-clarity reasons”. Multi-word form of ~first~second~third~ supported by setext2latex. [8]

underline-tt
  Pattern:
    [_multi ]word_
    [_multi]_word_
  Example:
    This is _underlined text_.
    This is _underlined_text_.
  Comments: One or more underlined words. Display in a (user) selected style, preferably with underlining--except in browsers where underlining corresponds to hot links.

hot-tt
  Pattern: [multi_]word_
  Example: This is a hot_word_.
  Comments: Used to mark notes and URLs. [lower-alpha 8] [lower-alpha 9]

include-tt
  Pattern: >[space][text]
  Example:
    > This is quoted text...
    > ...more...
  Comments: Displayed in a user selected style, preferably monospaced with the leading ">".

bullet-tt
  Pattern: *[space][text]
  Example:
    * Item 1 that is...
      ...really long
    * Item 2
  Comments: Displayed in bullet or list format.

href-tt
  Pattern: ^.. _hot_word URL
  Example: ^.. _Wikipedia_home_page https://wikipedia.org (Linked in the text with a hot-tt as Wikipedia_home_page_)
  Comments: These 'link definitions' are commonly placed at the end of a paragraph/section, or at the very end of the setext document. [lower-alpha 9]
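
As a rough illustration of how a few of these tags might be recognized, the Python sketch below detects title-tt and subhead-tt headings and pairs hot-tt words with their href-tt link definitions. It is not taken from any existing setext tool. The function names and regular expressions are hypothetical, the "^" in the href-tt pattern is assumed to mean the start of a line, an underline of at least three characters is assumed to be enough to mark a heading, and the underline length is not compared with the heading text.

```python
import re

# Assumed patterns; the minimum underline length of 3 is an assumption.
TITLE_RULE = re.compile(r"={3,}\s*")
SUBHEAD_RULE = re.compile(r"-{3,}\s*")
# href-tt: ".. _hot_word URL" at the start of a line.
HREF = re.compile(r"^\.\. _(\S+)\s+(\S+)\s*$")
# hot-tt: a word with a trailing underscore and no leading underscore.
HOT_WORD = re.compile(r"(?<!\w)([A-Za-z0-9]\w*)_(?!\w)")


def find_headings(lines):
    """Return (tag, text) pairs for title-tt and subhead-tt headings."""
    found = []
    for text, rule in zip(lines, lines[1:]):
        # Heading text must start at the beginning of the line.
        if not text or text[0].isspace():
            continue
        if TITLE_RULE.fullmatch(rule):
            found.append(("title-tt", text.rstrip()))
        elif SUBHEAD_RULE.fullmatch(rule):
            found.append(("subhead-tt", text.rstrip()))
    return found


def is_setext(lines):
    """A text counts as setext if it has at least one title or subhead."""
    return bool(find_headings(lines))


def resolve_hot_words(lines):
    """Pair each hot-tt word in the body with the URL of its href-tt."""
    targets = {}
    for line in lines:
        match = HREF.match(line)
        if match:
            targets[match.group(1)] = match.group(2)
    links = []
    for line in lines:
        if HREF.match(line):
            continue  # skip the link definitions themselves
        for match in HOT_WORD.finditer(line):
            name = match.group(1)
            if name in targets:
                links.append((name, targets[name]))
    return links


sample = [
    "This is a long title",
    "====================",
    "",
    "Subheading One",
    "--------------",
    "  See the Wikipedia_home_page_ for _underlined_text_ and more.",
    "",
    ".. _Wikipedia_home_page https://wikipedia.org",
]
headings = find_headings(sample)
links = resolve_hot_words(sample)
```

The lookbehind in HOT_WORD keeps underline-tt text such as _underlined_text_ from being mistaken for a hot word, because an underlined span begins with an underscore while a hot word only ends with one.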


Standalone Setext files

By default all properly setext-ized files will have an ".etx" or ".ETX" suffix. This stands for an "emailable/enhanced text". [1]

See also

Other lightweight markup languages (inspired by Setext):

References

Notes

  1. ("enhanced/e-mailable text") [1]
  2. This was to function much in the same way as the original purpose of the ASCII “File Separator” (FS; 0x1C; typed as Ctrl-\) C0 control character but it proved too visually distracting and so was removed before setext was finalized.
  3. (Not currently shown in table: note-tt, quote-tt, subject-tt, suppress-tt, twobuck-tt, and twodot-tt)
  4. (For a document to be valid setext, the only required tag is either ‘subhead-tt’ or ‘title-tt’ - all others are optional.)
  5. “(A) formal definition of what makes a setext: a text that contains at least one verified setext subhead or setext title” [6] [7]
  6. (‘-tt’ stands for ‘typotag’, Feldman’s shorthand for ‘typographic tags’; contrast with the ‘tags’ used in modern systems for categorizing data or photos into groups)
  7. (i.e. the actual text as stored / transmitted, except in the case of bullet-tt. Visual appearance would be defined/controlled by the program displaying the document.)
  8. "synonymous with the ‘grouped’ style of HyperCard"

References

  1. "TidBITS in new format". TidBITS. 1992-01-06. Retrieved 2022-07-01.
  2. Engst, Adam C. "comp.sys.mac.announce / TidBITS file server available". UseNet. Retrieved 21 December 2015.
  3. "Administrivia". TidBITS. 1992-03-09. Retrieved 2022-07-01.
  4. Oliver, Erik. "Setext command reference". Erik Oliver's Home Page. Archived from the original on 2022-08-16. Retrieved 2022-07-01.
  5. Feldman, Ian (1992-08-16). "What is setext". bsdi.org. Archived from the original on 2001-04-30.
  6. Feldman, Ian (1992-03-15). "setext sermon – Part 1". bsdi.com. Archived from the original on 2001-03-09.
  7. Feldman, Ian (1992-03-29). "setext sermon – Part 2". bsdi.org. Archived from the original on 2001-04-30.
  8. "Setext2LaTeX". freecode.com. Archived from the original on 26 June 2014. Retrieved 16 August 2022.
    Oliver, Erik (2007). "Setext2LaTeX – setext -> LaTeX converter". Erik Oliver's Home Page. Archived from the original on 26 June 2014. Retrieved 16 August 2022.

Implementations