Formatted text

Last updated

In computing, formatted text, styled text, or rich text, as opposed to plain text, is digital text which has styling information beyond the minimum of semantic elements: colours, styles (boldface, italic), sizes, and special features in HTML (such as hyperlinks).

Contents

Terminology

Formatted text cannot rightly be identified with binary files or be distinct from ASCII text. This is because formatted text is not necessarily binary, it may be text-only, such as HTML, RTF or enriched text files, and it may be ASCII-only. Conversely, a plain text file may be non-ASCII (in an encoding such as Unicode UTF-8).[ further explanation needed ] Text-only formatted text is achieved by markup which too is textual, while some editors of formatted text like Microsoft Word save in a binary format.

Beginnings of formatted text

Formatted text has its genesis in the pre-computer use of underscoring to embolden passages in typewritten manuscripts. In the first interactive systems of early computer technology, underlining was not possible, and users made up for this lack (and the lack of formatting in ASCII) by using certain symbols as substitutes. Emphasis, for example, could be achieved in ASCII in a number of ways:

Surrounding by underscores was also used for book titles: Look it up in _The_C_Programming_Language_.

Markup languages

Formatting can be marked by tags distinguished from the body text by special characters, such as angle brackets in HTML. For example, this text:

The dog is classified as Canis familiaris in taxonomy.

is marked up in HTML thus:

<p>The dog is classified as <i>Canis familiaris</i> in taxonomy.</p>

The italicised text is enclosed by an opening and a closing italics tag. In LaTeX, the text would be marked up like this:

The dog is classified as \textit{Canis familiaris} in taxonomy. 

Most markup languages can be edited with any text editor, needing no special software. Many markup languages can also be edited with specialized software designed to automate some functions or present the output as WYSIWYG.

Formatted document files

Since the invention of MacWrite, the first WYSIWYG word processor, in which the typist codes the formatting visually rather than by inserting textual markup, word processors have tended to save to binary files. Opening such files with a text editor reveals them embedded with various binary characters, either around the formatted text (e.g. in WordPerfect) or separate from it, at the beginning or end of the file (e.g. in Microsoft Word).

Formatted text documents in binary files have, however, the disadvantages of formatting scope and secrecy. Whereas the extent of formatting is accurately marked in markup languages, WYSIWYG formatting is based on memory, that is, keeping for example your pressing of the boldface button until cancelled. This can lead to formatting mistakes and maintenance troubles. As for secrecy, formatted text document file formats tend to be proprietary and undocumented, leading to difficulty in coding compatibility by third parties, and also to unnecessary upgrades because of version changes.

WordStar was a popular word processor that did not use binary files with hidden characters.

OpenOffice.org Writer saves files in an XML format. However, the resultant file is a binary since it is compressed (a tarball equivalent).

PDF is another formatted text file format that is usually binary (using compression for the text, and storing graphics and fonts in binary). It is generally an end-user format, written from an application such as Microsoft Word or OpenOffice.org Writer, and not editable by the user once done.

See also

Related Research Articles

A file viewer is utility application software on operating systems for example; Windows, MacOS and Linux, that handle the access of files that are located on data storage devices to be accessible by the user. File viewing utility software allows the user to open and read content on a Personal Computer (PC) or on a Mobile phone.

<span class="mw-page-title-main">LaTeX</span> Document preparation software system

LaTeX is a software system for typesetting documents. LaTeX markup describes the content and layout of the document, as opposed to the formatted text found in WYSIWYG word processors like Microsoft Word, LibreOffice Writer and Apple Pages. The writer uses markup tagging conventions to define the general structure of a document, to stylise text throughout a document, and to add citations and cross-references. A TeX distribution such as TeX Live or MiKTeX is used to produce an output file suitable for printing or digital distribution.

<span class="mw-page-title-main">Markup language</span> Modern system for annotating a document

A markuplanguage is a text-encoding system which specifies the structure and formatting of a document and potentially the relationship between its parts. Markup is often used to control the display of the document or to enrich its content to facilitate automated processing.

Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

<span class="mw-page-title-main">Plain text</span> Term for computer data consisting only of unformatted characters of readable material

In computing, plain text is a loose term for data that represent only characters of readable material but not its graphical representation nor other objects. It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects.

The Rich Text Format is a proprietary document file format with published specification developed by Microsoft Corporation from 1987 until 2008 for cross-platform document interchange with Microsoft products. Prior to 2008, Microsoft published updated specifications for RTF with major revisions of Microsoft Word and Office versions.

<span class="mw-page-title-main">Text editor</span> Computer software used to edit plain text documents

A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software. Text editors are provided with operating systems and software development packages, and can be used to change files such as configuration files, documentation files and programming language source code.

In computing, WYSIWYG, an acronym for What You See Is What You Get, refers to software which allows content to be edited in a form that resembles its appearance when printed or displayed as a finished product, such as a printed document, web page, or slide presentation. WYSIWYG implies a user interface that allows the user to view something very similar to the result while the document is being created. In general, WYSIWYG implies the ability to directly manipulate the layout of a document without having to type or remember names of layout commands.

<span class="mw-page-title-main">WordStar</span> Word processor application

WordStar is a word processor application for microcomputers. It was published by MicroPro International and originally written for the CP/M-80 operating system, with later editions added for MS-DOS and other 16-bit PC OSes. Rob Barnaby was the sole author of the early versions of the program.

Desktop publishing (DTP) is the creation of documents using page layout software on a personal ("desktop") computer. It was first used almost exclusively for print publications, but now it also assists in the creation of various forms of online content. Desktop publishing software can generate layouts and produce typographic-quality text and images comparable to traditional typography and printing. Desktop publishing is also the main reference for digital typography. This technology allows individuals, businesses, and other organizations to self-publish a wide variety of content, from menus to magazines to books, without the expense of commercial printing.

A HTML editor is a program used for editing HTML, the markup of a web page. Although the HTML markup in a web page can be controlled with any text editor, specialized HTML editors can offer convenience, added functionality, and organisation. For example, many HTML editors handle not only HTML, but also related technologies such as CSS, XML and JavaScript or ECMAScript. In some cases they also manage communication with remote web servers via FTP and WebDAV, and version control systems such as Subversion or Git. Many word processing, graphic design and page layout programs that are not dedicated to web design, such as Microsoft Word or Quark XPress, also have the ability to function as HTML editors.

<span class="mw-page-title-main">WYSIWYM</span> Acronym for "what you see is what you mean"

In computing, What You See Is What You Mean is a paradigm for editing a structured document. It is an adjunct to the better-known WYSIWYG paradigm, which displays the result of a formatted document as it will appear on screen or in print—without showing the descriptive code underneath.

<span class="mw-page-title-main">Newline</span> Special characters in computing signifying the end of a line of text

A newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.

A document file format is a text or binary file format for storing documents on a storage media, especially for use by computers. There currently exist a multitude of incompatible document file formats.

A lightweight markup language (LML), also termed a simple or humane markup language, is a markup language with simple, unobtrusive syntax. It is designed to be easy to write using any generic text editor and easy to read in its raw form. Lightweight markup languages are used in applications where it may be necessary to read the raw document as well as the final rendered output.

<span class="mw-page-title-main">Markdown</span> Plain text markup language

Markdown is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber created Markdown in 2004 as a markup language that is easy to read in its source code form. Markdown is widely used for blogging and instant messaging, and also used elsewhere in online forums, collaborative software, documentation pages, and readme files.

Data conversion is the conversion of computer data from one format to another. Throughout a computer environment, data is encoded in a variety of ways. For example, computer hardware is built on the basis of certain standards, which requires that data contains, for example, parity bit checks. Similarly, the operating system is predicated on certain standards for data and file handling. Furthermore, each computer program handles data in a different manner. Whenever any one of these variables is changed, data must be converted in some way before it can be used by a different computer, operating system or program. Even different versions of these elements usually involve different data structures. For example, the changing of bits from one format to another, usually for the purpose of application interoperability or of the capability of using new features, is merely a data conversion. Data conversions may be as simple as the conversion of a text file from one character encoding system to another; or more complex, such as the conversion of office file formats, or the conversion of image formats and audio file formats.

Setext is a lightweight markup language used to format plain text documents such as e-newsletters, Usenet postings, and e-mails. In contrast to some other markup languages, the markup is easily readable without any parsing or special software.