Filename extension | .rtf |
---|---|
Internet media type | |
Type code | 'RTF.' [3] [4] [5] |
Uniform Type Identifier (UTI) | public.rtf |
Magic number | {\rtf |
Developed by | Microsoft |
Initial release | 1987 |
Latest release | 1.9.1 19 March 2008 |
Type of format | Document file format |
Open format? | No |
The Rich Text Format (often abbreviated RTF) is a proprietary [6] [7] [8] document file format with a published specification, developed by Microsoft Corporation from 1987 until 2008 for cross-platform document interchange with Microsoft products. Prior to 2008, Microsoft published updated specifications for RTF with major revisions of Microsoft Word and Office versions.
Most word processors are able to read and write some versions of RTF. [9] There are several different revisions of the RTF specification; portability of files depends on which version of RTF is being used. [7] [10]
RTF should not be confused with enriched text [11] or its predecessor Rich Text, [12] [13] or with IBM's RFT-DCA (Revisable Format Text-Document Content Architecture), as these are different specifications.
Richard Brodie, Charles Simonyi, and David Luebbert, members of the Microsoft Word development team, developed the original RTF in the middle to late 1980s. The first RTF reader and writer shipped in 1987 as part of Microsoft Word 3.0 for Macintosh, which implemented the RTF version 1.0 specification. All subsequent releases of Microsoft Word for Macintosh, as well as all Windows versions, can read and write in RTF format.
Microsoft maintains RTF. The final version was 1.9.1 in 2008, which implemented features of Office 2007. Microsoft has discontinued enhancements to the RTF specification, so features new to Word 2010 or a later version will not save properly to RTF. [14] Microsoft anticipates no further updates to RTF, but has stated willingness to consider editorial and other non-substantive modifications of the RTF Specification during an associated ISO/IEC 29500 balloting period. [15]
RTF files were used to produce Windows Help files, though these have since been superseded by Microsoft Compiled HTML Help files.
RTF version | Publication date | Microsoft Word version | MS Word release date | Notes |
---|---|---|---|---|
1.0 | 1987 | Microsoft Word 3 | 1987 | The latest revision came in June 1992. [18] [19] The 1992 revision defined support for Microsoft Object Linking and Embedding (OLE) objects and Macintosh Edition Manager subscriber objects. It also supported inclusion of the Windows Metafile, PICT, Windows device-dependent bitmap, Windows device-independent bitmap and OS/2 Metafile image types in RTF. |
1.1 | | Microsoft Word 4 | 1989 | Allowed for font embedding, which lets font data be located inside the file. |
1.2 | 1993 | Microsoft Word 5 | 1991 | [20] [21] |
1.3 | January 1994 | Microsoft Word 6 | 1993 | 1/94 GC0165; for device-independence and interoperability, encouraged embedding bitmaps within Windows Metafiles, [22] [23] instead of using Windows device-independent bitmaps or Windows device-dependent bitmaps. |
1.4 | September 1995 | Microsoft Word 95/Word 7 | 1995 | [24] |
1.5 | April 1997 | Microsoft Word 97/Word 8 | 1997 | Introduced Unicode RTF, which supports the 16-bit Unicode character encoding scheme; defined inclusion of PNG, JPEG and EMF picture types in hexadecimal (the default) or binary format in an RTF file. [25] Also contained a Japanese local RTF specification called RTF-J for the Japanese version of Word; RTF-J is somewhat different from the standard RTF specification. [25] |
1.6 | May 1999 | Microsoft Word 2000/Word 9 | 1999 | Included Pocket Word and Exchange (used in RTF-HTML conversions). [3] |
1.7 | August 2001 | Microsoft Word 2002/Word 10 | 2001 | 8/2001– Word 2002 RTF Specification [26] [27] |
1.8 | April 2004 | Microsoft Word 2003/Word 11 | 2003 | 10/2003– Word 2003 RTF Specification [4] |
1.9.1 | 19 March 2008 (RTF 1.9 was published in January 2007) [28] | Microsoft Word 2007/Word 12 | 2006 | Allowed XML markup – Custom XML Tags, SmartTags, Math elements in an RTF document, password protection, elements corresponding to Office Open XML Ecma-376 Part 4 [29] |
RTF code is written using groups, backslashes, control words and delimiters. Groups are contained within curly braces ({}) and indicate which attributes should be applied to certain text.
The backslash (\) introduces a control word, which is a command defined by the RTF specification. Some control words switch a property on or off, and the state is given by a number that follows the word: 0 turns the property off and 1 turns it on. For example,
\b0
\b1
\i0
\i1
\ul0
\ul1
\sub0
\sub1
\superscript0
\superscript1
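A delimiter is one of three things: a space; a digit or a hyphen (which indicates that a numeric parameter follows); or any character other than a letter or a digit.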
As an example, the following RTF code
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard This is some {\b bold} text.\par}
would be rendered as follows:
This is some bold text.
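The group, control-word and delimiter rules described above can be illustrated with a small tokenizer. This is a minimal sketch rather than a conforming RTF reader: the names `TOKEN` and `tokenize` are hypothetical, and it does not interpret hexadecimal escapes, binary data or line breaks.

```python
import re

# One token per match: a control word (letters, optional signed number,
# optional single-space delimiter), a control symbol (backslash plus one
# non-letter), a group brace, or a run of plain text.
TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?"   # control word, optional parameter, optional space
    r"|\\([^A-Za-z])"            # control symbol such as \{ or \'
    r"|([{}])"                   # group open / close
    r"|([^\\{}]+)"               # plain text
)

def tokenize(rtf):
    """Yield (kind, value) pairs for a small subset of RTF syntax."""
    for match in TOKEN.finditer(rtf):
        word, param, symbol, brace, text = match.groups()
        if word is not None:
            number = int(param) if param is not None else None
            yield ("control", (word, number))
        elif symbol is not None:
            yield ("symbol", symbol)
        elif brace is not None:
            yield ("group", brace)
        else:
            yield ("text", text)

sample = (r"{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}"
          r"\f0\pard This is some {\b bold} text.\par}")
tokens = list(tokenize(sample))
plain = "".join(value for kind, value in tokens if kind == "text")
```

Here the space after `\b` is consumed as the delimiter of the control word, so the text token that follows is exactly `bold`. A real reader would also track which group each text run belongs to, so that it skips destinations such as the font table instead of emitting `Helvetica;` as document text.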
A standard RTF file can only consist of 7-bit ASCII characters, but can use escape sequences to encode other characters. [31] The two character escapes are code page escapes and, starting with RTF 1.5, Unicode escapes. In a code page escape, two hexadecimal digits following a backslash and typewriter apostrophe denote a character taken from a Windows code page. For example, if the code page is set to Windows-1256, the sequence \'c8 will encode the Arabic letter bāʼ ب. It is also possible to specify a "Character Set" in the preamble of the RTF document and associate it with a font. For example, if the preamble has the text \f3\fnil\fcharset134, then, in the body of the document, the text \f3\'bd\'f0 will represent the byte sequence 0xbd 0xf0 from Character Set 134 (which corresponds to the GBK code page, Windows-936), which encodes "金".
RTF Character Set | Code Page | Description |
---|---|---|
0 | Windows-1252 | Latin alphabet, Western Europe / Americas |
1 | 0 | Default Windows API code page for system locale |
2 | 42 | Symbol (PUA-mapped) [32] character set |
77 | 2 | Default Macintosh-compatibility code page for system locale |
128 | Windows-932 | Japanese, Shift JIS (Windows version) |
129 | Windows-949 | Korean, Unified Hangul Code (extended Wansung) |
130 | Windows-1361 | Korean, Johab (ASCII-based version) |
134 | Windows-936 | Chinese, GBK (extended GB 2312) |
136 | Windows-950 | Chinese, Big5 |
161 | Windows-1253 | Greek |
162 | Windows-1254 | Latin alphabet, Turkish |
163 | Windows-1258 | Latin alphabet, Vietnamese |
177 | Windows-1255 | Hebrew |
178 | Windows-1256 | Arabic |
186 | Windows-1257 | Baltic |
204 | Windows-1251 | Cyrillic |
238 | Windows-1250 | Latin alphabet, Eastern Europe |
255 | 1 | Default OEM code page for system locale |
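As an illustration of code page escapes, the sketch below decodes runs of \'xx sequences using a few rows of the table above. The mapping to Python codec names and the helper name `decode_hex_escapes` are assumptions of this example, not part of the RTF specification. It only handles characters written entirely as hexadecimal escapes.

```python
import re

# A few rows of the character set table, mapped to Python codec names.
# The codec names are Python's and are an assumption of this sketch.
CHARSET_TO_CODEC = {
    0: "cp1252",    # Western Europe / Americas
    128: "cp932",   # Japanese, Shift JIS (Windows version)
    129: "cp949",   # Korean, Unified Hangul Code
    134: "gbk",     # Chinese, GBK (Windows-936)
    136: "cp950",   # Chinese, Big5
    161: "cp1253",  # Greek
    178: "cp1256",  # Arabic
    204: "cp1251",  # Cyrillic
    238: "cp1250",  # Eastern Europe
}

# A run of one or more \'xx escapes; runs are decoded together so that
# double-byte characters written as two escapes stay intact.
HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")

def decode_hex_escapes(text, charset):
    """Replace runs of \\'xx escapes with the characters they encode."""
    codec = CHARSET_TO_CODEC[charset]

    def replace_run(match):
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(codec)

    return HEX_RUN.sub(replace_run, text)

arabic = decode_hex_escapes(r"\'c8", 178)       # Windows-1256 example from the text
french = decode_hex_escapes(r"caf\'e9", 0)      # Windows-1252
```

A full reader would first parse the font table to learn which character set applies to each font, then choose the codec per text run.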
For a Unicode escape, the control word \u is used, followed by a 16-bit signed integer which corresponds to the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, but indicates that older programs which do not support Unicode should render it as a question mark instead.
The control word \uc0 can be used to indicate that subsequent Unicode escape sequences within the current group do not specify the substitution character.
Until the release of RTF specification version 1.5 in 1997, RTF only handled 7-bit characters directly and 8-bit characters encoded as hexadecimal (using \'xx). Since RTF 1.5, however, RTF control words generally accept signed 16-bit numbers as arguments. Unicode values greater than 32767 must be expressed as negative numbers. [25] If a Unicode character is outside the Basic Multilingual Plane (BMP), it is encoded with a surrogate pair. Unicode support was added because of text handling changes in Microsoft Word: Microsoft Word 97 is a partially Unicode-enabled application and it handles text using the 16-bit Unicode character encoding scheme. [25] Microsoft Word 2000 and later versions are Unicode-enabled applications that handle text using the 16-bit Unicode character encoding scheme. [3]
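The Unicode escape rules above (signed 16-bit values, negative numbers above 32767, surrogate pairs outside the BMP, and a fallback character after each escape) can be sketched as a small encoder. The function name `rtf_escape` and the choice of "?" as the default fallback are assumptions of this example. An RTF writer would normally pick the nearest code page character as the fallback.

```python
import struct

def rtf_escape(text, fallback="?"):
    """Escape non-ASCII characters as \\uN escapes, each followed by a fallback."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)       # escape RTF's own special characters
        elif ord(ch) < 128:
            out.append(ch)              # 7-bit ASCII passes through unchanged
        else:
            # Split into UTF-16 code units; characters outside the BMP
            # yield two units (a surrogate pair).
            data = ch.encode("utf-16-le")
            for (unit,) in struct.iter_unpack("<h", data):  # signed 16-bit
                out.append("\\u%d%s" % (unit, fallback))
    return "".join(out)

ba = rtf_escape("\u0628")           # Arabic letter ba
replacement = rtf_escape("\ufffd")  # value above 32767 becomes negative
emoji = rtf_escape("\U0001F600")    # outside the BMP: two escapes
```

Reading each UTF-16 code unit as a signed 16-bit integer yields the negative numbers directly, so no separate subtraction of 65536 is needed.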
Because RTF files are usually 7-bit ASCII plain text, they can be easily transmitted between PC-based operating systems. Converters that communicate with Microsoft Word for MS Windows or Macintosh generally expect data transfer as 8-bit characters and binary data which can contain any 8-bit values. [29]
RTF is a data format for saving and sharing documents, not a markup language; it is not intended for intuitive and easy typing. [33] [34] Nonetheless, unlike many word processing formats, RTF code can be human-readable. When an RTF file containing mostly Latin characters without diacritics is viewed as a plain text file, the underlying ASCII text is readable, provided that the author has kept formatting concise.
When RTF was released, most word processors used binary file formats; Microsoft Word, for example, used the .DOC file format. RTF was unique in its simple formatting control which allowed non-RTF aware programs like Microsoft Notepad to open and provide readable files. Today, most word processors have moved to XML-based file formats (Word has switched to the .docx file format). Regardless, these files contain large amounts of formatting code, so are often ten or more times larger than the corresponding plain text. [35] [33]
To be standard-compliant RTF, non-ASCII characters must be escaped. Thus, even with concise formatting, text that uses certain dashes and quotation marks is less legible. Latin languages with many diacritics are particularly difficult to read in RTF, as they result in substitutions like \'f1 for ñ and \'e9 for é. Non-Latin scripts are illegible in RTF — \u21563, for example, is used for 吻. From the beginning, RTF has also supported Microsoft OLE embedded objects and Macintosh Edition Manager subscriber objects, which are not human-readable.
Most word processing software supports either importing and exporting RTF for some version of the RTF specification, or editing RTF directly, which makes it a "common" format between otherwise incompatible word processing software and operating systems. Most applications that read RTF files silently ignore unknown RTF control words. [36] These factors contribute to its interoperability, though it is still dependent on the specific RTF version in use. [7] There are several consciously designed or accidentally born RTF dialects. [36]
RTF is the internal markup language used by Microsoft Word. [33] Since 1987, RTF files have been able to be transferred back and forth between many old and new computer systems (and now over the Internet), despite differences between operating systems and their versions. This makes it a useful format for basic formatted text documents such as instruction manuals, résumés, letters, and modest information documents. These documents, at minimum, support bold, italic and underline text formatting. Also typically supported are left-, center- and right-aligned text, font specification and document margins.
Font and margin defaults, style presets and other functions vary according to program defaults. There may also be incompatibilities between different RTF versions, e.g. between RTF 1.0 1987 and later specifications, or between RTF 1.0–1.4 and RTF 1.5+ in use of Unicode characters. [37] [38] [39] And though RTF supports metadata like title and author, not all implementations support this. Nevertheless, the RTF format is consistent enough to be considered highly portable and acceptable for cross-platform use.
Microsoft Object Linking and Embedding (OLE) objects and Macintosh Edition Manager subscriber objects allow embedding of other files inside the RTF, such as tables or charts from a spreadsheet application. However, since these objects are not widely supported in programs for viewing or editing RTF files, they also limit RTF's interoperability. [40] [41] [42] [43] [44] If software that understands a particular OLE object is not available, the object is displayed using a picture of the object which is embedded along with it. [45] [46]
RTF supports inclusion of JPEG, PNG, Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows device-dependent bitmap, Windows device-independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in an RTF file. Not all of these picture types are supported in all RTF readers, however. When an RTF document is opened in software that does not support the picture type of an inserted picture, the picture is not displayed. RTF writers usually either convert an inserted picture in an unsupported picture type to one in a supported picture type, or do not include the picture at all.
For better compatibility with Microsoft products, some RTF writers include the same picture in two different picture types in one RTF file: one supported picture type to display, and one uncompressed WMF copy of the original picture to improve compatibility with some Microsoft applications like Wordpad. [47]
This method increases the RTF file size dramatically. The RTF specification does not require this method, and several implementations do not include the WMF copy (e.g. Abiword or Ted).
For Microsoft Word, it is also possible to set a specific registry value ("ExportPictureWithMetafile=0") to prevent Word from saving the WMF copy. [47]
RTF supports embedding of fonts used in the document, but this feature is not widely supported in software implementations. [48] [49] [50]
RTF also supports generic font family names used for font substitution: roman (serif), Swiss (sans-serif), modern (monospace), script, decorative and technical. [19] This feature is not widely supported either.
Since RTF 1.0, the RTF specification has supported document annotations/comments. [19] The RTF 1.7 specification defined some new features for annotations, including the date stamp (there was previously only "time stamp") and parents of annotations. [27] When an RTF document with annotations is opened in an application that does not support RTF annotations, the annotations are not shown. Similarly, when a document with annotations is saved as RTF in an application that does not support RTF annotations, the annotations are not preserved in the RTF file. Some implementations, like Abiword (since version 2.8) and IBM Lotus Symphony (up to version 1.3), may hide annotations by default or require some user action to display them.
The RTF specification also supports footnotes, which are widely supported in RTF implementations (e.g. in OpenOffice.org, Abiword, KWord, Ted, but not in Wordpad). Endnotes are implemented as a variation on footnotes, so applications that support footnotes but not endnotes will render an endnote as a footnote.
Microsoft products do not support comments within footers, footnotes or headers. Similarly, Microsoft products do not support footnotes in headers, footers, or comments. Inserting a comment or a footnote in one of these disallowed contexts may result in a corrupted document. [29]
The RTF 1.2 specification defined use of drawing objects, known as shapes, such as rectangles, ellipses, lines, arrows and polygons. The RTF 1.5 specification introduced many new control words for drawing objects. [25]
However, many RTF implementations, such as Apache OpenOffice [51] and Abiword, [53] do not support drawing objects (though LibreOffice has supported them from version 4.0 on [52] ). Applications which do not support RTF drawing objects do not display or save the shapes. Some implementations will also not display any text inside drawing objects. [54] [55]
Unlike Microsoft Word's DOC format, as well as the newer Office Open XML and OpenDocument formats, RTF does not support macros. For this reason, RTF was often recommended over those formats when the spread of computer viruses through macros was a concern. However, having the .RTF extension does not guarantee a safe file, since Microsoft Word will open standard DOC files renamed with an RTF extension and run any contained macros as usual. Manual examination of a file in a plain text editor such as Notepad, or use of the file command on a UNIX-like system, is required to determine whether or not a suspect file is really RTF. [9] [56] Enabling Word's "Confirm file format conversion on open" option can also assist by warning that a document being opened is in a format that does not match the format implied by the file's extension, and giving the option to abort opening that file. One exploit attacking a vulnerability was patched in Microsoft Word in April 2015. [57]
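The check for whether a file is really RTF can be approximated by comparing its first bytes with the magic number {\rtf. The sketch below is a rough illustration only. The helper names are hypothetical, and the eight-byte OLE compound-file signature used to imitate a renamed binary .doc file is an assumption of the example rather than something stated in this article.

```python
import os
import tempfile

RTF_SIGNATURE = b"{\\rtf"                          # magic number of RTF files
OLE_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")  # assumed start of a binary .doc

def looks_like_rtf(path):
    """Return True if the file begins with the RTF signature."""
    with open(path, "rb") as handle:
        return handle.read(len(RTF_SIGNATURE)) == RTF_SIGNATURE

def write_temp(data):
    """Demo helper: write bytes to a temporary file with an .rtf extension."""
    fd, path = tempfile.mkstemp(suffix=".rtf")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path

genuine = write_temp(b"{\\rtf1\\ansi Hello\\par}")
renamed_doc = write_temp(OLE_SIGNATURE + b"\x00" * 16)
results = (looks_like_rtf(genuine), looks_like_rtf(renamed_doc))
os.remove(genuine)
os.remove(renamed_doc)
```

Passing this check only shows that the file is not a renamed binary document. It says nothing about whether the RTF content itself is safe, since RTF files can carry embedded objects.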
Since 2014 there have been malware RTF files embedding OpenXML exploits. [58]
Each RTF implementation usually implements only some versions or subsets of the RTF specification. [7] Many of the available RTF converters cannot understand all new features in the latest RTF specifications. [37] [59]
The WordPad editor in Microsoft Windows creates RTF files by default. It once defaulted to the Microsoft Word 6.0 file format, but write support for Word documents (.doc) was dropped in a security update. Read support was also dropped in Windows 7. WordPad does not support some RTF features, such as headers and footers. [60] However, WordPad can read and save many RTF features that it cannot create, including tables, strikeout, superscript, subscript, "extra" colors, text background colors, numbered lists, right or left indent, quasi-hypertext and URL linking, and various line spacings. RTF is also the data format for "rich text controls" in MS Windows APIs. [33]
The default text editor for macOS, TextEdit, can also view, edit and save RTF files as well as RTFD files, and uses the format as its default. As of July 2009, TextEdit has limited ability to edit RTF document margins. Much older Mac word processing application programs such as MacWrite and WriteNow had the same RTF abilities as TextEdit has.
The following free and open-source word processors attempt to work with Microsoft's RTF file format (see the criticism paragraph below): AbiWord, Apache OpenOffice, Bean, Calligra, Collabora Online and LibreOffice.
Scrivener uses individual RTF files for all the text files that make up a given "project".
Toolbox, SIL International's freeware application for developing and publishing dictionaries, uses RTF as its most common form of document output. RTF files produced by Toolbox are designed to be used in Microsoft Word, but can also be used by other RTF-aware word processors.
RTF can be used on some ebook readers because of its interoperability, [61] simplicity and low CPU processing requirements.
The open-source script rtf2xml can partially convert RTF to XML. [62] [63]
GNU UnRTF is an open-source program to convert RTF into HTML, LaTeX, troff macros and other formats. pyth is a Python library to create and convert documents in RTF, XHTML and PDF format. Ruby RTF is a project to create Rich Text content via Ruby. RaTFink is a library of Tcl routines, free software, to generate RTF output, and a Cost script to convert SGML to RTF. RTF::Writer is a Perl module for generating RTF documents. PHPRtfLite is an API enabling developers to create RTF documents with PHP. Pandoc is an open source document converter with multiple output formats, including RTF. RTFGen is a project to create RTF documents via pure PHP. rtf.js is a JavaScript based library to render RTF documents in HTML.
The macOS command line tool textutil can convert files between rtf, rtfd, text, doc, docx, wordml, odt and webarchive formats. [64] The editor Ted can also convert RTF files to HTML and PS format.
The Rich Text Format was the standard file format for text-based documents in applications developed for Microsoft Windows. Microsoft did not initially make the RTF specification publicly available, making it difficult for competitors to develop document conversion features in their applications. Because Microsoft's developers had access to the specification, Microsoft's applications had better compatibility with the format. Also, each time Microsoft changed the RTF specification, Microsoft's own applications had a lead in time-to-market, because competitors had to redevelop their applications after studying the newer version of the format.
Novell alleged that Microsoft's practices were anticompetitive in its 2004 antitrust complaint against Microsoft. [65] [66]