Data Interchange Format


Data Interchange Format (.dif) is a text file format used to import/export single spreadsheets between spreadsheet programs.


Applications that still support the DIF format are Collabora Online, Excel, [note 1] Gnumeric, and LibreOffice Calc. Historical applications that supported it until they reached end of life, or that no longer acknowledge support for the format, are dBase, FileMaker, Framework, Lotus 1-2-3, Multiplan, OpenOffice.org Calc, and StarCalc. [1] [2]

A limitation of the DIF format is that it cannot hold more than one spreadsheet of a workbook in a single file. Because of the similar abbreviation and age (both date to the early 1980s), the DIF spreadsheet format is often confused with Navy DIF, which is an unrelated "document interchange format" for word processors. [3]

History

DIF was developed by Software Arts, Inc. (the developers of the VisiCalc program) in the early 1980s. The specification was included in many copies of VisiCalc and published in Byte magazine. Bob Frankston developed the format with input from others, including Mitch Kapor, who helped ensure that it would work with his VisiPlot program. (Kapor later went on to found Lotus, which produced Lotus 1-2-3.) The specification was copyrighted in 1981.

DIF was a registered trademark of Software Arts Products Corp. (a legal name for Software Arts at the time).

Syntax

DIF stores everything in an ASCII text file, which avoided many of the cross-platform issues of the time when it was created. However, modern spreadsheet software, e.g. OpenOffice.org Calc and Gnumeric, offers more character encodings for import and export. The file is divided into two sections: header and data. Everything in DIF is represented by a two- or three-line chunk: header chunks take three lines, data chunks two. A header chunk starts with a text identifier that is all caps, contains only alphabetic characters, and is fewer than 32 letters long. The following line must be a pair of numbers, and the third line must be a quoted string. A data chunk, on the other hand, starts with a number pair, and the next line is a quoted string or a keyword.
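The chunk layout can be illustrated with a minimal Python sketch that splits a DIF file into its header and data chunks. It assumes a well-formed file and treats every three-line group up to and including the DATA chunk as a header chunk; `split_chunks` is a hypothetical name, not part of any library, and the code neither validates identifiers nor decodes values.

```python
def split_chunks(text: str):
    """Return (header_chunks, data_chunks) for a well-formed DIF file (sketch)."""
    lines = [ln.strip() for ln in text.splitlines()]
    headers = []
    i = 0
    # Header section: identifier, number pair, quoted string; the DATA chunk ends it.
    while i + 2 < len(lines):
        identifier, pair, string = lines[i:i + 3]
        headers.append((identifier, pair, string))
        i += 3
        if identifier == "DATA":
            break
    # Data section: a "type,number" pair followed by a quoted string or a keyword.
    data = []
    while i + 1 < len(lines):
        data.append((lines[i], lines[i + 1]))
        i += 2
    return headers, data
```

Using `splitlines()` means the sketch accepts both `\n` and `\r\n` line endings without further handling.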

Values

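A value occupies two lines: the first is a pair of numbers, and the second is either a string or a keyword. The first number of the pair indicates the type:

  -1  Directive (special data value). The second number is not used (0 in the example below), and the second line is a keyword: BOT (beginning of tuple, which starts a row in the example below) or EOD (end of data).
  0   Numeric value. The second number is the value itself, and the second line is a validity keyword such as V (valid).
  1   String value. The second number is ignored, and the second line is the string in double quotes, with any embedded double quote written twice ("").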
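The type rules can be sketched as a small Python decoder for a single two-line value. This is an illustration under assumptions: `decode_value` is a hypothetical helper, only the V keyword is treated as marking a valid number (any other validity keyword is mapped to `None` here), and strings are unquoted by removing the outer quotes and collapsing doubled quotes, as in the example file later in this article.

```python
def decode_value(pair_line: str, second_line: str):
    """Decode one two-line DIF value into a (kind, value) tuple (sketch)."""
    type_part, number_part = pair_line.strip().split(",", 1)
    indicator = int(type_part)
    second = second_line.strip()
    if indicator == -1:
        # Directive: the keyword itself (e.g. BOT or EOD) is the payload.
        return ("directive", second)
    if indicator == 0:
        # Numeric: the value sits in the number pair; V marks it as valid.
        # Assumption: any other validity keyword is returned as None.
        return ("numeric", float(number_part) if second == "V" else None)
    if indicator == 1:
        # String: drop the surrounding quotes, then collapse doubled quotes.
        if len(second) >= 2 and second.startswith('"') and second.endswith('"'):
            second = second[1:-1]
        return ("string", second.replace('""', '"'))
    raise ValueError(f"unknown DIF type indicator: {indicator}")
```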

Header chunk

A header chunk is composed of an identifier line followed by the two lines of a value.

The numeric values in header chunks use just an empty string instead of the validity keywords.
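A header chunk can be produced by a helper like the following Python sketch, which enforces the identifier rules stated under Syntax (all caps, alphabetic only, fewer than 32 letters) and writes a numeric pair with type 0, as every header chunk in the example does. The function name `header_chunk` is hypothetical, and doubling quotes inside header strings the same way as in data strings is an assumption.

```python
import re

# Identifier rules from the text: upper-case letters only, fewer than 32 of them.
IDENTIFIER = re.compile(r"[A-Z]{1,31}")


def header_chunk(identifier: str, number: int, text: str = "") -> list[str]:
    """Return the three lines of a DIF header chunk (sketch)."""
    if not IDENTIFIER.fullmatch(identifier):
        raise ValueError(f"invalid DIF header identifier: {identifier!r}")
    # Assumption: quotes inside header strings are doubled, as in data strings.
    quoted = '"' + text.replace('"', '""') + '"'
    # Type 0 (numeric) pair, then the string line; with the default text this
    # is the empty string "" used in place of a validity keyword.
    return [identifier, f"0,{number}", quoted]
```

For instance, `header_chunk("VECTORS", 2)` yields the three lines `VECTORS`, `0,2`, and `""`.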

Discrepancies in implementations

Some implementations (notably those of older Microsoft products) swapped the meaning of VECTORS and TUPLES. Some implementations are insensitive to errors in the dimensions of the table as written in the header and simply use the layout in the DATA section.
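A reader that tolerates wrong header dimensions can take the second approach and rebuild the table purely from the BOT and EOD directives in the DATA section. The Python sketch below is a hypothetical example of that strategy, not any particular product's behavior; because it never reads VECTORS or TUPLES, it is also unaffected by implementations that swap their meaning. It assumes that strings are quoted and treats numeric values whose keyword is not V as missing.

```python
def read_rows(text: str) -> list[list]:
    """Rebuild rows from BOT/EOD directives, ignoring VECTORS/TUPLES (sketch)."""
    lines = [ln.strip() for ln in text.splitlines()]
    # DATA is followed by its dummy value ("0,0" and ""); skip all three lines.
    i = lines.index("DATA") + 3
    rows: list[list] = []
    row = None
    while i + 1 < len(lines):
        kind, number = lines[i].split(",", 1)
        second = lines[i + 1]
        i += 2
        if kind == "-1":                   # directive
            if second == "BOT":            # start a new row
                row = []
                rows.append(row)
            elif second == "EOD":          # end of data
                break
        elif row is None:
            raise ValueError("cell value before the first BOT")
        elif kind == "0":                  # numeric: keep only values marked V
            row.append(float(number) if second == "V" else None)
        elif kind == "1":                  # string: strip quotes, collapse ""
            row.append(second[1:-1].replace('""', '"'))
    return rows
```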

Example

For example, assume we have two columns with one column header row and two data rows:

  Text                          | Number
  hello                         | 1
  has a double quote " in text  | -3

In a .dif file, this would be (→ indicates comments):

    TABLE
    0,1
    "EXCEL"
    VECTORS     → the number of columns follows as a numeric value
    0,2         → '0' indicates that it's a numeric type, '2' since we have 2 columns
    ""
    TUPLES      → the number of rows follows as a numeric value
    0,3         → '0' indicates that it's a numeric type, '3' since we have 3 rows
    ""
    DATA        → after a dummy 0 numeric value, the data for the table follow
    0,0         → this is the dummy 0 numeric value
    ""
    -1,0        → '-1' for the directive type. This is followed by either a 'BOT' or an 'EOD'
    BOT         → signifies the start of a row
    1,0         → '1' since the cell contains a string. (The second number is ignored)
    "Text"      → this is the string that's in the cell
    1,0         → '1' since the cell contains a string
    "Number"
    -1,0
    BOT         → another row
    1,0         → a string follows
    "hello"
    0,1         → numeric value ('0') of value '1'
    V           → 'V' is for 'Valid'
    -1,0
    BOT         → another row
    1,0
    "has a double quote "" in text"
    0,-3
    V
    -1,0
    EOD         → End of Data
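The example file can be reproduced by a short Python writer. This is a sketch rather than a reference implementation: `to_dif` is a hypothetical name; it follows the example's convention that VECTORS counts columns and TUPLES counts rows (which, as noted under Discrepancies in implementations, not every implementation shares), joins lines with `\n`, and takes the "EXCEL" title string from the example as its default.

```python
def to_dif(rows, title="EXCEL"):
    """Serialize rows of str/int/float cells as DIF text (sketch)."""
    def quote(s: str) -> str:
        # Strings are wrapped in double quotes; embedded quotes are doubled.
        return '"' + s.replace('"', '""') + '"'

    n_cols = max((len(r) for r in rows), default=0)
    out = [
        "TABLE", "0,1", quote(title),
        "VECTORS", f"0,{n_cols}", '""',    # number of columns (example's convention)
        "TUPLES", f"0,{len(rows)}", '""',  # number of rows
        "DATA", "0,0", '""',               # DATA header with its dummy 0 value
    ]
    for row in rows:
        out += ["-1,0", "BOT"]             # directive: beginning of a row
        for cell in row:
            if isinstance(cell, str):
                out += ["1,0", quote(cell)]
            elif isinstance(cell, (int, float)) and not isinstance(cell, bool):
                out += [f"0,{cell}", "V"]  # numeric value marked valid
            else:
                raise TypeError(f"unsupported cell type: {type(cell).__name__}")
    out += ["-1,0", "EOD"]                 # directive: end of data
    return "\n".join(out) + "\n"
```

Calling `to_dif([["Text", "Number"], ["hello", 1], ['has a double quote " in text', -3]])` produces the same lines as the listing above, without the comments.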


Notes

  1. Microsoft Excel's implementation caused interoperability problems, see § Discrepancies in implementations.


References

  1. "LibreOffice Calc – Supported File Formats". LibreOfficeHelp.com. 2020-10-06. Archived from the original on 2016-12-13. Retrieved 2020-09-08.
  2. "File formats that are supported in Excel". support.microsoft.com. Archived from the original on 2020-11-11. Retrieved 2021-09-08.
  3. Petrosky, Mary (August 5, 1985). "File Conversion Market Grows". InfoWorld. Vol. 7, no. 31. pp. 36–37. "Among the file formats designed to facilitate the interchange of text files between microcomputers running different word processing software, IBM's Document Content Architecture (DCA) and the U.S. Navy's document interchange format (DIF) seem to have the greatest support."
