Document comparison

Document comparison, also known as redlining or blacklining, is a computer process by which changes are identified between two versions of the same document for the purposes of document editing and review. Document comparison is a common task in the legal and financial industries.

The software-based document comparison process compares a reference document to a target document, and produces a third document which indicates (by colored highlighting or by differing font characteristics) information (text, graphics, formulas, etc.) that has either been added to or removed from the reference document to produce the target document.
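As a rough illustration of the process described above, the following minimal sketch compares a reference text to a target text word by word using Python's standard `difflib` module. The function name `redline` and the `[-...-]` / `{+...+}` markers are assumptions chosen for this example; real comparison products work on rich document formats and render changes with colors or font styles rather than plain-text markers.

```python
import difflib


def redline(reference: str, target: str) -> str:
    """Return a word-level comparison of target against reference.

    Words removed from the reference are wrapped as [-...-] and words
    added in the target as {+...+}. These markers are an illustrative
    convention for this sketch, not a standard.
    """
    ref_words = reference.split()
    tgt_words = target.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=tgt_words, autojunk=False)

    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged text is copied through as-is.
            out.extend(ref_words[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(ref_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(tgt_words[j1:j2]) + "+}")
    return " ".join(out)


print(redline("The buyer shall pay within 30 days.",
              "The buyer shall pay the seller within 45 days."))
```

The returned string plays the role of the "third document": it contains the unchanged text plus explicit markers for every deletion and addition.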

Common document formats for comparison include word processing documents (e.g. Microsoft Word), spreadsheets, presentations (e.g. PowerPoint), and Portable Document Format (PDF) documents.

Overview

In the broadest definition, document comparison can refer to any act of marking changes made between two versions of the same document and presenting those changes in a third document via a graphical user interface (GUI). Programs differ in the types of changes they register. Some limit comparison to text and table content in word processing documents, while others also register changes made in spreadsheets, presentations, and PDF documents. Certain programs also compare objects embedded in documents, such as JPEG, TIFF, BMP, and PNG images, as well as plain text files.


Document comparison solutions mark changes made to the following types of documents:

Word processing documents: Text in paragraphs and in text boxes; bullets and numbering; tables of contents; applied styles; design and layout elements; tables, including additions and deletions of rows; embedded objects; inserted images.
Spreadsheets: Values; formulas; additions and deletions of rows and columns; applied styles; design and layout elements.
Presentations: Text, table, image and other changes.
PDF documents: Text, table, image and other changes.

It is common for document comparison software vendors to present forms of the compared document in separate windows in a GUI. Each window contains the following items and the various windows are displayed on one or more computer display monitors:

The presentation of changes between document versions is also traditionally customizable. While one standard display, showing deletions with red underlines and additions with blue underlines, is still used by many document comparison products, some programs allow users to customize how changes appear in the redline/comparison document. U.S. contract lawyers typically show deletions as red strikethrough text (red text with a line crossing off the words being deleted) and additions with red underlines.
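A customizable presentation of this kind can be sketched as a mapping from change type to style. The example below is only an illustration: the `RedlineStyle` class, the preset names `STANDARD` and `US_CONTRACT`, and the choice of HTML with inline CSS as the output format are all assumptions made for this sketch, not features of any particular product.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class RedlineStyle:
    """Hypothetical style settings for one kind of change."""
    color: str
    decoration: str  # a CSS text-decoration value, e.g. "underline"


# Two presets mirroring the conventions described in the text.
STANDARD = {
    "delete": RedlineStyle("red", "underline"),
    "insert": RedlineStyle("blue", "underline"),
}
US_CONTRACT = {
    "delete": RedlineStyle("red", "line-through"),
    "insert": RedlineStyle("red", "underline"),
}


def render_html(changes, styles):
    """Render (kind, text) pairs as HTML.

    kind is "equal", "delete" or "insert"; unchanged text is emitted
    without any styling.
    """
    parts = []
    for kind, text in changes:
        if kind == "equal":
            parts.append(escape(text))
        else:
            s = styles[kind]
            parts.append(
                f'<span style="color:{s.color};'
                f'text-decoration:{s.decoration}">{escape(text)}</span>'
            )
    return " ".join(parts)


changes = [("equal", "pay within"), ("delete", "30"), ("insert", "45"), ("equal", "days")]
print(render_html(changes, US_CONTRACT))
```

Keeping the styles in a separate table means the same list of detected changes can be displayed in either convention without rerunning the comparison.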

History

Prior to personal computers, document comparison entailed printing two versions of a single document and reviewing those hard copies in detail for changes and amendments. The process was prone to human error and consumed considerable administrative time. A ruler was used with a red pen to draw strike-through lines through deleted text and to double-underline inserted text. The term "redline" came from using a red pen on the original/current version. When the document was placed in a copy machine, the copies came out black, thus the term "blackline." [1]

With the advent of personal computers and the ubiquity of word processing software, a need arose to manage changes made to document versions shared via disk, and later email. Mitigating the risks associated with document changes became essential as the amount of document and revision sharing increased. Early document comparison software checked all the text in two documents for changes and then presented those changes in a third redline/comparison version.

As documents changed and evolved, so did document comparison solutions. Word processing software began using tables to manage a multiplicity of document layouts, and many document comparison solutions had difficulty comparing tables in document versions. These solutions first converted tables to text arrays and then compared the created arrays. In many cases the software did not report its own failures: users were not informed of sections that had not been compared successfully. In the second generation, Microsoft’s Track Changes option was also introduced. With Track Changes, all changes made to documents were captured and stored inside the document. Flaws in the functionality of Track Changes could render documents unusable, and some comparison offerings again had difficulty managing the complex process of comparing in a Track Changes environment.
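The table-to-array approach mentioned above can be sketched roughly as follows. This is an assumed reconstruction for illustration only, not the method of any specific product: each table is flattened to a list of row tuples of text, and the rows are then compared with Python's `difflib`. The function names `table_to_array` and `compare_tables` are hypothetical.

```python
import difflib


def table_to_array(table):
    """Flatten a table (a list of rows of cells) to a list of text tuples."""
    return [tuple(str(cell).strip() for cell in row) for row in table]


def compare_tables(old, new):
    """Return (change, row) entries: "unchanged", "deleted" or "added".

    A modified row shows up as a deleted row followed by an added row,
    which is a limitation of comparing whole rows as opaque text.
    """
    a, b = table_to_array(old), table_to_array(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)

    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result += [("unchanged", row) for row in a[i1:i2]]
        else:
            result += [("deleted", row) for row in a[i1:i2]]
            result += [("added", row) for row in b[j1:j2]]
    return result


old = [["Item", "Qty"], ["Bolt", 10], ["Nut", 5]]
new = [["Item", "Qty"], ["Bolt", 12], ["Nut", 5], ["Washer", 7]]
for change, row in compare_tables(old, new):
    print(change, row)
```

Because every row is reduced to plain text, formatting and merged-cell structure are lost in this approach, which hints at why early tools struggled with complex tables.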

Before third-generation technology, it was common for organizations to be required to use multiple documents for one product. A main document with various supporting documents would be used to present and share necessary information. However, later software (especially Microsoft Word) enabled multiple types of information to be presented in a single document. Compound documents could include text, tables, and various styles, and could also include a range of embedded objects, such as Excel, Visio, ChemDraw, and SmartDraw objects, and inserted images in a range of types (including JPEG, TIFF, BMP, and GIF). While this enhancement greatly increased the usefulness of documents, it added an entirely new layer of risk for organizations that needed to fully understand changes made to document versions. The majority of document comparison software programs have not yet included mechanisms to mitigate the risk related to changes inside embedded objects. Software that can compare changes made in embedded objects provides pixel-to-pixel comparison of images, cell-level comparison of embedded Excel spreadsheets, and detection of other changes made to these complex, compound documents. In the 2020s, virtual assistants like ChatGPT have also been applied to document comparison tasks [2].
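Pixel-to-pixel image comparison can be illustrated with a minimal sketch. It assumes that both images have already been decoded into equally sized grids of pixel values (for example `(R, G, B)` tuples); decoding image files is out of scope here, and the function name `diff_pixels` is hypothetical.

```python
def diff_pixels(img_a, img_b):
    """Return the (x, y) coordinates at which two images differ.

    Each image is a list of rows, and each pixel is any comparable
    value such as an (R, G, B) tuple. Both images must have identical
    dimensions.
    """
    same_height = len(img_a) == len(img_b)
    same_widths = all(len(ra) == len(rb) for ra, rb in zip(img_a, img_b))
    if not (same_height and same_widths):
        raise ValueError("images must have identical dimensions")

    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(img_a, img_b))
        for x, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
before = [[black, white], [black, black]]
after = [[black, white], [black, red]]
print(diff_pixels(before, after))
```

An exact comparison like this flags every differing pixel, so in practice a tool would likely need some tolerance for compression artifacts; that refinement is omitted from the sketch.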

Business relevance

Document comparison provides a method of quality assurance. Individuals and organizations can verify that requested changes have been integrated properly into documents. Additionally, document comparison provides assurance that no unwarranted changes were made.

Lawyers and legal professionals regularly share documents with opposing counsel. As the documents constructed in this business vertical may be binding on either side's clients, it is essential that the risks associated with changes are completely mitigated. If opposing counsel makes a change that is not detected by the lawyer, such a change could negatively affect the lawyer's client and the lawyer could be liable for the damages.

Document comparison in banking, finance and accounting

Professionals in the banking, finance and accounting industries manage large amounts of data in spreadsheets. As one change to a value or formula could affect a substantial amount of data, these professionals find document comparison (such as comparison of two versions of a MS Excel spreadsheet) to be extremely useful in assuring accuracy in document change management.
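A cell-level comparison of two spreadsheet versions might look like the sketch below. It assumes each sheet has already been loaded into a dictionary mapping cell addresses to their contents, with formulas stored as strings beginning with `=`; the loading step and the names `is_formula` and `compare_sheets` are assumptions made for illustration.

```python
def is_formula(content):
    """Treat any string starting with "=" as a formula (an assumption)."""
    return isinstance(content, str) and content.startswith("=")


def compare_sheets(old, new):
    """Compare two sheets given as {cell_address: content} dictionaries.

    Returns {address: (kind, old_content, new_content)} for every cell
    that differs, where kind is "added", "deleted", "formula changed"
    or "value changed".
    """
    changes = {}
    for addr in sorted(old.keys() | new.keys()):
        before, after = old.get(addr), new.get(addr)
        if before == after:
            continue
        if before is None:
            kind = "added"
        elif after is None:
            kind = "deleted"
        elif is_formula(before) or is_formula(after):
            kind = "formula changed"
        else:
            kind = "value changed"
        changes[addr] = (kind, before, after)
    return changes


v1 = {"A1": 100, "A2": 200, "A3": "=A1+A2"}
v2 = {"A1": 100, "A2": 250, "A3": "=A1*A2", "A4": "note"}
for addr, change in compare_sheets(v1, v2).items():
    print(addr, change)
```

Distinguishing formula changes from value changes matters because a single edited formula can alter many dependent cells even when no entered value has changed.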

Creative media management and publishing

Professionals in these industries regularly work with multiple versions of a single document. Document comparison software helps them ensure that all changes have been correctly integrated into the latest version and gives them a quick understanding of the changes made during editing and versioning.

References

  1. Diane McCullough - Former Legal Secretary, Personal Experience
  2. https://www.evolution.ai/post/use-chatgpt-to-compare-documents
