Computer-assisted reviewing

Computer-assisted reviewing (CAR) tools are software tools based on text-comparison and analysis algorithms.[1] They focus on the differences between two documents, taking each document's typeface into account through an intelligent analysis.

Detecting differences

The intelligent analysis used by CAR tools recognizes that differences do not all have the same value: their significance depends on their type and on the document's field or subject. For example, a difference in a number does not carry the same weight if the number is a date, a price, a page number, a figure number, part of an address, a footnote call, a list item number, a title number, etc.
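The idea that the weight of a difference depends on what kind of value changed can be sketched as follows. This is a minimal illustration, not how any particular CAR tool works: the categories, the regular expressions, the context hints, and the `typed_differences` helper are all assumptions made for the example, and only Python's standard `difflib` and `re` modules are used.

```python
import difflib
import re

# Hypothetical categories, for illustration only. A real tool would use
# much richer rules and knowledge of the document's field.
CONTEXT_HINTS = {"page": "page number", "figure": "figure number"}
VALUE_PATTERNS = [
    ("price", re.compile(r"[$€£]\d+(?:[.,]\d+)?")),
    ("date", re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")),
    ("number", re.compile(r"\d+(?:[.,]\d+)?")),
]


def classify(token, previous):
    """Guess what kind of value a token is, using the preceding word as context."""
    if re.fullmatch(r"\d+", token) and previous.lower() in CONTEXT_HINTS:
        return CONTEXT_HINTS[previous.lower()]
    for kind, pattern in VALUE_PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return "text"


def typed_differences(old, new):
    """Return (kind, old_text, new_text) for each differing run of words."""
    a, b = old.split(), new.split()  # naive whitespace tokenization
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Classify by the first changed token, preferring the new document.
        if j2 > j1:
            token, previous = b[j1], (b[j1 - 1] if j1 > 0 else "")
        else:
            token, previous = a[i1], (a[i1 - 1] if i1 > 0 else "")
        kind = classify(token, previous)
        result.append((kind, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return result


if __name__ == "__main__":
    old = "Invoice due 01/02/2024 total $100 see page 12"
    new = "Invoice due 01/03/2024 total $120 see page 12"
    for difference in typed_differences(old, new):
        print(difference)
```

With the sample strings, the changed date and the changed price are reported under different kinds, so a reviewer could filter or rank them separately. Tokens with attached punctuation (such as `12.`) are not handled by this naive tokenizer.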

These tools are useful in various kinds of applications.

For translation

Computer-assisted reviewing for translation (CART) tools are CAR tools that can manage multilingual comparisons. This requires matching each part of the text in one document to its counterpart in the other, taking into account the specifics of each language: date and number formats, punctuation (for example, French versus English quotation marks), etc. The best CART tools can find matches between noun or verbal groups, which requires identifying terminological and syntactic elements using linguistic analyzers.
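One way to picture the language-specific handling described above is to normalize quotation marks and number formats before comparing a source text with its translation. The sketch below is only an assumption-laden illustration: it covers English and French conventions only, the function names are hypothetical, and the regular expressions are deliberately simple (they do not handle dates, ordinals, or adjacent numbers separated by a space in French).

```python
import re
from collections import Counter

# Spaces that French typography may use as thousands separators or inside guillemets.
FR_SPACE = "[ \u00a0\u202f]"


def normalize_quotes(text):
    """Map French guillemets and curly quotes to plain double quotes."""
    text = re.sub("«" + FR_SPACE + "?", '"', text)
    text = re.sub(FR_SPACE + "?»", '"', text)
    return text.replace("“", '"').replace("”", '"')


def extract_numbers(text, lang):
    """Return the numbers found in text, in a language-neutral form like '1234.56'."""
    if lang == "fr":
        # French: space-separated thousands, decimal comma (e.g. "1 234,56").
        pattern = r"\d+(?:" + FR_SPACE + r"\d{3})*(?:,\d+)?"
        found = re.findall(pattern, text)
        return [re.sub(FR_SPACE, "", n).replace(",", ".") for n in found]
    # English: comma-separated thousands, decimal point (e.g. "1,234.56").
    found = re.findall(r"\d+(?:,\d{3})*(?:\.\d+)?", text)
    return [n.replace(",", "") for n in found]


def number_mismatches(source_en, target_fr):
    """Return (missing_in_target, unexpected_in_target) as sorted lists."""
    source = Counter(extract_numbers(source_en, "en"))
    target = Counter(extract_numbers(target_fr, "fr"))
    missing = sorted((source - target).elements())
    unexpected = sorted((target - source).elements())
    return missing, unexpected


if __name__ == "__main__":
    en = "The price is 1,234.56 euros for 3 items."
    fr = "Le prix est de 1 234,65 euros pour 3 articles."
    print(number_mismatches(en, fr))
```

Comparing normalized values rather than raw strings means that `1,234.56` and `1 234,56` are treated as the same number, while a genuine transposition in the translation is still flagged.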

Application examples

See also

References

  1. Dong, Jielin; Zhu, Aaron; Zong, Lin (2007). Network Dictionary. Javvin Technologies. p. 116. ISBN 9781602670006.