Segmentation Rules eXchange

Segmentation Rules eXchange (SRX) is an XML-based standard that was maintained by the Localization Industry Standards Association (LISA) [1] until LISA became insolvent in 2011, and subsequently by the Globalization and Localization Association (GALA). [2]

SRX provides a common way to describe how to segment text for translation and other language-related processes. It was created after it became clear that TMX was less useful than expected in certain cases because tools segment text differently. SRX is intended to enhance the TMX standard so that translation memory (TM) data exchanged between applications can be used more effectively: having the segmentation rules that were in use when a TM was created makes the TM data more useful.
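As a rough illustration of rule-based segmentation, the sketch below applies an ordered list of break and no-break rules to a string. It assumes the SRX 2.0 rule model, in which each rule has a `break` attribute ("yes" or "no"), a `<beforebreak>` pattern and an `<afterbreak>` pattern, and the first rule matching at a position decides the outcome. The `Rule` type, the `segment` function and the two sample rules are hypothetical and do not come from any SRX rule file. The patterns use Python's `re` module rather than ICU, so this approximates how a tool might behave and is not a conforming implementation.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One segmentation rule (hypothetical type, modelled on an SRX <rule>)."""
    is_break: bool  # mirrors break="yes" (True) or break="no" (False)
    before: str     # pattern for the text that ends at the candidate position
    after: str      # pattern for the text that starts at the candidate position


# Illustrative rules only. The no-break exception is listed first so that it
# wins over the generic sentence-end rule at the same position.
RULES = [
    Rule(False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),
    Rule(True, r"[.?!]+", r"\s+"),
]


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text at every position where the first matching rule is a break rule."""
    compiled = [
        (r.is_break, re.compile(r"(?:%s)\Z" % r.before), re.compile(r.after))
        for r in rules
    ]
    breaks = []
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # endpos=pos makes \Z match at the candidate position.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    breaks.append(pos)
                break  # first matching rule decides; skip the remaining rules
    bounds = [0] + breaks + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


sample = "Dr. Smith arrived. He was late! Was he?"
print([s.strip() for s in segment(sample, RULES)])
# prints ['Dr. Smith arrived.', 'He was late!', 'Was he?']
```

Two tools that apply different rule lists to the same text produce different segments, which is why exchanging the rules alongside the TM data helps matching.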

Implementation difficulties

SRX makes use of the ICU regular expression syntax, [3] but not all programming languages support all ICU expressions, which makes implementing SRX in some languages difficult or impossible. Java is one example. [4]

Version history

SRX version 1.0 [5] was officially accepted as an OSCAR standard in April 2004.

SRX version 2.0 [6] was officially accepted as an OSCAR standard in April 2008.

SRX forms part of the Open Architecture for XML Authoring and Localization (OAXAL) reference architecture.

Related Research Articles

A translation memory (TM) is a database that stores "segments", which can be sentences, paragraphs or sentence-like units that have previously been translated, in order to aid human translators. The translation memory stores the source text and its corresponding translation in language pairs called “translation units”. Individual words are handled by terminology bases and are not within the domain of TM.

In computing, RELAX NG is a schema language for XML—a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document but RELAX NG also offers a popular compact, non-XML syntax. Compared to other XML schema languages RELAX NG is considered relatively simple.

James Clark is a software engineer and creator of various open-source software including groff, expat and several XML specifications.

OmegaT is a computer-assisted translation tool written in the Java programming language. It is free software originally developed by Keith Godfrey in 2000, and is currently developed by a team led by Aaron Madlon-Kay.

Translation Memory eXchange (TMX) is an XML specification for the exchange of translation memory (TM) data between computer-aided translation and localization tools with little or no loss of critical data.

The Okapi Framework is a cross-platform and open-source set of components and applications that offer extensive support for localizing and translating documentation and software.

Global information management Metrics eXchange or GMX is a collection of current and proposed standards, primarily targeted at the needs of the translation industry. They are concerned with quantitatively measuring aspects of a document, particularly those relevant to the translation process, and were being standardised by the Localization Industry Standards Association as part of the Open Architecture for XML Authoring and Localization until the demise of LISA. The primary use cases are in quoting, estimating and billing translation work.

OAXAL: Open Architecture for XML Authoring and Localization is an Organization for the Advancement of Structured Information Standards (OASIS) standards-based initiative to encourage the development of an open Standards approach to XML Authoring and Localization. OAXAL is an official OASIS Reference Architecture Technical Committee.

XLIFF is an XML-based bitext format created to standardize the way localizable data are passed between and among tools during a localization process and a common format for CAT tool exchange. The XLIFF Technical Committee (TC) first convened at OASIS in December 2001, but the first fully ratified version of XLIFF appeared as XLIFF Version 1.2 in February 2008. Its current specification is v2.1 released on 2018-02-13, which is backwards compatible with v2.0 released on 2014-08-05.

Mark Edward Davis is an American specialist in the internationalization and localization of software and the co-founder and president of the Unicode Consortium.

Virtaal is a computer-assisted translation tool written in the Python programming language. It is free software developed and maintained by Translate.org.za.

GlobalSight is a free and open source translation management system (TMS) released under the Apache License 2.0. As of version 7.1 it supports the TMX and SRX 2.0 Localization Industry Standards Association standards. It was developed in the Java programming language and uses a MySQL database. GlobalSight also supports computer-assisted translation and machine translation.

Open Language Tools is a Java project released by Sun Microsystems under the terms of Sun’s CDDL.

openTMS is an acronym for Open Source Translation Management System.

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

Localization Industry Standards Association or LISA was a Swiss-based trade body concerning the translation of computer software into multiple natural languages, which existed from 1990 to February 2011. It counted among its members most of the large information technology companies of the period, including Adobe, Cisco, Hewlett-Packard, IBM, McAfee, Nokia, Novell and Xerox.

Swordfish Translation Editor is a computer-assisted translation software application.

memoQ is a proprietary computer-assisted translation software suite which runs on Microsoft Windows operating systems. It is developed by the Hungarian software company memoQ Fordítástechnológiai Zrt., formerly Kilgray, a provider of translation management software established in 2004 and cited as one of the fastest-growing companies in the translation technology sector in 2012 and 2013. memoQ provides translation memory, terminology, machine translation integration and reference information management in desktop, client/server and web application environments.

TermBase eXchange (TBX) is an international standard for the representation of structured concept-oriented terminological data, copublished by ISO and the Localization Industry Standards Association (LISA). Originally released in 2002 by LISA's OSCAR special interest group, TBX was adopted by ISO TC 37 in 2008. In 2019 ISO 30042:2008 was withdrawn and revised by ISO 30042:2019. It is currently available as an ISO standard and as an open, industry standard, available at no charge.

References

  1. SRX home - Archived copy of main SRX page on the LISA OSCAR web site
  2. Globalization and Localization Association - Page on the OSCAR LISA Standards
  3. SRX regular Expressions - Archived copy
  4. SRX and Java - Comparison between SRX and Java regular expressions
  5. SRX 1.0 specification - Archived copy. Archived from the original on 2012-08-01. Retrieved 2011-07-20.
  6. SRX 2.0 specification