Pseudolocalization

Pseudolocalization (or pseudo-localization) is a software testing method used for testing internationalization aspects of software. Instead of translating the text of the software into a foreign language, as in the process of localization, the textual elements of an application are replaced with an altered version of the original language. For example, instead of "Account Settings", the text may be altered to display as "!!! Àççôûñţ Šéţţîñĝš !!!". [1]

These specific alterations make the original words appear readable, but include the most problematic characteristics of the world's languages: varying length of text or characters, language direction, fit into the interface and so on.
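As a concrete illustration of the transform, the short Python sketch below turns a source string into an expanded, accented pseudo-translation like the "Account Settings" example above. The character map, the padding rule, and the "!!!" markers are illustrative choices, not the behaviour of any particular tool.

    # Minimal pseudolocalization sketch: map ASCII letters to accented
    # look-alikes, pad the string to simulate text expansion, and add
    # markers so that truncation is easy to spot.  The mapping and the
    # 40% expansion figure are illustrative, not a standard.
    ACCENT_MAP = str.maketrans(
        "AaCcEeGgIiNnOoSsTtUuYy",
        "ÀàÇçÉéĜĝÎîÑñÔôŠšŢţÛûÝý",
    )

    def pseudolocalize(text: str, expansion: float = 0.4) -> str:
        accented = text.translate(ACCENT_MAP)
        padding = "!" * max(1, int(len(text) * expansion) // 2)
        return f"{padding} {accented} {padding}"

    print(pseudolocalize("Account Settings"))
    # -> !!! Àççôûñţ Šéţţîñĝš !!!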

Localization process

Traditionally, localization of software is independent of the software development process. In a typical scenario, software would be built and tested in one base language (such as English), with any localizable elements being extracted into external resources. Those resources are handed off to a localization team for translation into different target languages. [2] The problem with this approach is that many subtle software bugs may be found during the process of localization, when it is too late (or more likely, too expensive) to fix them. [2]

The types of problems that can arise during localization involve differences in how written text appears in different languages. These problems include:

- translated text that is significantly longer than the source and no longer fits the user interface, or that breaks at awkward positions;
- glyphs and diacritic marks that do not occur in the source language and may be clipped or rendered incorrectly;
- languages whose reading order is not left-to-right, which is especially problematic for user input;
- application code that assumes all characters fit into a limited character set such as ASCII or ANSI, which can produce outright logic bugs if left uncaught.

In addition, the localization process may uncover places where an element should be localizable but is hard-coded in the source language. Similarly, there may be elements that were designed to be localized but should not be (e.g., the element names in an XML or HTML document). [3]

Pseudolocalization is designed to catch these types of bugs during the development cycle, by mechanically replacing all localizable elements with a pseudo-language that is readable by speakers of the source language but contains most of the troublesome elements of other languages and scripts. For this reason, pseudolocalization is best regarded as an engineering or internationalization tool rather than a localization tool.
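A minimal sketch of that mechanical replacement is shown below, assuming the localizable resources are XML fragments. Only the human-readable text is altered; element and attribute names, which should not be localized, are left untouched. The character map is a small illustrative subset.

    # Pseudolocalize only the text content of an XML resource, leaving
    # element names, attribute names and attribute values intact, so that
    # non-localizable markup is not disturbed.
    import xml.etree.ElementTree as ET

    ACCENT_MAP = str.maketrans("AaEeIiOoUu", "ÀàÉéÎîÔôÛû")  # illustrative subset

    def pseudolocalize_text(text: str) -> str:
        return f"!!! {text.translate(ACCENT_MAP)} !!!"

    def pseudolocalize_xml(xml_source: str) -> str:
        root = ET.fromstring(xml_source)
        for element in root.iter():
            if element.text and element.text.strip():
                element.text = pseudolocalize_text(element.text.strip())
        return ET.tostring(root, encoding="unicode")

    print(pseudolocalize_xml('<menu><item id="save">Save file</item></menu>'))
    # -> <menu><item id="save">!!! Sàvé fîlé !!!</item></menu>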

Pseudolocalization in Microsoft Windows

Although the pseudolocalization technique had been used at Microsoft since the late 1990s, it was made available to developers as a feature during the Windows Vista development cycle. [4] The type of pseudo-language invented for this purpose is called a pseudo-locale in Windows parlance. These locales were designed to use character sets and script characteristics from one of the three broad classes of foreign languages used by Windows at the time—basic ("Western"), mirrored ("Near-Eastern"), and CJK ("Far-Eastern"). [2] Prior to Vista, each of these three language classes had its own separate build of Windows, with potentially different code bases (and thus different behaviors and bugs). The pseudo-locales created for each of these language families produce text that still "reads" as English but is made up of characters from other scripts. For example, the text string

Edit program settings

would be rendered in the "basic" pseudo-locale as

[!!! εÐiţ Þr0ģЯãm səTτıИğ§ !!!]

This process produces translated strings that are longer, include non-ASCII characters, and (in the case of the "mirrored" pseudo-locale) are written right-to-left. [4]
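The right-to-left aspect can be illustrated with a small sketch as well. One way to simulate a mirrored pseudo-locale (this is only an illustration, not how Windows implements its mirrored pseudo-locale) is to wrap each string in Unicode bidirectional control characters, which force a left-to-right interface to lay the text out right-to-left.

    # Wrap a string in bidirectional control characters so that a
    # bidi-aware UI renders it right-to-left.
    RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
    PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

    def mirror(text: str) -> str:
        return f"[!!! {RLO}{text}{PDF} !!!]"

    # The control characters themselves are invisible; a bidi-aware
    # control displays the wrapped text right-to-left.
    print(mirror("Edit program settings"))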

Note that the brackets on either side of the text in this example help to spot the following issues:

- text that has been cut off (truncation);
- strings that have been assembled from smaller fragments (concatenation);
- strings that were never made localizable (hard-coded text).
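A rough sketch of the kind of check the markers make possible follows; the marker characters and the classification labels are illustrative.

    # A rendered string that has lost its closing marker has probably been
    # truncated; a string with no markers at all probably never went
    # through the localizable resources, i.e. it is hard-coded.
    def classify(rendered: str) -> str:
        starts, ends = rendered.startswith("["), rendered.endswith("]")
        if starts and ends:
            return "ok"
        if starts and not ends:
            return "possibly truncated"
        return "possibly hard-coded (not localized)"

    print(classify("[!!! εÐiţ Þr0ģЯãm səTτıИğ§ !!!]"))  # ok
    print(classify("[!!! εÐiţ Þr0ģЯã"))                 # possibly truncated
    print(classify("Edit program settings"))            # possibly hard-coded (not localized)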

Pseudolocalization process at Microsoft

Michael Kaplan (a Microsoft program manager) describes the process of pseudo-localization as

an eager and hardworking yet naive intern localizer, who is eager to prove [himself] and who is going to translate every single string that you don't say shouldn't get translated. [3]

One of the key features of the pseudolocalization process is that it happens automatically, during the development cycle, as part of a routine build. The process is almost identical to the one used to produce true localized builds, but it is done before a build is tested, much earlier in the development cycle. This leaves time for any bugs that are found to be fixed in the base code, which is much easier than fixing bugs that are not found until a release date is near. [2]

The builds produced by the pseudolocalization process are tested using the same QA cycle as a non-localized build. Since the pseudo-locales mimic English text, they can be tested by an English speaker. Beta versions of Windows 7 and Windows 8 have been released with some pseudo-localized strings intact. [5] [6] For these recent versions of Windows, the pseudo-localized build is the primary staging build (the one created routinely for testing), and the final English-language build is a "localized" version of it. [3]

Pseudolocalization tools for other platforms

Besides the tools used internally by Microsoft, other internationalization tools now include pseudolocalization options, among them Alchemy Catalyst from Alchemy Software Development, SDL Passolo from SDL, and Globalyst from g11n. These tools provide pseudolocalization capability, including the ability to view rendered pseudo-localized dialogs and to automate the testing process itself. While a tool such as Globalyst covers the whole process of creating a pseudolocalized build and automating its testing, the same result can be achieved by running a custom-made pseudolocalization script over the extracted text resources and testing the build manually.
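As an example of the custom-script approach, the sketch below pseudolocalizes a simple key=value resource file; the .properties-style format and the placeholder syntax handled here are assumptions for illustration. Printf- and brace-style placeholders are left untouched so that the pseudolocalized build still formats its strings correctly.

    # Pseudolocalize the values of a key=value resource file while keeping
    # keys, comment lines and format placeholders (%s, %d, {0}, ...) intact.
    import re

    ACCENT_MAP = str.maketrans("AaEeIiOoUu", "ÀàÉéÎîÔôÛû")
    PLACEHOLDER = re.compile(r"%[sd]|\{\d+\}")

    def pseudolocalize_value(value: str) -> str:
        parts, last = [], 0
        for match in PLACEHOLDER.finditer(value):
            parts.append(value[last:match.start()].translate(ACCENT_MAP))
            parts.append(match.group())  # keep the placeholder as-is
            last = match.end()
        parts.append(value[last:].translate(ACCENT_MAP))
        return "[!!! " + "".join(parts) + " !!!]"

    def pseudolocalize_properties(lines):
        for line in lines:
            if "=" in line and not line.lstrip().startswith("#"):
                key, value = line.split("=", 1)
                yield f"{key}={pseudolocalize_value(value.rstrip())}\n"
            else:
                yield line

    resource = ["greeting=Hello, %s!\n", "quota=You have {0} items left\n"]
    print("".join(pseudolocalize_properties(resource)), end="")
    # greeting=[!!! Héllô, %s! !!!]
    # quota=[!!! Yôû hàvé {0} îtéms léft !!!]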

There are a variety of free pseudolocalization resources on the Internet that will create pseudolocalized versions of common localization formats such as iOS strings files, Android XML, gettext PO, and others. These sites allow developers to upload a strings file to a web site and download the resulting pseudolocalized file.

References

  1. Benjamin Zadik (12 April 2013). "Pseudolocalization: Prepare your app for localization". Retrieved 13 April 2013.
  2. Raymond Chen (26 July 2012). "A brief and also incomplete history of Windows localization". Retrieved 26 July 2012.
  3. Michael Kaplan (11 April 2011). "One of my colleagues is the "Pseudo Man"". Retrieved 26 July 2012.
  4. Shawn Steele (27 June 2006). "Pseudo Locales in Windows Vista Beta 2". Retrieved 26 July 2012.
  5. Steven Sinofsky (7 July 2009). "Engineering Windows 7 for a Global Market". Retrieved 26 July 2012.
  6. Kriti Jindal (16 March 2012). "Install PowerShell Web Access on non-English machines". Retrieved 26 July 2012.