Internationalization and localization

In computing, internationalization and localization (American) or internationalisation and localisation (British), often abbreviated i18n and l10n respectively, are means of adapting computer software to the different languages, regional peculiarities and technical requirements of a target locale.

Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by translating text and adding locale-specific components.

Localization (which is potentially performed multiple times, for different locales) uses the infrastructure or flexibility provided by internationalization (which is ideally performed only once before localization, or as an integral part of ongoing development). [1]

Naming

Because of the length of the words, the terms are frequently abbreviated to the numeronyms i18n (where 18 stands for the number of letters between the first i and the last n in the word internationalization, a usage coined at Digital Equipment Corporation in the 1970s or 1980s) [2] [3] and l10n for localization. [4] [5] Some writers capitalize the latter term (L10n) to help distinguish the two. [6]
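The counting rule behind these abbreviations can be sketched in a few lines of Python. The function name `numeronym` is an illustrative choice, not part of any standard library, and the handling of very short words is an assumption.

```python
def numeronym(word: str) -> str:
    """Keep the first and last letter and replace the rest with its length."""
    if len(word) < 3:
        # Assumption: words with no inner letters are returned unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```

The same rule yields g11n for "globalization", since the word has 13 letters.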

Some companies, like IBM and Oracle, use the term globalization (g11n) for the combination of internationalization and localization. [7]

Microsoft defines internationalization as a combination of world-readiness and localization. World-readiness is a developer task, which enables a product to be used with multiple scripts and cultures (globalization) and separates user interface resources in a localizable format (localizability, abbreviated to L12y). [8] [9]

Hewlett-Packard created a system for HP-UX called "National Language Support" or "Native Language Support" (NLS) to produce localizable software. [10]

Some vendors, including IBM, [11] use the term National Language Version (NLV) for localized versions of software products that support only one specific locale. The term implies that similar NLVs of the software exist for other markets; it is not used where no internationalization and localization was undertaken and the product supports only one language and locale in every version.

Scope

Figure: The internationalization and localization process (based on a chart from the LISA website)

According to Software without Frontiers, the design aspects to consider when internationalizing a product are "data encoding, data and documentation, software construction, hardware device support, and user interaction", while the key design areas to consider when making a fully internationalized product from scratch are "user interaction, algorithm design and data formats, software services, and documentation". [10]

Translation is typically the most time-consuming component of language localization. [10] This may involve:

Standard locale data

Computer software can encounter differences above and beyond straightforward translation of words and phrases, because computer programs can generate content dynamically. These differences may need to be taken into account by the internationalization process in preparation for translation. Many of these differences are so regular that a conversion between languages can be easily automated. The Common Locale Data Repository by Unicode provides a collection of such differences. Its data is used by major operating systems, including Microsoft Windows, macOS and Debian, and by major Internet companies or projects such as Google and the Wikimedia Foundation. Examples of such differences include:

National conventions

Different countries have different economic conventions, including variations in:

In particular, the United States and Europe differ in most of these cases. Other regions often follow one of the two.

Specific third-party services, such as online maps, weather reports, or payment service providers, might not be available worldwide from the same carriers, or at all.

Time zones vary across the world, and this must be taken into account if a product originally only interacted with people in a single time zone. For internationalization, UTC is often used internally and then converted into a local time zone for display purposes.
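The "store in UTC, convert for display" pattern can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `to_storage` and `to_display` are hypothetical, and a fixed +09:00 offset stands in for a user's zone only to keep the sketch self-contained. Real code would normally look up a named time zone so that daylight saving rules are applied.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumption: a fixed offset is enough for this sketch (no daylight saving).
JST = timezone(timedelta(hours=9), "JST")


def to_storage(local_time: datetime) -> datetime:
    """Normalize a timezone-aware datetime to UTC before it is stored."""
    if local_time.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return local_time.astimezone(timezone.utc)


def to_display(stored_utc: datetime, user_zone: tzinfo) -> datetime:
    """Convert a stored UTC datetime into the user's zone for display."""
    return stored_utc.astimezone(user_zone)


entered = datetime(2024, 1, 15, 21, 0, tzinfo=JST)
stored = to_storage(entered)
shown = to_display(stored, JST)
print(stored.isoformat())  # 2024-01-15T12:00:00+00:00
print(shown.isoformat())   # 2024-01-15T21:00:00+09:00
```

Keeping only UTC values in storage means that comparisons and arithmetic do not depend on where a user happened to be when the value was entered.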

Different countries have different legal requirements, meaning for example:

Localization also may take into account differences in culture, such as:

Business process for internationalizing software

To internationalize a product, it is important to look at the variety of markets that the product will foreseeably enter. [10] The field length for street addresses, country-specific address formats, the ability to make the postal code field optional for countries that do not have postal codes (or the state field for countries that do not have states), and the introduction of new registration flows that adhere to local laws are just some of the details that make internationalization a complex project. [6] [17] A broader approach also takes cultural factors into account, for example by adapting the business process logic or by including individual cultural (behavioral) aspects. [10] [18]
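One way to handle optional postal code and state fields is to drive validation from per-country rules instead of hard-coding one country's form. The sketch below is illustrative only: the `AddressRules` type, the `validate_address` function, and the rule values (including the length limits) are assumptions made for the example, not data from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRules:
    postal_code_required: bool
    state_required: bool
    max_street_length: int


# Illustrative sample values, not an authoritative rule set.
RULES = {
    "US": AddressRules(postal_code_required=True, state_required=True, max_street_length=100),
    "HK": AddressRules(postal_code_required=False, state_required=False, max_street_length=200),
}
# Assumption: unknown countries get the most permissive rules.
DEFAULT_RULES = AddressRules(postal_code_required=False, state_required=False, max_street_length=200)


def validate_address(country: str, street: str, postal_code: str = "", state: str = "") -> list[str]:
    """Return a list of validation errors; an empty list means the address passes."""
    rules = RULES.get(country, DEFAULT_RULES)
    errors = []
    if not street.strip():
        errors.append("street is required")
    elif len(street) > rules.max_street_length:
        errors.append("street is too long")
    if rules.postal_code_required and not postal_code.strip():
        errors.append("postal code is required")
    if rules.state_required and not state.strip():
        errors.append("state is required")
    return errors


print(validate_address("HK", "1 Queen's Road Central"))  # []
print(validate_address("US", "1 Main St"))
# ['postal code is required', 'state is required']
```

Because the rules live in data, supporting a new market means adding an entry to the table and leaves the validation code unchanged.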

As early as the 1990s, companies such as Bull used machine translation (Systran) on a large scale for all of their translation activity: human translators handled pre-editing (making the input machine-readable) and post-editing. [10]

Engineering

Whether re-engineering existing software or designing new internationalized software, the first step of internationalization is to split each potentially locale-dependent part (whether code, text or data) into a separate module. [10] Each module can then either rely on a standard library or dependency, or be independently replaced as needed for each locale.

The current prevailing practice is for applications to place text in resource files which are loaded during program execution as needed. [10] These strings, stored in resource files, are relatively easy to translate. Programs are often built to reference resource libraries depending on the selected locale data.
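A resource lookup of this kind can be sketched as follows. In-memory dictionaries stand in for resource files that would be loaded from disk, and the names `RESOURCES`, `fallback_chain` and `lookup` are hypothetical. The fallback order (specific locale, then base language, then a default) is one common design choice, assumed here for illustration.

```python
# Stand-in for per-locale resource files loaded at run time.
RESOURCES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo", "farewell": "Auf Wiedersehen"},
    "de-AT": {"greeting": "Servus"},
}
DEFAULT_LOCALE = "en"


def fallback_chain(locale: str) -> list[str]:
    """Return locales to try, most specific first: 'de-AT' -> ['de-AT', 'de', 'en']."""
    chain = []
    parts = locale.split("-")
    while parts:
        chain.append("-".join(parts))
        parts.pop()
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def lookup(key: str, locale: str) -> str:
    """Find the string for a key, falling back to less specific locales."""
    for candidate in fallback_chain(locale):
        table = RESOURCES.get(candidate, {})
        if key in table:
            return table[key]
    # Assumption: show the key itself if no locale defines the string.
    return key


print(lookup("greeting", "de-AT"))  # Servus
print(lookup("farewell", "de-AT"))  # Auf Wiedersehen
print(lookup("greeting", "fr"))     # Hello
```

With this layout a regional variant only needs to override the strings that differ from its base language.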

The storage for translatable and translated strings is sometimes called a message catalog [10] as the strings are called messages. The catalog generally comprises a set of files in a specific localization format and a standard library to handle said format. One software library and format that aids this is gettext.
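As a brief illustration, Python's standard `gettext` module follows this catalog model. The sketch assumes a hypothetical domain name `myapp` and a hypothetical catalog directory `locale`. With `fallback=True`, a missing catalog yields a translations object that returns the original messages, so the code runs even when no translation has been installed.

```python
import gettext


def load_catalog(language: str, localedir: str = "locale", domain: str = "myapp"):
    """Load the message catalog for a language, or a pass-through fallback."""
    # "myapp" and "locale" are placeholders for this sketch.
    return gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )


catalog = load_catalog("de")
_ = catalog.gettext

# Messages are looked up by their source-language text.
title = _("Open file")

# ngettext selects a plural form based on the count.
count = 3
summary = catalog.ngettext("%d file", "%d files", count) % count

print(title)
print(summary)
```

If a compiled German catalog for `myapp` were present under the `locale` directory, the same calls would return the translated strings without any change to the calling code.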

Thus, to support multiple languages, an application is designed to select the relevant language resource file at runtime. The code that manages data entry verification and many other locale-sensitive data types must also support differing locale requirements. Modern development systems and operating systems include sophisticated libraries for international support of these types; see also Standard locale data above.

Many localization issues (e.g. writing direction, text sorting) require more profound changes in the software than text translation. For example, OpenOffice.org achieves this with compilation switches.
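Text sorting shows why translation alone is not enough. The sketch below contrasts a plain code-point sort with a rough accent-insensitive sort key. The key function `rough_sort_key` is a simplified stand-in written for this example. It is not a real collation algorithm, and actual locale rules differ between languages, so production code would normally rely on locale data from a collation library.

```python
import unicodedata


def rough_sort_key(text: str) -> str:
    """Approximate key: drop combining accents and ignore case (not true collation)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# "\u00c4" is the precomposed letter A with diaeresis.
words = ["zebra", "\u00c4pfel", "apple"]

by_code_point = sorted(words)
by_rough_key = sorted(words, key=rough_sort_key)

print(by_code_point)  # ['apple', 'zebra', 'Äpfel']
print(by_rough_key)   # ['Äpfel', 'apple', 'zebra']
```

The code-point order places the accented word after "zebra", which most readers of a Latin-script language would not expect.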

Process

A globalization method includes, after planning, three implementation steps: internationalization, localization and quality assurance. [10]

To some degree (e.g. for quality assurance), development teams include someone who handles the basic or central stages of the process, which then enables all the others. [10] Such persons typically understand foreign languages and cultures and have some technical background. Specialized technical writers are required to construct a culturally appropriate syntax for potentially complicated concepts, coupled with engineering resources to deploy and test the localization elements.

Once properly internationalized, software can rely on more decentralized models for localization: free and open source software usually relies on self-localization by end-users and volunteers, sometimes organized in teams. [19] The GNOME project, for example, has volunteer translation teams for over 100 languages. [20] MediaWiki supports over 500 languages, of which 100 are mostly complete as of September 2023. [21]

When translating existing text to other languages, it is difficult to maintain the parallel versions of texts throughout the life of the product. [22] For instance, if a message displayed to the user is modified, all of the translated versions must be changed.
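One possible way to keep parallel versions in step is to record, with each translation, a fingerprint of the source text it was made from, and then flag translations whose fingerprint no longer matches. The data layout and the names `fingerprint` and `find_stale` below are assumptions made for this sketch. Existing localization tools have their own mechanisms for marking outdated entries.

```python
import hashlib


def fingerprint(source_text: str) -> str:
    """Stable fingerprint of a source message."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def find_stale(sources: dict, translations: dict) -> list:
    """Return sorted (locale, key) pairs whose source text changed or disappeared."""
    stale = []
    for locale, entries in translations.items():
        for key, entry in entries.items():
            if key not in sources or entry["source_hash"] != fingerprint(sources[key]):
                stale.append((locale, key))
    return sorted(stale)


# Current source messages: "save" was reworded after it had been translated.
sources = {"save": "Save changes", "quit": "Quit"}
translations = {
    "de": {
        "save": {"text": "Speichern", "source_hash": fingerprint("Save")},
        "quit": {"text": "Beenden", "source_hash": fingerprint("Quit")},
    }
}

print(find_stale(sources, translations))  # [('de', 'save')]
```

A check like this can run whenever source messages change, so that outdated translations are reviewed before release.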

Independent software vendors such as Microsoft may provide reference software localization guidelines for developers. [23] The software localization language may be different from the written language.

Commercial considerations

In a commercial setting, the benefit of localization is access to more markets. In the early 1980s, Lotus 1-2-3 took two years to separate program code and text and lost the market lead in Europe to Microsoft Multiplan. [10] MicroPro found that using an Austrian translator for the West German market meant that its WordStar documentation, in the words of an executive, did not "have the tone it should have had". [24]

However, there are considerable costs involved, which go far beyond engineering. Further, business operations must adapt to manage the production, storage and distribution of multiple discrete localized products, which are often being sold in completely different currencies, regulatory environments and tax regimes.

Finally, sales, marketing and technical support must also facilitate their operations in the new languages, to support customers for the localized products. Particularly for relatively small language populations, it may never be economically viable to offer a localized product. Even where large language populations could justify localization for a given product, and a product's internal structure already permits localization, a given software developer or publisher may lack the size and sophistication to manage the ancillary functions associated with operating in multiple locales.

References

  1. Esselink, Bert (2006). "The Evolution of Localization" (PDF). In Pym, Anthony; Perekrestenko, Alexander; Starink, Bram (eds.). Translation Technology and Its Teaching (With Much Mention of Localization). Tarragona: Intercultural Studies Group – URV. pp. 21–29. ISBN 84-611-1131-1. Archived from the original (PDF) on 7 September 2012. In a nutshell, localization revolves around combining language and technology to produce a product that can cross cultural and language barriers. No more, no less.
  2. "Glossary of W3C Jargon". W3C. Archived from the original on 2 September 2011. Retrieved 16 September 2023.
  3. "Origin of the Abbreviation I18n". I18nGuy. Archived from the original on 27 June 2014. Retrieved 19 February 2022.
  4. Ishida, Richard; Miller, Susan K. (5 December 2005). "Localization vs. Internationalization". W3C. Archived from the original on 3 April 2016. Retrieved 16 September 2023.
  5. "Concepts (GNU gettext utilities)". gnu.org. Archived from the original on 18 September 2019. Retrieved 16 September 2023. Many people, tired of writing these long words over and over again, took the habit of writing i18n and l10n instead, quoting the first and last letter of each word, and replacing the run of intermediate letters by a number merely telling how many such letters there are.
  6. alan (29 March 2011). "What is Internationalization (i18n), Localization (L10n) and Globalization (g11n)". ccjk.com. Archived from the original on 2 April 2015. Retrieved 16 September 2023. The capital L in L10n helps to distinguish it from the lowercase i in i18n.
  7. "Globalize Your Business". IBM. Archived from the original on 31 March 2016.
  8. "Globalization Step-by-Step". Go Global Developer Center. Archived from the original on 12 April 2015.
  9. "Globalization Step-by-Step: Understanding Internationalization". Go Global Developer Center. Archived from the original on 26 May 2015.
  10. Hall, P. A. V.; Hudson, R., eds. (1997). Software without Frontiers: A Multi-Platform, Multi-Cultural, Multi-Nation Approach. Chichester: Wiley. ISBN 0-471-96974-5.
  11. "National language version". IBM.
  12. "Plural forms (GNU gettext utilities)". gnu.org. Archived from the original on 14 March 2021. Retrieved 16 September 2023.
  13. "Do We Need to Localize Keyboard Shortcuts?". Human Translation Services – Language to Language Translation. 21 August 2014. Archived from the original on 3 April 2015. Retrieved 19 February 2022.
  14. Mateen Haider (17 May 2016). "Pakistan Expresses Concern Over India's Controversial 'Maps Bill'". Dawn. Archived from the original on 10 May 2018. Retrieved 9 May 2018.
  15. Yasser Latif Hamdani (18 May 2016). "Changing Maps Will Not Mean Kashmir Is a Part of You, India". The Express Tribune. Retrieved 19 February 2022.
  16. "An Overview of the Geospatial Information Regulation Bill". Madras Courier. 24 July 2017. Archived from the original on 29 October 2020. Retrieved 19 February 2022.
  17. "Appendix V International Address Formats". Microsoft Docs. 2 June 2008. Archived from the original on 19 May 2021. Retrieved 19 February 2022.
  18. Pawlowski, Jan M. Culture Profiles: Facilitating Global Learning and Knowledge Sharing (PDF) (Draft version). Archived (PDF) from the original on 16 July 2011. Retrieved 1 October 2009.
  19. Reina, Laura Arjona; Robles, Gregorio; González-Barahona, Jesús M. (2013). "A Preliminary Analysis of Localization in Free Software: How Translations Are Performed". In Petrinja, Etiel; Succi, Giancarlo; Ioini, Nabil El; Sillitti, Alberto (eds.). Open Source Software: Quality Verification. IFIP Advances in Information and Communication Technology. Vol. 404. Springer Berlin Heidelberg. pp. 153–167. doi:10.1007/978-3-642-38928-3_11. ISBN 978-3-642-38927-6.
  20. "GNOME Languages". GNOME. Archived from the original on 29 August 2023. Retrieved 16 September 2023.
  21. "Translating:Group Statistics". translatewiki.net. Archived from the original on 29 August 2023. Retrieved 16 September 2023.
  22. "How to Translate a Game Into 20 Languages and Avoid Going to Hell: Exorcising the Four Devils of Confusion". PocketGamer.biz. 4 April 2014. Archived from the original on 7 December 2017. Retrieved 19 February 2022.
  23. jowilco (24 August 2023). "Microsoft Localization Style Guides - Globalization". learn.microsoft.com. Retrieved 15 September 2024.
  24. Schrage, Michael (17 February 1985). "IBM Wins Dominance in European Computer Market". The Washington Post. Archived from the original on 29 August 2018. Retrieved 29 August 2018.

Further reading