Original author(s) | Sun Microsystems [1] |
---|---|
Developer(s) | various |
Initial release | 1990[2] |
Stable release | |
Repository | various based on OpenSolaris and GNU gettext |
Operating system | Cross-platform |
Type | Internationalization and localization |
License | Various free software licenses |
Website | www |
In computing, gettext is an internationalization and localization (i18n and l10n) system commonly used for writing multilingual programs on Unix-like computer operating systems. One of the main benefits of gettext is that it separates programming from translating. [4] The most commonly used implementation of gettext is GNU gettext, [5] released by the GNU Project in 1995. The runtime library is libintl. gettext provides an option to use different strings for any number of plural forms of nouns, but this feature has no support for grammatical gender. The main filename extensions used by this system are .POT (Portable Object Template), .PO (Portable Object) and .MO (Machine Object). [6]
Initially, POSIX provided no means of localizing messages. Two proposals were raised in the late 1980s, the 1988 Uniforum gettext and the 1989 X/Open catgets (XPG-3 § 5). Sun Microsystems implemented the first gettext in 1993. [1] The Unix and POSIX developers never really agreed on which of the two interfaces to use, so many C libraries, including glibc, implemented both. [7] As of August 2019, whether gettext should be part of POSIX was still a point of debate in the Austin Group, even though its old rival catgets had already fallen out of use. Concerns cited included its dependence on the system-set locale (a global variable subject to multithreading problems) and its support for newer C-language extensions involving wide strings. [8]
The GNU Project decided that the message-as-key approach of gettext is simpler and more friendly. (Most other systems, including catgets, require the developer to come up with "key" names for every string.) [9] They released GNU gettext, a free software implementation of the system, in 1995. [2] Gettext, GNU or not, has since been ported to many programming languages. [10] The simplicity of po and widespread editor support even led to its adoption in non-program contexts for text documents, or as an intermediate between other localization formats, with converters like po4a (po for anything) and Translate Toolkit emerging to provide such a bridge. [11] [12]
The basic interface of gettext is the gettext(const char*) function, which accepts a string that the user will see in the original language, usually English. To save typing time and reduce code clutter, this function is commonly aliased to _: [13]
    printf(gettext("My name is %s.\n"), my_name);
    printf(_("My name is %s.\n"), my_name); // same, but shorter
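The calls that usually surround such a line can be sketched as follows. This is a minimal sketch, assuming a glibc-style libintl; the text domain "name", the catalog directory "./locale" and the helper names init_i18n and format_greeting are hypothetical, and the _ alias is defined by the program itself, not by the library.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* The conventional alias; each program defines it for itself. */
#define _(msgid) gettext(msgid)

/* Hypothetical helper: select the user's locale and the message catalog. */
void init_i18n(void)
{
    setlocale(LC_ALL, "");               /* honor LANG, LC_MESSAGES, ... */
    bindtextdomain("name", "./locale");  /* assumed catalog directory */
    textdomain("name");                  /* assumed text domain */
}

/* Hypothetical helper: format the greeting with the translated format
 * string, or with the original string when no translation is found. */
int format_greeting(char *buf, size_t size, const char *my_name)
{
    return snprintf(buf, size, _("My name is %s.\n"), my_name);
}
```

A program would call init_i18n() once at startup and use _() wherever a user-visible string appears.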
gettext() uses the supplied strings as keys for looking up translations, and returns the original string when no translation is available. This is in contrast to POSIX catgets(), [14] AmigaOS GetString(), [15] or Microsoft Windows LoadString(), where a programmatic ID (often an integer) is used. To handle the case where the same original-language text can have different meanings, gettext has functions like pgettext() that accept an additional "context" string.
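In GNU gettext the context-aware lookup is available as the pgettext() macro from gettext.h, which joins the context and the message with the byte \004 to form the lookup key. The idea can be sketched as below; the helper name context_gettext and the 512-byte key buffer are assumptions made for illustration, not part of any library.

```c
#include <libintl.h>
#include <stdio.h>

/* Separator placed between context and message in the lookup key. */
#define CONTEXT_GLUE "\004"

/* Hypothetical helper: look up msgid within msgctxt. gettext() hands back
 * its own argument when nothing is found, so in that case the plain msgid
 * is returned instead of the glued key. */
const char *context_gettext(const char *msgctxt, const char *msgid)
{
    char key[512];
    int len = snprintf(key, sizeof key, "%s" CONTEXT_GLUE "%s", msgctxt, msgid);

    if (len < 0 || (size_t)len >= sizeof key)
        return msgid;               /* key too long: fall back to the original */

    const char *translation = gettext(key);
    return translation == key ? msgid : translation;
}
```

With such a helper, "Open" in a menu and "Open" as a status label can carry different translations while sharing the same English text.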
xgettext is run on the sources to produce a .pot (Portable Object Template) file, which contains a list of all the translatable strings extracted from the sources. Comments starting with /// are used to give translators hints, although other prefixes are also configurable to further limit the scope. One such common prefix is TRANSLATORS:.
For example, an input file with a comment might look like:
    /// TRANSLATORS: %s contains the user's name as specified in Preferences
    printf(_("My name is %s.\n"), my_name);
xgettext is run using the command:
    xgettext -c /
The resultant .pot file looks like this with the comment (note that xgettext recognizes the string as a C-language printf format string):
    #. TRANSLATORS: %s contains the user's name as specified in Preferences
    #, c-format
    #: src/name.c:36
    msgid "My name is %s.\n"
    msgstr ""
In POSIX shell scripts, gettext provides a gettext.sh library that can be included to obtain many of the same functions gettext provides in other languages. [16] GNU bash also has a simplified construct $"msgid" for the simple gettext function, although it depends on the C library to provide a gettext() function. [17]
The translator derives a .po (Portable Object) file from the template using the msginit program, then fills out the translations. [18] msginit initializes the translations so, for instance, for a French language translation, the command to run would be: [6]
    msginit --locale=fr --input=name.pot
This will create fr.po. The translator then edits the resultant file, either by hand or with a translation tool like Poedit, or Emacs with its editing mode for .po files. An edited entry will look like:
    #: src/name.c:36
    msgid "My name is %s.\n"
    msgstr "Je m'appelle %s.\n"
Finally, the .po files are compiled with msgfmt into binary .mo (Machine Object) files. GNU gettext may use its own file name extension .gmo on systems with another gettext implementation. [19] The compiled files are now ready for distribution with the software package.
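With GNU gettext, a compiled catalog is conventionally installed as dirname/locale/LC_MESSAGES/domain.mo, where dirname is the directory given to bindtextdomain(). The following is a minimal sketch of that layout; the helper name mo_path is hypothetical, and the directory, locale and domain values used with it are only examples.

```c
#include <stdio.h>

/* Hypothetical helper: write the conventional catalog path
 * dirname/locale/LC_MESSAGES/domain.mo into buf.
 * Returns 0 on success, -1 if the path did not fit. */
int mo_path(char *buf, size_t size,
            const char *dirname, const char *locale, const char *domain)
{
    int len = snprintf(buf, size, "%s/%s/LC_MESSAGES/%s.mo",
                       dirname, locale, domain);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

For the French example above, the catalog compiled from fr.po for a domain called "name" would be placed under the "fr" locale directory.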
GNU msgfmt can also perform some checks relevant to the format string used by the programming language. It also allows for outputting to language-specific formats other than MO; [20] the X/Open equivalent is gencat.
In later phases of the developmental workflow, msgmerge can be used to "update" an old translation to a newer template. There is also msgunfmt for reverse-compiling .mo files, and many other utilities for batch processing.
The user, on Unix-type systems, sets the environment variable LC_MESSAGES, and the program will display strings in the selected language, if there is an .mo file for it.
Users on GNU variants can also use the environment variable LANGUAGE instead. Its main difference from the Unix variable is that it supports multiple languages, separated with a colon, for fallback. [21]
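The fallback order of a colon-separated list can be illustrated with a small sketch. The helper pick_language below is hypothetical and deliberately simplified: it only walks the list in priority order and compares exact names, whereas the real lookup in GNU gettext is more involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walk a colon-separated priority list such as
 * "sv:fr:de" and return the index in available[] of the first language
 * that has a catalog, or -1 if none matches. */
int pick_language(const char *language_var,
                  const char *const available[], size_t count)
{
    const char *p = language_var;

    while (p != NULL && *p != '\0') {
        size_t len = strcspn(p, ":");   /* length of the current entry */

        for (size_t i = 0; i < count; i++) {
            if (strlen(available[i]) == len &&
                strncmp(available[i], p, len) == 0)
                return (int)i;
        }
        p += len;
        if (*p == ':')
            p++;                        /* skip the separator */
    }
    return -1;
}
```

With catalogs for German and French only, a list of "sv:fr:de" would select French, because Swedish is unavailable and French is listed before German.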
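The ngettext() interface accounts for the count of a noun in the string. As with the convention of gettext(), it is sometimes aliased to N_ in practical use, although in GNU code N_ more commonly stands for gettext_noop(), which only marks a string for extraction. Consider the code sample: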
    // parameters: english singular, english plural, integer count
    printf(ngettext("%d translated message", "%d translated messages", n), n);
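A slightly fuller sketch of the same call is shown here, assuming a glibc-style libintl; the helper name format_count is hypothetical. When no catalog is loaded, ngettext() falls back to the English rule: it returns the first string for a count of exactly 1 and the second string otherwise.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: pick the format string that matches the count,
 * then format the count into buf with it. */
int format_count(char *buf, size_t size, int n)
{
    const char *format = ngettext("%d translated message",
                                  "%d translated messages",
                                  (unsigned long)n);
    return snprintf(buf, size, format, n);
}
```

Note that the count is passed twice: once to ngettext() to select the form, and once to the formatting function to fill in %d.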
A header in the "" (empty string) entry of the PO file stores some metadata, one of which is the plural form that the language uses, usually specified using a C-style ternary operator. Suppose we want to translate for the Slovene language:
    msgid ""
    msgstr ""
    "..."
    "Language: sl\n"
    "Plural-Forms: nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? 2 : n%100==3 || n%100==4 ? 3 : 0);\n"
Since now there are four plural forms, the final po would look like:
    #: src/msgfmt.c:876
    #, c-format
    msgid "%d translated message"
    msgid_plural "%d translated messages"
    msgstr[0] "%d prevedenih sporočil"
    msgstr[1] "%d prevedeno sporočilo"
    msgstr[2] "%d prevedeni sporočili"
    msgstr[3] "%d prevedena sporočila"
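How the Plural-Forms expression selects one of the four msgstr entries can be traced with a short sketch. The function names plural_index_sl and plural_format_sl are hypothetical; the expression and the strings are copied from the Slovene example, and at run time gettext evaluates the header expression itself, so no such code is needed in a real program.

```c
/* The plural= expression from the Slovene header, written as a C function.
 * The result is the index N of the msgstr[N] entry to use. */
unsigned plural_index_sl(unsigned long n)
{
    return n % 100 == 1 ? 1
         : n % 100 == 2 ? 2
         : (n % 100 == 3 || n % 100 == 4) ? 3
         : 0;
}

/* Hypothetical helper: map a count to the translated format string. */
const char *plural_format_sl(unsigned long n)
{
    static const char *const forms[4] = {
        "%d prevedenih sporočil",  /* msgstr[0]: all other counts */
        "%d prevedeno sporočilo",  /* msgstr[1]: counts ending in 01 */
        "%d prevedeni sporočili",  /* msgstr[2]: counts ending in 02 */
        "%d prevedena sporočila",  /* msgstr[3]: counts ending in 03 or 04 */
    };
    return forms[plural_index_sl(n)];
}
```

Because the rule looks at n modulo 100, a count of 101 uses the same form as 1, and 204 uses the same form as 4.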
Reference plural rules for languages are provided by the Unicode consortium. [22] msginit also prefills the appropriate rule when creating a file for one specific language. [18]
In addition to C, gettext has the following implementations: C# for both ASP.NET [23] [24] and for WPF, [25] Perl, [26] PHP, [27] Python, [28] R, [29] Scala, [30] and Node.js. [31]
GNU gettext has native support for Objective-C, but there is no support for the Swift programming language yet. A commonly used gettext implementation on these Cocoa platforms is POLocalizedString. [32] The Microsoft Outlook for iOS team also provides a LocalizedStringsKit library with a gettext-like API. [33]