Original author(s) | Hewlett-Packard |
---|---|
Developer(s) | Various open-source and commercial developers |
Repository | https://git.savannah.gnu.org/git/libiconv.git |
Operating system | Unix, Unix-like, Microsoft Windows, IBM i |
Platform | Cross-platform |
Type | Command |
License | libiconv: LGPL iconv: GPL win-iconv: Public domain [1] |
In Unix and Unix-like operating systems, iconv (an abbreviation of internationalization conversion) [2] is a command-line program [3] and a standardized application programming interface (API) [4] used to convert between different character encodings. "It can convert from any of these encodings to any other, through Unicode conversion." [5]
Initially appearing on the HP-UX operating system, [6] iconv()
as well as the utility was standardized within XPG4 and is part of the Single UNIX Specification (SUS).
Most Linux distributions provide an implementation, either from the GNU Standard C Library (included since version 2.1, February 1999), or the more traditional GNU libiconv
, for systems based on other Standard C Libraries.
The iconv
function [7] on both is licensed as LGPL, so it is linkable with closed source applications.
Unlike the libraries, the iconv
utility is licensed under GPL in both implementations. [8] The GNU libiconv implementation is portable, and can be used on various UNIX-like and non-UNIX systems. Version 0.3 dates from December 1999.
The uconv utility from International Components for Unicode provides an iconv-compatible command-line syntax for transcoding.
Most BSD systems use NetBSD's implementation, which first appeared in December 2004.
Currently, over a hundred different character encodings are supported in the GNU variant. [5]
Under Microsoft Windows, the iconv library and the utility is provided by GNU's libiconv found in Cygwin [9] and GnuWin32 [10] environments; there is also a "purely Win32" implementation called "win-iconv" that uses Windows' built-in routines for conversion. [11] The iconv function is also available for many programming languages.
The iconv command has also been ported to the IBM i operating system. [12]
stdin
can be converted from ISO-8859-1 to current locale and output to stdout
using: [13]
iconv-fiso-8859-1
An input file infile
can be converted from ISO-8859-1 to UTF-8 and output to output file outfile
using:
iconv-fiso-8859-1-tutf-8<infile>-o<outfile>
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a character encoding are known as code points and collectively comprise a code space, a code page, or character map.
Cygwin is a free and open-source Unix-like environment and command-line interface (CLI) for Microsoft Windows. The project also provides a software repository containing many open-source packages. Cygwin allows source code for Unix-like operating systems to be compiled and run on Windows. Cygwin provides native integration of Windows-based applications.
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own, such as devices that use magnetic tape. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.
In computing, a code page is a character encoding and as such it is a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte.
Windows-1252 or CP-1252 is a legacy single-byte character encoding that is used by default in Microsoft Windows throughout the Americas, Western Europe, Oceania, and much of Africa.
A newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.
ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central or "Eastern European" languages that are written in the Latin script. Note that ISO/IEC 8859-2 is very different from code page 852 which is also referred to as "Latin-2" in Czech and Slovak regions. Almost half the use of the encoding is for Polish, and it's the main legacy encoding for Polish, while virtually all use of it has been replaced by UTF-8.
The computer tool patch is a Unix program that updates text files according to instructions contained in a separate file, called a patch file. The patch file is a text file that consists of a list of differences and is produced by running the related diff program with the original and updated file as arguments. Updating files with patch is often referred to as applying the patch or simply patching the files.
GB 18030 is a Chinese government standard, described as Information Technology — Chinese coded character set and defines the required language and character support necessary for software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC) superseding GB2312. As a Unicode Transformation Format, GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB/T 2312, CP936, and GBK 1.0.
xargs is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.
dd is a command-line utility for Unix, Plan 9, Inferno, and Unix-like operating systems and beyond, the primary purpose of which is to convert and copy files. On Unix, device drivers for hardware and special device files appear in the file system just like normal files; dd can also read and/or write from/to these files, provided that function is implemented in their respective driver. As a result, dd can be used for tasks such as backing up the boot sector of a hard drive, and obtaining a fixed amount of random data. The dd program can also perform conversions on the data as it is copied, including byte order swapping and conversion to and from the ASCII and EBCDIC text encodings.
In computing, uconv is a command-line tool that is bundled with International Components for Unicode that converts text files between different character encodings. It is very similar to the iconv command that is part of the Single UNIX Specification which is usually implemented using libiconv. In fact the command line options for transcoding are the same. The command uconv can also convert to and from various Unicode normalization forms.
The Standard Compression Scheme for Unicode (SCSU) is a Unicode Technical Standard for reducing the number of bytes needed to represent Unicode text, especially if that text uses mostly characters from one or a small number of per-language character blocks. It does so by dynamically mapping values in the range 128–255 to offsets within particular blocks of 128 characters. The initial conditions of the encoder mean that existing strings in ASCII and ISO-8859-1 that do not contain C0 control codes other than NULL TAB CR and LF can be treated as SCSU strings. Since most alphabets do reside in blocks of contiguous Unicode codepoints, texts that use small alphabets and either ASCII punctuation or punctuation that fits within the window for the main alphabet can be encoded at one byte per character, most other punctuation can be encoded at 2 bytes per symbol through non-locking shifts. SCSU can also switch to UTF-16 internally to handle non-alphabetic languages.
tr is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. It is an abbreviation of translate or transliterate, indicating its operation of replacing or removing specific characters in its input data set.
In computing, kill
is a command that is used in several popular operating systems to send signals to running processes.
In computing, cmp
is a command-line utility on Unix and Unix-like operating systems that compares two files of any type and writes the results to the standard output. By default, cmp
is silent if the files are the same; if they differ, the byte and line number at which the first difference occurred is reported. The command is also available in the OS-9 shell.
This article provides basic comparisons for notable text editors. More feature details for text editors are available from the Category of text editor features and from the individual products' articles. This article may not be up-to-date or necessarily all-inclusive.
Windows code pages are sets of characters or code pages used in Microsoft Windows from the 1980s and 1990s. Windows code pages were gradually superseded when Unicode was implemented in Windows, although they are still supported both within Windows and other platforms, and still apply when Alt code shortcuts are used.
sum is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs a 16-bit checksum of each argument file, as well as the number of blocks they take on disk. Two different checksum algorithms are in use. POSIX abandoned sum
in favor of cksum.
luit is a utility program used to translate the character set of a computer program so that its output can be displayed correctly on a terminal emulator that uses a different character set. Whereas iconv converts the character set of strings or text files at rest, luit converts the input and output of programs running interactively.