Iconv

iconv
Original author(s): Hewlett-Packard
Developer(s): Various open-source and commercial developers
Repository: https://git.savannah.gnu.org/git/libiconv.git
Operating system: Unix, Unix-like, Microsoft Windows, IBM i
Platform: Cross-platform
Type: Command
License: libiconv: LGPL; iconv: GPL; win-iconv: Public domain [1]

In Unix and Unix-like operating systems, iconv (an abbreviation of internationalization conversion) [2] is a command-line program [3] and a standardized application programming interface (API) [4] used to convert between different character encodings. "It can convert from any of these encodings to any other, through Unicode conversion." [5]
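As a rough illustration of conversion between two non-Unicode encodings, the sketch below converts the euro sign from ISO-8859-15 (byte 0xA4) to Windows-1252 (byte 0x80), first directly and then in two explicit steps with UTF-8 in between. The choice of encodings and the variable names are assumptions made for the example, and it assumes the local iconv supports both ISO-8859-15 and CP1252.

```shell
# Euro sign: byte 0xA4 (octal 244) in ISO-8859-15, byte 0x80 in Windows-1252.

# Direct conversion between the two legacy encodings.
direct=$(printf '\244' | iconv -f ISO-8859-15 -t CP1252 | od -An -tx1 | tr -d ' \n')

# The same conversion in two explicit steps, using UTF-8 as the intermediate form.
pivot=$(printf '\244' | iconv -f ISO-8859-15 -t UTF-8 \
                      | iconv -f UTF-8 -t CP1252 | od -An -tx1 | tr -d ' \n')

# Both variables hold the hexadecimal value of the resulting byte.
echo "direct=$direct pivot=$pivot"
```

Both routes should yield the same single byte, which is what one would expect if the direct conversion is itself carried out through a Unicode intermediate.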

History

iconv first appeared on the HP-UX operating system. [6] Both the iconv() function and the utility were standardized in XPG4 and are part of the Single UNIX Specification (SUS).

Implementations

Most Linux distributions provide an implementation, either from the GNU C Library (glibc, which has included it since version 2.1, February 1999) or, on systems based on other C standard libraries, from the more traditional GNU libiconv.

The iconv function [7] in both is licensed under the LGPL, so it can be linked with closed-source applications.

Unlike the libraries, the iconv utility is licensed under the GPL in both implementations. [8] The GNU libiconv implementation is portable and can be used on various Unix-like and non-Unix systems. Version 0.3 of libiconv dates from December 1999.

The uconv utility from International Components for Unicode provides an iconv-compatible command-line syntax for transcoding.

Most BSD systems use NetBSD's implementation, which first appeared in December 2004.

The musl C library implements the iconv function with support for all encodings specified by the WHATWG Encoding Standard.

Support

The GNU variant supports over a hundred different character encodings. [5]
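The set of encodings actually available depends on the installed implementation. A minimal sketch of how one might inspect it, assuming the local iconv utility accepts the -l option (the output format differs between implementations, and the variable name is only illustrative):

```shell
# Capture the encoding names known to the local iconv implementation.
encodings=$(iconv -l)

# Show the first few lines of the list.
printf '%s\n' "$encodings" | head -n 5

# Check whether a particular encoding name appears, ignoring case.
if printf '%s\n' "$encodings" | grep -qi 'utf-8'; then
    echo "UTF-8 is listed"
fi
```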

Ports

Under Microsoft Windows, the iconv library and the utility are provided by GNU's libiconv, found in the Cygwin [9] and GnuWin32 [10] environments; there is also a "purely Win32" implementation called "win-iconv" that uses Windows' built-in routines for conversion. [11] The iconv function is also available for many programming languages.

The iconv command has also been ported to the IBM i operating system. [12]

Usage

Standard input can be converted from ISO-8859-1 to the encoding of the current locale and written to standard output using: [13]

iconv -f iso-8859-1

An input file infile can be converted from ISO-8859-1 to UTF-8 and written to the output file outfile using:

iconv -f iso-8859-1 -t utf-8 <infile> -o <outfile>
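The file conversion can be tried end to end with a small sample. The following is only a sketch: the temporary directory and the sample text are assumptions for illustration, and it redirects standard output instead of using -o, since that option may not be available in every implementation.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Create a small ISO-8859-1 input file: "café" followed by a newline.
# \351 is the octal escape for byte 0xE9, which is é in ISO-8859-1.
printf 'caf\351\n' > "$workdir/infile"

# Convert to UTF-8 and redirect the result to the output file.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/infile" > "$workdir/outfile"

# Dump the result byte by byte; é is now encoded as the two bytes c3 a9.
od -An -tx1 "$workdir/outfile"
```

The output file is one byte longer than the input, because the single byte 0xE9 becomes a two-byte UTF-8 sequence.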

References

  1. "win-iconv/readme.txt at master · win-iconv/win-iconv · GitHub". GitHub.
  2. "R: Convert Character Vector between Encodings". astrostatistics.psu.edu. Retrieved 21 April 2018.
  3. "iconv". pubs.opengroup.org. Retrieved 21 April 2018.
  4. "iconv". www.opengroup.org. Retrieved 21 April 2018.
  5. "libiconv - GNU Project - Free Software Foundation (FSF)". www.gnu.org. Retrieved 21 April 2018.
  6. "iconv(3C)". docstore.mik.ua. Retrieved 21 April 2018.
  7. "glibc: iconv/iconv.c". Retrieved 30 November 2016. [permanent dead link]
  8. "glibc: iconv/iconv_prog.c". Retrieved 30 November 2016. [permanent dead link]
  9. "Cygwin Package Search: libiconv". Archived from the original on 30 November 2016. Retrieved 30 November 2016.
  10. "LibIconv for Windows". gnuwin32.sourceforge.net. Retrieved 21 April 2018.
  11. "win32-iconv". GitHub. Retrieved 30 November 2016.
  12. IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Retrieved 5 September 2020.
  13. "IBM Knowledge Center". www-01.ibm.com. Retrieved 21 April 2018.