Original author(s) | John Birchfield |
---|---|
Developer(s) | Benjamin Lin, Bernd Johannes Wuebben, Christian Wurll, Erwin Waterlander |
Initial release | 1989 |
Stable release | |
Repository | |
Operating system | Unix-like, DOS, OS/2, Windows |
Platform | Cross-platform |
Type | Command |
License | FreeBSD style license |
Website | waterlan |
unix2dos (sometimes named todos or u2d) is a tool to convert line breaks in a text file from Unix format (line feed) to DOS format (carriage return + line feed) and vice versa. When invoked as unix2dos, the program converts a Unix text file to DOS format; when invoked as dos2unix, it converts a DOS text file to Unix format. [2]
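The two formats differ only in the carriage return byte that precedes each line feed, so the format of a file can be checked before converting it. The following is a minimal sketch in POSIX shell; the function name count_crlf_lines is hypothetical and not part of unix2dos or dos2unix.

```shell
# Hypothetical helper (not part of unix2dos/dos2unix):
# count the lines on standard input that end in CR+LF (DOS style).
count_crlf_lines() {
    # Store a literal carriage return; command substitution only strips
    # trailing newlines, so the CR survives.
    cr=$(printf '\r')
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c "${cr}\$" || true
}
```

For example, `count_crlf_lines < file` prints 0 for a file that has only Unix line endings, and a number equal to the line count for a file that has only DOS line endings.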
Unix2dos and dos2unix are not part of the Unix standard. Commercial Unixes usually come with their own implementation of unix2dos/dos2unix, like SunOS/Solaris's dos2unix/unix2dos, HP-UX's dos2ux/ux2dos and Irix's to_unix/to_dos.
Many open-source alternatives exist, with different command names and options, such as dos2unix/unix2dos, d2u/u2d, fromdos/todos, endlines and flip. The multi-call binary busybox includes an implementation of unix2dos/dos2unix.
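For usage, see the manual page of the respective commands.
The unix2dos conversion can also be performed with other utilities, for example:
$ recode latin1..dos file
$ perl -i -p -e 's|[\r\n]+|\r\n|g' file
$ sed -i -n -z 's/\n/\r\n/g;p' file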
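A command that inserts a carriage return before every line feed produces a doubled carriage return on any line that already has a DOS line ending. The sketch below avoids this by first removing a trailing carriage return and then writing each line with CR+LF. It uses awk, which is a substitute technique and not one of the commands listed above, and the function name to_dos is hypothetical.

```shell
# Hypothetical helper: convert standard input to DOS line endings.
# Running it on input that is already in DOS format leaves it unchanged.
to_dos() {
    # awk splits records on LF. Strip one trailing CR if present,
    # then print the line terminated by CR+LF.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}
```

A typical call would be `to_dos < file > file2`. Note that awk terminates the last line even if the input did not end with a line break.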
For the opposite conversion (dos2unix) it is possible to use, for example, the utility tr with the -d '\r' flag to remove the carriage return characters:
$ tr -d '\r' < file > file2   # Deletes every CR byte; fine for ASCII and UTF-8, but not for encodings such as UTF-16, where the byte 0x0D can be part of another character.
$ perl -i -p -e 's/\r//g' file
$ sed -i -e 's/\r//g' file
Note: The above methods assume that the input file contains only DOS line breaks. Any classic Mac OS line breaks (a lone \r) present in the input are removed as well, which joins the affected lines.
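Where carriage returns in the middle of a line must be preserved, a more conservative variant removes a carriage return only when it comes directly before the line feed. The following is a minimal sketch in POSIX shell; the function name to_unix is hypothetical, and a literal carriage return is passed to sed because the \r escape in sed patterns is not supported by every implementation.

```shell
# Hypothetical helper: convert standard input to Unix line endings,
# removing a CR only when it is the last character of a line.
to_unix() {
    # Store a literal carriage return for use inside the sed pattern.
    cr=$(printf '\r')
    # "$" anchors the match to the end of the line, so a CR elsewhere
    # in the line is left untouched.
    sed "s/${cr}\$//"
}
```

A typical call would be `to_unix < file > file2`.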
An alternative to the dos2unix conversion is possible by using the col command, which is available on Linux and other Unix-like operating systems, including Mac OS X. In the following case, InFile contains the undesired DOS (^M) line endings. After execution, OutFile is either created or replaced, and contains UNIX line endings. The -b option tells col not to output backspace characters.
$ col -b < InFile > OutFile