Tsort

Last updated
tsort
Initial release1979;42 years ago (1979)
Operating system Unix, Unix-like, V, Inferno
Platform Cross-platform
Type Command

The tsort program is a command line utility on Unix and Unix-like platforms, that performs a topological sort on its input. As of 2017, it is part of the POSIX.1 standard. [1]

Contents

History

According to its info [2] page, this command was initially written for providing an ordering of object files that allowed the linker to process them sequentially (each one exactly once, and in order). The FreeBSD manual page dates its appearance to Version 7 Unix. [3]

Note that the following description is describing the behaviour of the FreeBSD implementation of tsort and mentions GNU features where they may exist. Other implementations or versions may differ.

Syntax

tsort [-dlq] [FILE]

FreeBSD options can be:

-d         turn on debugging -l         search for and display the longest cycle. -q         Do not display informational messages about cycles.

GNU provides the following options only:

--help     display help message and exit --version  display version information and exit

There are no options prescribed by POSIX.

Behavior

tsort reads its input (from the given FILE, or standard input if no input file is given or for a FILE of '-') as pairs of strings, separated by blanks, indicating a partial ordering. The output is a total ordering that corresponds to the given partial ordering. [4]

In other words: for a directed acyclic graph (used as a dependency graph), tsort produces a listing of the vertices so that for all edges 'a->b', 'a' comes before 'b' in the listing.

Examples

tsort lists the vertices of a directed acyclic graph in such an order that all ordering/direction relations are respected:

$ tsort <<EOF > 3 8> 3 10> 5 11> 7 8> 7 11> 8 9> 11 2> 11 9> 11 10> EOF3571181029
sample DAG Directed acyclic graph 2.svg
sample DAG

Call graph

tsort can help rearranging functions in a source file so that as many as possible are defined before they are used (Interpret the following as: main() calls parse_options(), tail_file() and tail_forever(); tail_file() calls pretty_name(), and so on. The result is that dump_remainder() should be defined first, start_lines() second, etc.):

$ cat call-graph main parse_optionsmain tail_filemain tail_forevertail_file pretty_nametail_file write_headertail_file tailtail_forever rechecktail_forever pretty_nametail_forever write_headertail_forever dump_remaindertail tail_linestail tail_bytestail_lines start_linestail_lines dump_remaindertail_lines file_linestail_lines pipe_linestail_bytes xlseektail_bytes start_bytestail_bytes dump_remaindertail_bytes pipe_bytesfile_lines dump_remainderrecheck pretty_name
$ # note: 'tac' reverses the order$ tsort call-graph | tac dump_remainderstart_linesfile_linespipe_linesxlseekstart_bytespipe_bytestail_linestail_bytespretty_namewrite_headertailrecheckparse_optionstail_filetail_forevermain

Library

The traditional ld (Unix linker) requires that its library inputs be sorted in topological order, since it processes files in a single pass. This applies both to static libraries (*.a) and dynamic libraries (*.so), and in the case of static libraries preferably for the individual object files contained within. [5]

BSD UNIX uses tsort as a common part of the typical ar & ranlib command invocations (from /usr/share/mk/bsd.lib.mk):

lib${LIB}.a: ${OBJS} ${STATICOBJS}     @${ECHO} building static ${LIB} library     @${AR} cq ${.TARGET}`lorder ${OBJS}${STATICOBJS}| tsort -q`${ARADD}${RANLIB}${.TARGET}

Here lorder ("library order") is used to generate the inter-file dependency list by inspecting the symbol table.

Usage notes

Notice the interchangeability of white space separators so the following inputs are equivalent:

a b b c
a b b c
a b b c
a b b c
a b b c

Pairs of identical items indicate presence of a vertex, but not ordering (so the following represents one vertex without edges):

a a

Strictly speaking there is no topological ordering of a graph that contains one or more cycles. However tsort prints a warning and GNU tsort prints the detected cycles to standard error (lines beginning with 'tsort:'):

$ tsort <<EOF > a b> b c> c a> EOFUX: tsort: INFORM: cycle in datatsort: atsort: btsort: cabc

See also

Related Research Articles

uniq is a utility command on Unix, Plan 9, Inferno, and Unix-like operating systems which, when fed a text file or standard input, outputs the text with adjacent identical lines collapsed to one, unique line of text.

ls

In computing, ls is a command to list computer files in Unix and Unix-like operating systems. ls is specified by POSIX and the Single UNIX Specification. When invoked without any arguments, ls lists the files in the current working directory. The command is also available in the EFI shell. In other environments, such as DOS, OS/2, and Microsoft Windows, similar functionality is provided by the dir command. The numerical computing environments MATLAB and GNU Octave include an ls function with similar functionality.

In software development, Make is a build automation tool that automatically builds executable programs and libraries from source code by reading files called Makefiles which specify how to derive the target program. Though integrated development environments and language-specific compiler features can also be used to manage a build process, Make remains widely used, especially in Unix and Unix-like operating systems.

The comm command in the Unix family of computer operating systems is a utility that is used to compare two files for common and distinct lines. comm is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s.

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

xargs is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.

dd is a command-line utility for Unix and Unix-like operating systems, the primary purpose of which is to convert and copy files.

join is a command in Unix and Unix-like operating systems that merges the lines of two sorted text files based on the presence of a common field. It is similar to the join operator used in relational databases but operating on text files.

wc (Unix)

wc is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. The program reads either standard input or a list of computer files and generates one or more of the following statistics: newline count, word count, and byte count. If a list of files is provided, both individual file and total statistics follow.

pax (command)

pax is an archiving utility available for various operating systems and defined since 1995. Rather than sort out the incompatible options that have crept up between tar and cpio, along with their implementations across various versions of Unix, the IEEE designed a new archive utility that could support various archive formats with useful options from both archivers. The pax command is available on Unix and Unix-like operating systems and on IBM i, Microsoft Windows NT, and Windows 2000.

In computing, cp is a command in various Unix and Unix-like operating systems for copying files and directories. The command has three principal modes of operation, expressed by the types of arguments presented to the program for copying a file to another file, one or more files to a directory, or for copying entire directories to another directory.

In computing, cut is a command line utility on Unix and Unix-like operating systems which is used to extract sections from each line of input — usually from a file. It is currently part of the GNU coreutils package and the BSD Base System.

split is a utility on Unix, Plan 9, and Unix-like operating systems most commonly used to split a computer file into two or more smaller files.

The GNU Core Utilities or coreutils is a package of GNU software containing implementations for many of the basic tools, such as cat, ls, and rm, which are used on Unix-like operating systems.

who (Unix)

The standard Unix command who displays a list of users who are currently logged into the computer.

The seven standard Unix file types are regular, directory, symbolic link, FIFO special, block special, character special, and socket as defined by POSIX. Different OS-specific implementations allow more types than what POSIX requires. A file's type can be identified by the ls -l command, which displays the type in the first character of the file-system permissions field.

In mathematics, computer science and digital electronics, a dependency graph is a directed graph representing dependencies of several objects towards each other. It is possible to derive an evaluation order or the absence of an evaluation order that respects the given dependencies from the dependency graph.

sum is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs the checksum of each argument file, as well as the number of blocks they take on disk.

In Unix and Unix-like operating systems, printf is a shell builtin that formats and prints data.

cat (Unix)

cat is a standard Unix utility that reads files sequentially, writing them to standard output. The name is derived from its function to concatenate files. It has been ported to a number of operating systems.

References

  1. "tsort". The Open Group Base Specifications Issue 7, 2018 edition. The Open Group.
  2. "Tsort background (GNU Coreutils 9.0)".
  3. "Tsort".
  4. "Tsort invocation (GNU Coreutils 9.0)".
  5. "c++ - gcc ld: method to determine link order of static libraries". Stack Overflow.

Further reading

manual page of tsort on