Ar (Unix)

ar
Original author(s): Ken Thompson, Dennis Ritchie (AT&T Bell Laboratories)
Developer(s): Various open-source and commercial developers
Initial release: November 3, 1971
Written in: C
Operating system: Unix, Unix-like, V, Plan 9, Inferno
Platform: Cross-platform
Type: Command
License: Plan 9: MIT License

archiver format
Filename extension: .a, .lib, .ar [1]
Internet media type: application/x-archive [1]
Magic number: !<arch>
Type of format: archive format
Container for: usually object files (.o)
Standard: Not standardized, several variants exist
Open format?: Yes [2]

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update the static library files used by the link editor (linker) and to generate .deb packages for the Debian family. It can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. [3] An implementation of ar is included as one of the GNU Binutils. [2]

In the Linux Standard Base (LSB), ar has been deprecated and is expected to disappear in a future release of that standard. The rationale provided was that "the LSB does not include software development utilities nor does it specify .o and .a file formats." [4]

File format details

Figure: Diagram showing an example file structure of a .deb file

The ar format has never been standardized; modern archives are based on a common format with two main variants, BSD and System V (initially known as COFF, and also used by GNU, ELF, and Windows).

Historically there have been other variants [5] including V6, V7, AIX (small and big), and Coherent, which all vary significantly from the common format. [6]

Debian ".deb" archives use the common format.

An ar file begins with a global header, followed by a header and data section for each file stored within the ar file.

Each data section is 2-byte aligned. If it would end on an odd offset, a newline ('\n', 0x0A) is used as filler.

File signature

The file signature is a single field containing the magic ASCII string "!<arch>" followed by a single LF control character (0x0A).
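As a minimal sketch of the check described above, the following Python function tests whether a byte string starts with the 8-byte signature. The names AR_MAGIC and has_ar_signature are chosen here for illustration and do not come from any ar implementation.

```python
# The signature: the ASCII string "!<arch>" followed by one LF (0x0A), 8 bytes in total.
AR_MAGIC = b"!<arch>\n"


def has_ar_signature(data: bytes) -> bool:
    """Return True if data begins with the common-format ar signature."""
    return data[:len(AR_MAGIC)] == AR_MAGIC
```

A thin archive, which uses a different magic string, is deliberately not accepted by this check.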

File header

Each file stored in an ar archive includes a file header to store information about the file. The common format is as follows. Numeric values are encoded in ASCII, and all values are right-padded with ASCII spaces (0x20).

Offset  Length  Name                                      Format
0       16      File identifier                           ASCII
16      12      File modification timestamp (in seconds)  Decimal
28      6       Owner ID                                  Decimal
34      6       Group ID                                  Decimal
40      8       File mode (type and permission)           Octal
48      10      File size in bytes                        Decimal
58      2       Ending characters                         0x60 0x0A
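The field layout above can be decoded with a fixed-width unpack. The following Python sketch is illustrative only: the names ArHeader and parse_header are hypothetical, and treating an all-blank numeric field as 0 is an assumption of this sketch rather than something the format description states.

```python
import struct
from typing import NamedTuple

# Field widths from the table: 16 + 12 + 6 + 6 + 8 + 10 + 2 = 60 bytes.
HEADER = struct.Struct("16s12s6s6s8s10s2s")


class ArHeader(NamedTuple):
    name: str
    mtime: int
    uid: int
    gid: int
    mode: int
    size: int


def _num(field: bytes, base: int) -> int:
    # Numeric fields are ASCII digits right-padded with spaces.
    # Assumption of this sketch: a blank field is read as 0.
    text = field.decode("ascii").strip()
    return int(text, base) if text else 0


def parse_header(raw: bytes) -> ArHeader:
    """Decode one 60-byte common-format file header."""
    if len(raw) != HEADER.size:
        raise ValueError("an ar file header is exactly 60 bytes")
    name, mtime, uid, gid, mode, size, end = HEADER.unpack(raw)
    if end != b"`\n":  # 0x60 0x0A
        raise ValueError("bad ending characters")
    return ArHeader(
        name.decode("ascii").rstrip(" "),
        _num(mtime, 10),
        _num(uid, 10),
        _num(gid, 10),
        _num(mode, 8),   # the mode field is octal
        _num(size, 10),
    )
```

The returned name is the raw identifier with padding removed; variant-specific conventions (a trailing "/" or a "#1/" prefix) are left for the caller to interpret.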

As the headers only include printable ASCII characters and line feeds, an archive containing only text files still appears to be a text file itself.

The members are aligned to even byte boundaries. "Each archive file member begins on an even byte boundary; a newline is inserted between files if necessary. Nevertheless, the size given reflects the actual size of the file exclusive of padding." [7]
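To illustrate the padding rule, the Python sketch below writes a small common-format archive and walks it again. It is a simplified illustration, not a replacement for ar: the function names are hypothetical, member names are assumed to be ASCII and at most 16 bytes, and the timestamp, owner, group and mode values are placeholders.

```python
MAGIC = b"!<arch>\n"


def build_archive(members):
    """Build an archive from (name, data) pairs; names must fit in 16 ASCII bytes."""
    out = bytearray(MAGIC)
    for name, data in members:
        # Placeholder metadata: mtime 0, uid 0, gid 0, mode 100644 (octal).
        out += b"%-16s%-12d%-6d%-6d%-8o%-10d`\n" % (
            name.encode("ascii"), 0, 0, 0, 0o100644, len(data))
        out += data
        if len(data) % 2:
            out += b"\n"  # filler so the next header starts on an even offset
    return bytes(out)


def iter_members(archive):
    """Yield (name, data) for each member, skipping the padding byte if present."""
    if archive[:8] != MAGIC:
        raise ValueError("missing ar signature")
    pos = 8
    while pos < len(archive):
        header = archive[pos:pos + 60]
        if len(header) < 60 or header[58:60] != b"`\n":
            raise ValueError("corrupt file header at offset %d" % pos)
        name = header[:16].decode("ascii").rstrip(" ")
        size = int(header[48:58].decode("ascii"))  # size excludes the padding
        start = pos + 60
        yield name, archive[start:start + size]
        pos = start + size + (size % 2)
```

For a 3-byte member, the size field still reads 3, and the single filler newline is only visible in the total archive length.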

Due to the limitations of file name length and format, the GNU and BSD variants devised different methods of storing long filenames. A description of these extensions is found in libbfd. [8] Although the common format does not suffer from the year 2038 problem, many implementations of the ar utility do and may need to be modified in the future to correctly handle timestamps in excess of 2147483647.
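The timestamp remark can be checked with a little arithmetic: the header field holds up to 12 decimal digits, whereas 2147483647 is the largest value of a signed 32-bit integer. The Python sketch below makes the comparison explicit; the helper names are hypothetical, and it assumes non-negative timestamps.

```python
INT32_MAX = 2**31 - 1          # 2147483647, the limit behind the year 2038 problem
FIELD_WIDTH = 12               # width of the decimal timestamp field in the header
FIELD_MAX = 10**FIELD_WIDTH - 1


def fits_int32(seconds: int) -> bool:
    """True if the value fits a signed 32-bit integer, as used by affected tools."""
    return -2**31 <= seconds <= INT32_MAX


def encode_mtime(seconds: int) -> bytes:
    """Encode a non-negative timestamp as the 12-byte, space-padded header field."""
    if not 0 <= seconds <= FIELD_MAX:
        raise ValueError("timestamp does not fit in 12 decimal digits")
    return (b"%d" % seconds).ljust(FIELD_WIDTH)
```

One second past the 32-bit limit still encodes without trouble, which is why the limitation lies in the tools rather than in the on-disk format.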

Depending on the format, many ar implementations include a global symbol table (also known as an armap, directory, or index) for fast linking without needing to scan the whole archive for a symbol. POSIX recognizes this feature, and requires ar implementations to have an -s option for updating it. Most implementations place it as the first file entry. [9]

BSD variant

BSD ar stores filenames right-padded with ASCII spaces. This causes issues with spaces inside filenames. 4.4BSD ar stores extended filenames by placing the string "#1/" followed by the file name length in the file name field, and storing the real filename in front of the data section. [6]
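A reader for the 4.4BSD extended-name scheme might look like the Python sketch below. The function name is hypothetical, and two points are assumptions not stated in the text above: that the payload passed in is everything covered by the header's size field (so the name bytes are counted in it), and that some implementations pad the stored name with NUL bytes, which the sketch strips.

```python
def split_bsd_name(ident_field: bytes, payload: bytes):
    """Return (filename, file_data) for one member of a BSD-variant archive.

    ident_field is the 16-byte file identifier from the header; payload is
    the data section that follows the header.
    """
    ident = ident_field.decode("ascii").rstrip(" ")
    if ident.startswith("#1/"):
        # "#1/<length>": the real name occupies the first <length> payload bytes.
        name_len = int(ident[3:])
        raw_name = payload[:name_len]
        # Assumption: trailing NUL padding, if any, is not part of the name.
        name = raw_name.rstrip(b"\0").decode("utf-8")
        return name, payload[name_len:]
    # Short names are stored directly, right-padded with spaces.
    return ident, payload
```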

The BSD ar utility traditionally does not handle the building of a global symbol lookup table, and delegates this task to a separate utility named ranlib, [10] which inserts an architecture-specific file named __.SYMDEF as the first archive member. [11] Some descendants put a space and "SORTED" after the name to indicate a sorted version. [12] A 64-bit variant called __.SYMDEF_64 exists on Darwin.

Since POSIX added the requirement for the -s option as a replacement for ranlib, however, newer BSD ar implementations have been rewritten to have this feature. FreeBSD in particular abandoned the SYMDEF table format and adopted the System V style table. [13]

System V (or GNU) variant

System V ar uses a '/' character (0x2F) to mark the end of the filename; this allows for the use of spaces without the use of an extended filename. It then stores multiple extended filenames in the data section of a file with the name "//"; this record is referred to by later headers. A header references an extended filename by storing a "/" followed by a decimal offset to the start of the filename in the extended filename data section. The format of this "//" file itself is simply a list of the long filenames, each separated by one or more LF characters. Note that the decimal offsets are counts of characters, not line or string numbers within the "//" file. The "//" file is usually the second entry of the archive, after the symbol table, which is always the first.
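The name lookup described above can be sketched in Python as follows. The function name is hypothetical. The sketch also assumes the GNU convention of ending each entry in the "//" table with a "/" before the LF, and strips that character; tables without it are handled the same way.

```python
def resolve_sysv_name(ident_field: bytes, long_names: bytes = b"") -> str:
    """Resolve the 16-byte file identifier of a System V/GNU-variant header.

    long_names is the data section of the "//" member, if the archive has one.
    """
    ident = ident_field.decode("ascii").rstrip(" ")
    if ident in ("/", "//"):
        return ident  # special members: symbol table and long-name table
    if ident.startswith("/") and ident[1:].isdigit():
        # "/<offset>": a character offset into the "//" data, not a line number.
        offset = int(ident[1:])
        end = long_names.find(b"\n", offset)
        if end == -1:
            end = len(long_names)
        name = long_names[offset:end].decode("utf-8")
        # Assumption: a trailing "/" is a terminator, not part of the name.
        return name[:-1] if name.endswith("/") else name
    # Short name: the "/" marks where the name ends, so spaces inside are safe.
    return ident[:-1] if ident.endswith("/") else ident
```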

System V ar uses the special filename "/" to denote that the following data entry contains a symbol lookup table, which is used in ar libraries to speed up access. This symbol table is built in three parts which are recorded together as contiguous data.

  1. A 32-bit big endian integer, giving the number of entries in the table.
  2. A set of 32-bit big endian integers. One for each symbol, recording the position within the archive of the header for the file containing this symbol.
  3. A set of zero-terminated strings. Each is a symbol name, and occurs in the same order as the list of positions in part 2.
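The three parts listed above can be decoded in a few lines. The Python sketch below is illustrative: the function name is hypothetical, symbol names are assumed to be ASCII, and it covers only the 32-bit table, not the SOM-based or 64-bit layouts.

```python
import struct


def parse_sysv_symtab(data: bytes):
    """Decode the data section of the "/" member into (symbol, header_offset) pairs."""
    # Part 1: a 32-bit big-endian count of entries.
    (count,) = struct.unpack_from(">I", data, 0)
    # Part 2: one 32-bit big-endian archive offset per symbol.
    offsets = struct.unpack_from(">%dI" % count, data, 4)
    # Part 3: zero-terminated names, in the same order as the offsets.
    parts = data[4 + 4 * count:].split(b"\0")
    if len(parts) <= count:
        raise ValueError("string table is shorter than the entry count")
    names = parts[:count]  # anything after the last name (padding) is ignored
    return [(name.decode("ascii"), off) for name, off in zip(names, offsets)]
```

Each returned offset points at the file header of the member that defines the symbol, so a linker can seek straight to it.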

Some System V systems do not use the format described above for the symbol lookup table. For operating systems such as HP-UX 11.0, this information is stored in a data structure based on the SOM file format.

The special file "/" is not terminated with a specific sequence; the end is assumed once the last symbol name has been read.

To overcome the 4 GiB file size limit, some operating systems, such as Solaris 11.2, and GNU use a variant lookup table. Instead of 32-bit integers, 64-bit integers are used in the symbol lookup tables. The string "/SYM64/" instead of "/" is used as the identifier for this table. [14]

Windows variant

The Windows (PE/COFF) variant is based on the SysV/GNU variant. The first entry "/" has the same layout as the SysV/GNU symbol table. The second entry is another "/", a Microsoft extension that stores an extended symbol cross-reference table. This one is sorted and uses little-endian integers. [5] [15] The third entry is the optional "//" long name data as in SysV/GNU. [16]

Thin archive

The versions of ar in GNU Binutils and elfutils have an additional "thin archive" format with the magic number !<thin>. A thin archive only contains a symbol table and references to the files. The file format is essentially a System V format archive where every file is stored without the data sections. Every filename is stored as a "long" filename and they are to be resolved as if they were symbolic links. [17]

Example usage

To create an archive from files class1.o, class2.o, class3.o, the following command would be used:

ar rcs libclass.a class1.o class2.o class3.o

Unix linkers, usually invoked through the C compiler cc, can read ar files and extract object files from them, so if libclass.a is an archive containing class1.o, class2.o and class3.o, then

cc main.c libclass.a

or (if libclass.a is placed in a standard library path, such as /usr/local/lib)

cc main.c -lclass

or (during linking)

ld ... main.o -lclass ...

is the same as:

cc main.c class1.o class2.o class3.o


References

  1. "application/x-archive". Archived from the original on 2019-12-08. Retrieved 2019-03-11.
  2. "ar(1) – Linux man page". Retrieved 3 October 2013.
  3. "Static Libraries". TLDP. Retrieved 3 October 2013.
  4. Linux Standard Base Core Specification, version 4.1, Chapter 15. Commands and Utilities > ar
  5. Levine, John R. (2000) [October 1999]. "Chapter 6: Libraries". Linkers and Loaders. The Morgan Kaufmann Series in Software Engineering and Programming (1 ed.). San Francisco, USA: Morgan Kaufmann. ISBN 1-55860-496-0. OCLC 42413382. Archived from the original on 2012-12-05. Retrieved 2020-01-12.
  6. Manual page for NET/2 ar file format
  7. "ar.h". www.unix.com. The UNIX and Linux Forums.
  8. "bminor/binutils-gdb: archive.c". GitHub. 16 July 2022.
  9. ar. Shell and Utilities Reference, The Single UNIX Specification, Version 4 from The Open Group
  10. Manual page for NET/2 ranlib utility
  11. Manual page for NET/2 ranlib file format
  12. "ranlib.h". opensource.apple.com.
  13. ar(5). FreeBSD File Formats Manual
  14. "ar.h(3HEAD)". docs.oracle.com. Oracle Corporation. 11 November 2014. Retrieved 14 November 2018.
  15. Pietrek, Matt (April 1998), "Under The Hood", Microsoft Systems Journal, archived from the original on 2007-06-24, retrieved 2014-08-23
  16. "llvm-mirror/llvm: archive.cpp (format detection)". GitHub. Retrieved 10 February 2020.
  17. "ar". GNU Binary Utilities.