Library (computing)

Last updated

Illustration of an application which uses libvorbisfile to play an Ogg Vorbis file Ogg vorbis libs and application dia.svg
Illustration of an application which uses libvorbisfile to play an Ogg Vorbis file

In computer science, a library is a collection of resources that is leveraged during software development to implement a computer program.

Contents

Historically, a library consisted of subroutines (generally called functions today). The concept now includes other forms of executable code including classes and non-executable data including images and text. It can also refer to a collection of source code.

For example, a program could use a library to indirectly make system calls instead of making those system calls directly in the program.

Characteristics

General

A library can be used by multiple, independent consumers (programs and other libraries). This differs from resources defined in a program which can usually only be used by that program.

When a consumer uses a library resource, it gains the value of the library without having to implement it itself. Libraries encourage code reuse in a modular fashion.

When writing code that uses a library, a programmer only needs to know high-level information such as what items it contains at and how to use the items not all of the internal details of the library.

Libraries can use other libraries resulting in a hierarchy of libraries in a program.

Executable

A library of executable code has a well-defined interface by which the functionality is invoked. For example, in C, a library function is invoked via C's normal function call capability. The linker generates code to call a function via the library mechanism if the function is available from a library instead of from the program itself. [1]

The functions of a library can be connected to the invoking program at different program lifecycle phases. If the code of the library is accessed during the build of the invoking program, then the library is called a static library. [2] An alternative is to build the program executable to be separate from the library file. The library functions are connected after the executable is started, either at load-time or runtime. In this case, the library is called a dynamic library.

Most compiled languages have a standard library, although programmers can also create their own custom libraries. Most modern software systems provide libraries that implement the majority of the system services. Such libraries have organized the services which a modern application requires. As such, most code used by modern applications is provided in these system libraries.

History

The idea of a computer library dates back to the first computers created by Charles Babbage. An 1888 paper on his Analytical Engine suggested that computer operations could be punched on separate cards from numerical input. If these operation punch cards were saved for reuse then "by degrees the engine would have a library of its own." [3]

A woman working next to a filing cabinet containing the subroutine library on reels of punched tape for the EDSAC computer. FirstCodeLibrary-ESDAC-ThePreparationOfProgramsForAnElectronicDigitalComputer-1951.jpg
A woman working next to a filing cabinet containing the subroutine library on reels of punched tape for the EDSAC computer.

In 1947 Goldstine and von Neumann speculated that it would be useful to create a "library" of subroutines for their work on the IAS machine, an early computer that was not yet operational at that time. [4] They envisioned a physical library of magnetic wire recordings, with each wire storing reusable computer code. [5]

Inspired by von Neumann, Wilkes and his team constructed EDSAC. A filing cabinet of punched tape held the subroutine library for this computer. [6] Programs for EDSAC consisted of a main program and a sequence of subroutines copied from the subroutine library. [7] In 1951 the team published the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer , which detailed the creation and the purpose of the library. [8]

COBOL included "primitive capabilities for a library system" in 1959, [9] but Jean Sammet described them as "inadequate library facilities" in retrospect. [10]

JOVIAL has a Communication Pool (COMPOOL), roughly a library of header files.

Another major contributor to the modern library concept came in the form of the subprogram innovation of FORTRAN. FORTRAN subprograms can be compiled independently of each other, but the compiler lacked a linker. So prior to the introduction of modules in Fortran-90, type checking between FORTRAN [NB 1] subprograms was impossible. [11]

By the mid 1960s, copy and macro libraries for assemblers were common. Starting with the popularity of the IBM System/360, libraries containing other types of text elements, e.g., system parameters, also became common.

In IBM's OS/360 and its successors this is called a partitioned data set.

The first object-oriented programming language, Simula, developed in 1965, supported adding classes to libraries via its compiler. [12] [13]

Linking

Libraries are important in the program linking or binding process, which resolves references known as links or symbols to library modules. The linking process is usually automatically done by a linker or binder program that searches a set of libraries and other modules in a given order. Usually it is not considered an error if a link target can be found multiple times in a given set of libraries. Linking may be done when an executable file is created (static linking), or whenever the program is used at runtime (dynamic linking).

The references being resolved may be addresses for jumps and other routine calls. They may be in the main program, or in one module depending upon another. They are resolved into fixed or relocatable addresses (from a common base) by allocating runtime memory for the memory segments of each module referenced.

Some programming languages use a feature called smart linking whereby the linker is aware of or integrated with the compiler, such that the linker knows how external references are used, and code in a library that is never actually used, even though internally referenced, can be discarded from the compiled application. For example, a program that only uses integers for arithmetic, or does no arithmetic operations at all, can exclude floating-point library routines. This smart-linking feature can lead to smaller application file sizes and reduced memory usage.

Relocation

Some references in a program or library module are stored in a relative or symbolic form which cannot be resolved until all code and libraries are assigned final static addresses. Relocation is the process of adjusting these references, and is done either by the linker or the loader. In general, relocation cannot be done to individual libraries themselves because the addresses in memory may vary depending on the program using them and other libraries they are combined with. Position-independent code avoids references to absolute addresses and therefore does not require relocation.

Static libraries

When linking is performed during the creation of an executable or another object file, it is known as static linking or early binding. In this case, the linking is usually done by a linker, but may also be done by the compiler. [14] A static library, also known as an archive, is one intended to be statically linked. Originally, only static libraries existed. Static linking must be performed when any modules are recompiled.

All of the modules required by a program are sometimes statically linked and copied into the executable file. This process, and the resulting stand-alone file, is known as a static build of the program. A static build may not need any further relocation if virtual memory is used and no address space layout randomization is desired. [15]

Shared libraries

A shared library or shared object is a file that is intended to be shared by executable files and further shared object files. Modules used by a program are loaded from individual shared objects into memory at load time or runtime, rather than being copied by a linker when it creates a single monolithic executable file for the program.

Shared libraries can be statically linked during compile-time, meaning that references to the library modules are resolved and the modules are allocated memory when the executable file is created.[ citation needed ] But often linking of shared libraries is postponed until they are loaded.[ dubious discuss ]

Object libraries

Although originally pioneered in the 1960s, dynamic linking did not reach the most commonly-used operating systems until the late 1980s. It was generally available in some form in most operating systems by the early 1990s. During this same period, object-oriented programming (OOP) was becoming a significant part of the programming landscape. OOP with runtime binding requires additional information that traditional libraries do not supply. In addition to the names and entry points of the code located within, they also require a list of the objects they depend on. This is a side-effect of one of OOP's core concepts, inheritance, which means that parts of the complete definition of any method may be in different places. This is more than simply listing that one library requires the services of another: in a true OOP system, the libraries themselves may not be known at compile time, and vary from system to system.

At the same time many developers worked on the idea of multi-tier programs, in which a "display" running on a desktop computer would use the services of a mainframe or minicomputer for data storage or processing. For instance, a program on a GUI-based computer would send messages to a minicomputer to return small samples of a huge dataset for display. Remote procedure calls (RPC) already handled these tasks, but there was no standard RPC system.

Soon the majority of the minicomputer and mainframe vendors instigated projects to combine the two, producing an OOP library format that could be used anywhere. Such systems were known as object libraries, or distributed objects, if they supported remote access (not all did). Microsoft's COM is an example of such a system for local use. DCOM, a modified version of COM, supports remote access.

For some time object libraries held the status of the "next big thing" in the programming world. There were a number of efforts to create systems that would run across platforms, and companies competed to try to get developers locked into their own system. Examples include IBM's System Object Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object Model (COM/DCOM), and any number of CORBA-based systems.

Class libraries

Class libraries are the rough OOP equivalent of older types of code libraries. They contain classes, which describe characteristics and define actions (methods) that involve objects. Class libraries are used to create instances, or objects with their characteristics set to specific values. In some OOP languages, like Java, the distinction is clear, with the classes often contained in library files (like Java's JAR file format) and the instantiated objects residing only in memory (although potentially able to be made persistent in separate files). In others, like Smalltalk, the class libraries are merely the starting point for a system image that includes the entire state of the environment, classes and all instantiated objects.

Today most class libraries are stored in a package repository (such as Maven Central for Java). Client code explicitly declare the dependencies to external libraries in build configuration files (such as a Maven Pom in Java).

Remote libraries

Another library technique uses completely separate executables (often in some lightweight form) and calls them using a remote procedure call (RPC) over a network to another computer. This maximizes operating system re-use: the code needed to support the library is the same code being used to provide application support and security for every other program. Additionally, such systems do not require the library to exist on the same machine, but can forward the requests over the network.

However, such an approach means that every library call requires a considerable amount of overhead. RPC calls are much more expensive than calling a shared library that has already been loaded on the same machine. This approach is commonly used in a distributed architecture that makes heavy use of such remote calls, notably client-server systems and application servers such as Enterprise JavaBeans.

Code generation libraries

Code generation libraries are high-level APIs that can generate or transform byte code for Java. They are used by aspect-oriented programming, some data access frameworks, and for testing to generate dynamic proxy objects. They also are used to intercept field access. [16]

File naming

Most modern Unix-like systems

The system stores libfoo.a and libfoo.so files in directories such as /lib, /usr/lib or /usr/local/lib. The filenames always start with lib, and end with a suffix of .a (archive, static library) or of .so (shared object, dynamically linked library). Some systems might have multiple names for a dynamically linked library. These names typically share the same prefix and have different suffixes indicating the version number. Most of the names are names for symbolic links to the latest version. For example, on some systems libfoo.so.2 would be the filename for the second major interface revision of the dynamically linked library libfoo. The .la files sometimes found in the library directories are libtool archives, not usable by the system as such.

macOS

The system inherits static library conventions from BSD, with the library stored in a .a file, and can use .so-style dynamically linked libraries (with the .dylib suffix instead). Most libraries in macOS, however, consist of "frameworks", placed inside special directories called "bundles" which wrap the library's required files and metadata. For example, a framework called MyFramework would be implemented in a bundle called MyFramework.framework, with MyFramework.framework/MyFramework being either the dynamically linked library file or being a symlink to the dynamically linked library file in MyFramework.framework/Versions/Current/MyFramework.

Microsoft Windows

Dynamic-link libraries usually have the suffix *.DLL, [17] although other file name extensions may identify specific-purpose dynamically linked libraries, e.g. *.OCX for OLE libraries. The interface revisions are either encoded in the file names, or abstracted away using COM-object interfaces. Depending on how they are compiled, *.LIB files can be either static libraries or representations of dynamically linkable libraries needed only during compilation, known as "import libraries". Unlike in the UNIX world, which uses different file extensions, when linking against .LIB file in Windows one must first know if it is a regular static library or an import library. In the latter case, a .DLL file must be present at runtime.

See also

Notes

  1. It was possible earlier between, e.g., Ada subprograms.

Related Research Articles

<span class="mw-page-title-main">Linker (computing)</span> Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

In computing, DLL hell is a term for the complications that arise when one works with dynamic-link libraries (DLLs) used with Microsoft Windows operating systems, particularly legacy 16-bit editions, which all run in a single memory space.

<span class="mw-page-title-main">PureBasic</span>

PureBasic is a commercially distributed procedural computer programming language and integrated development environment based on BASIC and developed by Fantaisie Software for Windows, Linux, and macOS. An Amiga version is available, although it has been discontinued and some parts of it are released as open-source. The first public release of PureBasic for Windows was on 17 December 2000. It has been continually updated ever since.

A shared library or shared object is a computer file that contains executable code designed to be used by multiple computer programs or other libraries at runtime.

<span class="mw-page-title-main">Windows API</span> Microsofts core set of application programming interfaces on Windows

The Windows API, informally WinAPI, is the foundational application programming interface (API) that allows a computer program to access the features of the Microsoft Windows operating system in which the program is running. Programs access API functionality via dynamic-link library (DLL) technology.

In computing, just-in-time (JIT) compilation is compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves either memory-mapping or copying the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that executes properly regardless of its memory address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

In computing, late binding or dynamic linkage—though not an identical process to dynamically linking imported code libraries—is a computer programming mechanism in which the method being called upon an object, or the function being called with arguments, is looked up by name at runtime. In other words, a name is associated with a particular operation or object at runtime, rather than during compilation. The name dynamic binding is sometimes used, but is more commonly used to refer to dynamic scope.

In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments.

In software engineering, inversion of control (IoC) is a design principle in which custom-written portions of a computer program receive the flow of control from an external source. The term "inversion" is historical: a software architecture with this design "inverts" control as compared to procedural programming. In procedural programming, a program's custom code calls reusable libraries to take care of generic tasks, but with inversion of control, it is the external source or framework that calls the custom code.

Managed Extensions for C++ or Managed C++ is a deprecated set of language extensions for C++, including grammatical and syntactic extensions, keywords and attributes, to bring the C++ syntax and language to the .NET Framework. These extensions were created by Microsoft to allow C++ code to be targeted to the Common Language Runtime (CLR) in the form of managed code, as well as continue to interoperate with native code.

A static build is a compiled version of a program which has been statically linked against libraries.

In computer science, a static library or statically linked library is a set of routines, external functions and variables which are resolved in a caller at compile-time and copied into a target application by a compiler, linker, or binder, producing an object file and a stand-alone executable. This executable and the process of compiling it are both known as a static build of the program. Historically, libraries could only be static. Static libraries are either merged with other static libraries and object files during building/linking to form a single executable or loaded at run-time into the address space of their corresponding executable at a static memory offset determined at compile-time/link-time.

In computer programming, the term hooking covers a range of techniques used to alter or augment the behaviour of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components. Code that handles such intercepted function calls, events or messages is called a hook.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

The Microsoft Windows operating system supports a form of shared libraries known as "dynamic-link libraries", which are code libraries that can be used by multiple processes while only one copy is loaded into memory. This article provides an overview of the core libraries that are included with every modern Windows installation, on top of which most Windows applications are built.

The Java Class Loader, part of the Java Runtime Environment, dynamically loads Java classes into the Java Virtual Machine. Usually classes are only loaded on demand. The virtual machine will only load the class files required for executing the program. The Java run time system does not need to know about files and file systems as this is delegated to the class loader.

Dynamic loading is a mechanism by which a computer program can, at run time, load a library into memory, retrieve the addresses of functions and variables contained in the library, execute those functions or access those variables, and unload the library from memory. It is one of the three mechanisms by which a computer program can use some other software within the program; the others are static linking and dynamic linking. Unlike static linking and dynamic linking, dynamic loading allows a computer program to start up in the absence of these libraries, to discover available libraries, and to potentially gain additional functionality.

References

  1. Deshpande, Prasad (2013). Metamorphic Detection Using Function Call Graph Analysis (Thesis). San Jose State University Library. doi: 10.31979/etd.t9xm-ahsc .
  2. "Static Libraries". TLDP. Archived from the original on 2013-07-03. Retrieved 2013-10-03.
  3. Babbage, H. P. (1888-09-12). "The Analytical Engine". Proceedings of the British Association. Bath.
  4. Goldstine, Herman H. (2008-12-31). The Computer from Pascal to von Neumann. Princeton: Princeton University Press. doi:10.1515/9781400820139. ISBN   978-1-4008-2013-9.
  5. Goldstine, Herman; von Neumann, John (1947). Planning and coding of problems for an electronic computing instrument (Report). Institute for Advanced Study. pp. 3, 21–22. OCLC   26239859. it will probably be very important to develop an extensive "library" of subroutines
  6. Wilkes, M. V. (1951). "The EDSAC Computer". 1951 International Workshop on Managing Requirements Knowledge. 1951 International Workshop on Managing Requirements Knowledge. IEEE. p. 79. doi:10.1109/afips.1951.13.
  7. Campbell-Kelly, Martin (September 2011). "In Praise of 'Wilkes, Wheeler, and Gill'". Communications of the ACM. 54 (9): 25–27. doi:10.1145/1995376.1995386. S2CID   20261972.
  8. Wilkes, Maurice; Wheeler, David; Gill, Stanley (1951). The Preparation of Programs for an Electronic Digital Computer. Addison-Wesley. pp. 45, 80–91, 100. OCLC   641145988.
  9. Wexelblat, Richard (1981). History of Programming Languages. ACM Monograph Series. New York, NY: Academic Press (A subsidiary of Harcourt Brace). p.  274. ISBN   0-12-745040-8.
  10. Wexelblat, op. cit., p. 258
  11. Wilson, Leslie B.; Clark, Robert G. (1988). Comparative Programming Languages. Wokingham, England: Addison-Wesley. p. 126. ISBN   0-201-18483-4.
  12. Wilson and Clark, op. cit., p. 52
  13. Wexelblat, op. cit., p. 716
  14. Kaminsky, Dan (2008). "Chapter 3 - Portable Executable and Executable and Linking Formats". Reverse Engineering Code with IDA Pro. Elsevier. pp. 37–66. doi:10.1016/b978-1-59749-237-9.00003-x. ISBN   978-1-59749-237-9 . Retrieved 2021-05-27.
  15. Collberg, Christian; Hartman, John H.; Babu, Sridivya; Udupa, Sharath K. (2003). SLINKY: Static Linking Reloaded. USENIX '05. Department of Computer Science, University of Arizona. Archived from the original on 2016-03-23. Retrieved 2016-03-17.
  16. "Code Generation Library". Source Forge. Archived from the original on 2010-01-12. Retrieved 2010-03-03. Byte Code Generation Library is high level API to generate and transform JAVA byte code. It is used by AOP, testing, data access frameworks to generate dynamic proxy objects and intercept field access.
  17. Bresnahan, Christine; Blum, Richard (2015-04-27). LPIC-1 Linux Professional Institute Certification Study Guide: Exam 101-400 and Exam 102-400. John Wiley & Sons (published 2015). p. 82. ISBN   9781119021186. Archived from the original on 2015-09-24. Retrieved 2015-09-03. Linux shared libraries are similar to the dynamic link libraries (DLLs) of Windows. Windows DLLs are usually identified by .dll filename extensions.

Further reading