Interning (computer science)

Last updated

In computer science, interning is re-using objects of equal value on-demand instead of creating new objects. This creational pattern [1] is frequently used for numbers and strings in different programming languages. In many object-oriented languages such as Python, even primitive types such as integer numbers are objects. To avoid the overhead of constructing a large number of integer objects, these objects get reused through interning. [2]

Contents

For interning to work the interned objects must be immutable, since state is shared between multiple variables. String interning is a common application of interning, where many strings with identical values are needed in the same program.

History

Lisp introduced the notion of interned strings for its symbols. The LISP 1.5 Programmers Manual [3] describes a function called intern which either evaluates to an existing symbol of the supplied name, or if none exists, creates a new symbol of that name. This idea of interned symbols persists in more recent dialects of Lisp, such as Clojure in special forms such a (def symbol) which perform symbol creation and interning. [4]

In the object-oriented programming paradigm interning is an important mechanism in the flyweight pattern, where an interning method is called to store the intrinsic state of an object such that this can be shared among different objects which share different extrinsic state, avoiding needless duplication. [5]

Interning continues to be an important technique for managing memory use in programming language implementations; for example, the Java Language Specification requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String, because string literals are "interned" so as to share unique instances. [6] In the Python programming language small integers are interned, [7] though the details of exactly which are dependent on language version.

Motivation

Interning saves memory and can thus improve performance and memory footprint of a program. [8] The downside is time required to search for existing values of objects which are to be interned.

See also

Related Research Articles

In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.

<span class="mw-page-title-main">Lisp (programming language)</span> Programming language family

Lisp is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in the late 1950s, it is the second-oldest high-level programming language still in common use, after Fortran. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Common Lisp, Scheme, Racket, and Clojure.

<span class="mw-page-title-main">Smalltalk</span> Object-oriented programming language released first in 1972

Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learning Research Group (LRG) scientists, including Alan Kay, Dan Ingalls, Adele Goldberg, Ted Kaehler, Diana Merry, and Scott Wallace.

In software engineering and computer science, abstraction is the process of generalizing concrete details, such as attributes, away from the study of objects and systems to focus attention on details of greater importance. Abstraction is a fundamental concept in computer science and software engineering, especially within the object-oriented programming paradigm. Examples of this include:

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

Programming languages can be grouped by the number and types of paradigms supported.

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, "primitive data types" may refer to the standard data types built into a programming language. Data types which are not primitive are referred to as derived or composite.

In computer programming, an assignment statement sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable. In most imperative programming languages, the assignment statement is a fundamental construct.

printf is a C standard library function that formats text and writes it to standard output.

<span class="mw-page-title-main">Boolean data type</span> Data having only values "true" or "false"

In computer science, the Boolean is a data type that has one of two possible values which is intended to represent the two truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions by changing control flow depending on whether a programmer-specified Boolean condition evaluates to true or false. It is a special case of a more general logical data type—logic does not always need to be Boolean.

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time-efficient or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool.

In computer programming, a sigil is a symbol affixed to a variable name, showing the variable's datatype or scope, usually a prefix, as in $foo, where $ is the sigil.

In computer science, a literal is a textual representation (notation) of a value as it is written in source code. Almost all programming languages have notations for atomic values such as integers, floating-point numbers, and strings, and usually for Booleans and characters; some also have notations for elements of enumerated types and compound values such as arrays, records, and objects. An anonymous function is a literal for the function type.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

In programming, a docstring is a string literal specified in source code that is used, like a comment, to document a specific segment of code. Unlike conventional source code comments, or even specifically formatted comments like docblocks, docstrings are not stripped from the source tree when it is parsed and are retained throughout the runtime of the program. This allows the programmer to inspect these comments at run time, for instance as an interactive help system, or as metadata.

A symbol in computer programming is a primitive data type whose instances have a human-readable form. Symbols can be used as identifiers. In some programming languages, they are called atoms. Uniqueness is enforced by holding them in a symbol table. The most common use of symbols by programmers is to perform language reflection, and the most common indirectly is their use to create object linkages.

In computer programming, a variable is an abstract storage location paired with an associated symbolic name, which contains some known or unknown quantity of data or object referred to as a value; or in simpler terms, a variable is a named container for a particular set of bits or type of data. A variable can eventually be associated with or identified by a memory address. The variable name is the usual way to reference the stored value, in addition to referring to the variable itself, depending on the context. This separation of name and content allows the name to be used independently of the exact information it represents. The identifier in computer source code can be bound to a value during run time, and the value of the variable may thus change during the course of program execution.

In computer programming, a constant is a value that is not altered by the program during normal execution. When associated with an identifier, a constant is said to be "named," although the terms "constant" and "named constant" are often used interchangeably. This is contrasted with a variable, which is an identifier with a value that can be changed during normal execution. To simplify, constants' values remains, while the values of variables varies, hence both their names.

<span class="mw-page-title-main">Object-oriented programming</span> Programming paradigm based on the concept of objects

Object-oriented programming (OOP) is a programming paradigm based on the concept of objects, which can contain data and code: data in the form of fields, and code in the form of procedures. In OOP, computer programs are designed by making them out of objects that interact with one another.

References

  1. "Design Patterns" (PDF). University of Washington.
  2. Jaworski, Michał (2019). Expert Python programming : become a master in Python by learning coding best practices and advanced programming concepts in Python 3.7. Tarek Ziadé (Third ed.). Birmingham, U.K. ISBN   978-1-78980-677-9. OCLC   1125343555.{{cite book}}: CS1 maint: location missing publisher (link)
  3. Levin, Michael I. (1965). LISP 1.5 programmer's manual : the Computation Center and Research Laboratory of Electronics, Massachusetts Institute of Technology. John McCarthy, Massachusetts Institute of Technology. Computation Center, Massachusetts Institute of Technology. Research Laboratory of Electronics (2nd ed.). Cambridge: M.I.T. Press. ISBN   0-262-13011-4. OCLC   1841373.
  4. "Clojure - Special Forms". clojure.org. Retrieved 2021-11-04.
  5. Design patterns : elements of reusable object-oriented software. Erich Gamma, Richard Helm, Ralph E. Johnson, John Vlissides. Reading, Mass.: Addison-Wesley. 1995. ISBN   0-201-63361-2. OCLC   31171684.{{cite book}}: CS1 maint: others (link)
  6. "Java Language Specification. Chapter 3. Lexical Structure". docs.oracle.com. Retrieved 2021-11-04.
  7. "PEP 237 -- Unifying Long Integers and Integers". Python.org. Retrieved 2021-11-04.
  8. Oaks, Scott (2014). Java performance : the definitive guide. Sebastopol, CA: O'Reilly Media. ISBN   978-1-4493-6354-3. OCLC   878059649.