In computer science, garbage collection (GC) is a form of automatic memory management. [2] The garbage collector attempts to reclaim memory that was allocated by the program, but is no longer referenced; such memory is called garbage . Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp. [3]
Garbage collection relieves the programmer from doing manual memory management, where the programmer specifies what objects to de-allocate and return to the memory system and when to do so. [4] Other, similar techniques include stack allocation, region inference, and memory ownership, and combinations thereof. Garbage collection may take a significant proportion of a program's total processing time, and affect performance as a result.
Resources other than memory, such as network sockets, database handles, windows, file descriptors, and device descriptors, are not typically handled by garbage collection, but rather by other methods (e.g. destructors). Some such methods de-allocate memory also.
This section needs additional citations for verification .(July 2014) |
Many programming languages require garbage collection, either as part of the language specification (e.g., RPL, Java, C#, D, [5] Go, and most scripting languages) or effectively for practical implementation (e.g., formal languages like lambda calculus). [6] These are said to be garbage-collected languages. Other languages, such as C and C++, were designed for use with manual memory management, but have garbage-collected implementations available. Some languages, like Ada, Modula-3, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects. Still others, like D, are garbage-collected but allow the user to manually delete objects or even disable garbage collection entirely when speed is required. [7]
Although many languages integrate GC into their compiler and runtime system, post-hoc GC systems also exist, such as Automatic Reference Counting (ARC). Some of these post-hoc GC systems do not require recompilation. [8]
GC frees the programmer from manually de-allocating memory. This helps avoid some kinds of errors: [9]
GC uses computing resources to decide which memory to free. Therefore, the penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can impair program performance. [12] A peer-reviewed paper from 2005 concluded that GC needs five times the memory to compensate for this overhead and to perform as fast as the same program using idealized explicit memory management. The comparison however is made to a program generated by inserting deallocation calls using an oracle, implemented by collecting traces from programs run under a profiler, and the program is only correct for one particular execution of the program. [13] Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing. The impact on performance was given by Apple as a reason for not adopting garbage collection in iOS, despite it being the most desired feature. [14]
The moment when the garbage is actually collected can be unpredictable, resulting in stalls (pauses to shift/free memory) scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.
Tracing garbage collection is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.
Reference counting garbage collection is where each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created and decremented when a reference is destroyed. When the count reaches zero, the object's memory is reclaimed. [15]
As with manual memory management, and unlike tracing garbage collection, reference counting guarantees that objects are destroyed as soon as their last reference is destroyed, and usually only accesses memory which is either in CPU caches, in objects to be freed, or directly pointed to by those, and thus tends to not have significant negative side effects on CPU cache and virtual memory operation.
There are a number of disadvantages to reference counting; this can generally be solved or mitigated by more sophisticated algorithms:
const
references. Reference counting in C++ is usually implemented using "smart pointers" [19] whose constructors, destructors, and assignment operators manage the references. A smart pointer can be passed by reference to a function, which avoids the need to copy-construct a new smart pointer (which would increase the reference count on entry into the function and decrease it on exit). Instead, the function receives a reference to the smart pointer which is produced inexpensively. The Deutsch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, only counting references in the heap, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and register that no other reference to it still exists. A further substantial decrease in the overhead on counter updates can be obtained by update coalescing introduced by Levanoni and Petrank. [20] [21] Consider a pointer that in a given interval of the execution is updated several times. It first points to an object O1
, then to an object O2
, and so forth until at the end of the interval it points to some object On
. A reference counting algorithm would typically execute rc(O1)--
, rc(O2)++
, rc(O2)--
, rc(O3)++
, rc(O3)--
, ..., rc(On)++
. But most of these updates are redundant. In order to have the reference count properly evaluated at the end of the interval it is enough to perform rc(O1)--
and rc(On)++
. Levanoni and Petrank measured an elimination of more than 99% of the counter updates in typical Java benchmarks.Escape analysis is a compile-time technique that can convert heap allocations to stack allocations, thereby reducing the amount of garbage collection to be done. This analysis determines whether an object allocated inside a function is accessible outside of it. If a function-local allocation is found to be accessible to another function or thread, the allocation is said to "escape" and cannot be done on the stack. Otherwise, the object may be allocated directly on the stack and released when the function returns, bypassing the heap and associated memory management costs. [22]
Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In some languages lacking built-in garbage collection, it can be added through a library, as with the Boehm garbage collector for C and C++.
Most functional programming languages, such as ML, Haskell, and APL, have garbage collection built in. Lisp is especially notable as both the first functional programming language and the first language to introduce garbage collection. [23]
Other dynamic languages, such as Ruby and Julia (but not Perl 5 or PHP before version 5.3, [24] which both use reference counting), JavaScript and ECMAScript also tend to use GC. Object-oriented programming languages such as Smalltalk, ooRexx, RPL and Java usually provide integrated garbage collection. Notable exceptions are C++ and Delphi, which have destructors.
BASIC and Logo have often used garbage collection for variable-length data types, such as strings and lists, so as not to burden programmers with memory management details. On the Altair 8800, programs with many string variables and little string space could cause long pauses due to garbage collection. [25] Similarly the Applesoft BASIC interpreter's garbage collection algorithm repeatedly scans the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in performance [26] and pauses anywhere from a few seconds to a few minutes. [27] A replacement garbage collector for Applesoft BASIC by Randy Wigginton identifies a group of strings in every pass over the heap, reducing collection time dramatically. [28] BASIC.SYSTEM, released with ProDOS in 1983, provides a windowing garbage collector for BASIC that is many times faster. [29]
While the Objective-C traditionally had no garbage collection, with the release of OS X 10.5 in 2007 Apple introduced garbage collection for Objective-C 2.0, using an in-house developed runtime collector. [30] However, with the 2012 release of OS X 10.8, garbage collection was deprecated in favor of LLVM's automatic reference counter (ARC) that was introduced with OS X 10.7. [31] Furthermore, since May 2015 Apple even forbade the usage of garbage collection for new OS X applications in the App Store. [32] [33] For iOS, garbage collection has never been introduced due to problems in application responsivity and performance; [14] [34] instead, iOS uses ARC. [35] [36]
Garbage collection is rarely used on embedded or real-time systems because of the usual need for very tight control over the use of limited resources. However, garbage collectors compatible with many limited environments have been developed. [37] The Microsoft .NET Micro Framework, .NET nanoFramework [38] and Java Platform, Micro Edition are embedded software platforms that, like their larger cousins, include garbage collection.
Garbage collectors available in Java JDKs include:
Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation.
This form of garbage collection has been studied in the Mercury programming language, [40] and it saw greater usage with the introduction of LLVM's automatic reference counter (ARC) into Apple's ecosystem (iOS and OS X) in 2011. [35] [36] [32]
Incremental, concurrent, and real-time garbage collectors have been developed, for example by Henry Baker and by Henry Lieberman. [41] [42] [43]
In Baker's algorithm, the allocation is done in either half of a single region of memory. When it becomes half full, a garbage collection is performed which moves the live objects into the other half and the remaining objects are implicitly deallocated. The running program (the 'mutator') has to check that any object it references is in the correct half, and if not move it across, while a background task is finding all of the objects. [44]
Generational garbage collection schemes are based on the empirical observation that most objects die young. In generational garbage collection, two or more allocation regions (generations) are kept, which are kept separate based on the object's age. New objects are created in the "young" generation that is regularly collected, and when a generation is full, the objects that are still referenced from older regions are copied into the next oldest generation. Occasionally a full scan is performed.
Some high-level language computer architectures include hardware support for real-time garbage collection.
Most implementations of real-time garbage collectors use tracing.[ citation needed ] Such real-time garbage collectors meet hard real-time constraints when used with a real-time operating system. [45]
In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released. A memory leak may also happen when an object is stored in memory but cannot be accessed by the running code. A memory leak has symptoms similar to a number of other problems and generally can only be diagnosed by a programmer with access to the program's source code.
In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.
Java and C++ are two prominent object-oriented programming languages. By many language popularity metrics, the two languages have dominated object-oriented and high-performance software development for much of the 21st century, and are often directly compared and contrasted. Java's syntax was based on C/C++.
Cocoa is Apple's native object-oriented application programming interface (API) for its desktop operating system macOS.
Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when no longer needed. This is critical to any advanced computer system where more than a single process might be underway at any time.
In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Lua, Java, Lisp, OCaml, Perl, Python and PHP since the version 7.4.
In object-oriented programming (OOP), object lifetime is the period of time between an object's creation and its destruction. In some programming contexts, object lifetime coincides with the lifetime of a variable that represents the object. In other contexts – where the object is accessed by reference – object lifetime is not determined by the lifetime of a variable. For example, destruction of the variable may only destroy the reference; not the referenced object.
In computer programming, tracing garbage collection is a form of automatic memory management that consists of determining which objects should be deallocated by tracing which objects are reachable by a chain of references from certain "root" objects, and considering the rest as "garbage" and collecting them. Tracing is the most common type of garbage collection – so much so that "garbage collection" often refers to the tracing method, rather than others such as reference counting – and there are a large number of algorithms used in implementation.
Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.
In computer science, a finalizer or finalize method is a special method that performs finalization, generally some form of cleanup. A finalizer is executed during object destruction, prior to the object being deallocated, and is complementary to an initializer, which is executed during object creation, following allocation. Finalizers are strongly discouraged by some, due to difficulty in proper use and the complexity they add, and alternatives are suggested instead, mainly the dispose pattern.
The Boehm–Demers–Weiser garbage collector, often simply known as the Boehm GC or Boehm collector, is a conservative garbage collector for C and C++ developed by Hans Boehm, Alan Demers, and Mark Weiser.
In computer programming, unreachable memory is a block of dynamically allocated memory where the program that allocated the memory no longer has any reachable pointer that refers to it. Similarly, an unreachable object is a dynamically allocated object that has no reachable reference to it. Informally, unreachable memory is dynamic memory that the program cannot reach directly, nor get to by starting at an object it can reach directly, and then following a chain of pointer references.
In computer science, garbage includes data, objects, or other regions of the memory of a computer system, which will not be used in any future computation by the system, or by a program running on it. Because every computer system has a finite amount of memory, and most software produces garbage, it is frequently necessary to deallocate memory that is occupied by garbage and return it to the heap, or memory pool, for reuse.
In computer science, manual memory management refers to the usage of manual instructions by the programmer to identify and deallocate unused objects, or garbage. Up until the mid-1990s, the majority of programming languages used in industry supported manual memory management, though garbage collection has existed since 1959, when it was introduced with Lisp. Today, however, languages with garbage collection such as Java are increasingly popular and the languages Objective-C and Swift provide similar functionality through Automatic Reference Counting. The main manually managed languages still in widespread use today are C and C++ – see C dynamic memory allocation.
In compiler optimization, escape analysis is a method for determining the dynamic scope of pointers – where in the program a pointer can be accessed. It is related to pointer analysis and shape analysis.
In software development, the programming language Java was historically considered slower than the fastest third-generation typed languages such as C and C++. In contrast to those languages, Java compiles by default to a Java Virtual Machine (JVM) with operations distinct from those of the actual computer hardware. Early JVM implementations were interpreters; they simulated the virtual operations one-by-one rather than translating them into machine code for direct hardware execution.
In computer science, a mark–compact algorithm is a type of garbage collection algorithm used to reclaim unreachable memory. Mark–compact algorithms can be regarded as a combination of the mark–sweep algorithm and Cheney's copying algorithm. First, reachable objects are marked, then a compacting step relocates the reachable (marked) objects towards the beginning of the heap area. Compacting garbage collection is used by modern JVMs, Microsoft's Common Language Runtime and by the Glasgow Haskell Compiler.
Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.
In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Memory allocators using region-based managements are often called area allocators, and when they work by only "bumping" a single pointer, as bump allocators.
Kathryn S. McKinley is an American computer scientist noted for her research on compilers, runtime systems, and computer architecture. She is also known for her leadership in broadening participation in computing. McKinley was co-chair of CRA-W from 2011 to 2014.
{{cite book}}
: |journal=
ignored (help){{cite book}}
: |journal=
ignored (help)