Tracing garbage collection

Last updated

In computer programming, tracing garbage collection is a form of automatic memory management that consists of determining which objects should be deallocated ("garbage collected") by tracing which objects are reachable by a chain of references from certain "root" objects, and considering the rest as "garbage" and collecting them. Tracing garbage collection is the most common type of garbage collection – so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting – and there are a large number of algorithms used in implementation.

Contents

Reachability of an object

Informally, an object is reachable if it is referenced by at least one variable in the program, either directly or through references from other reachable objects. More precisely, objects can be reachable in only two ways:

  1. A distinguished set of roots: objects that are assumed to be reachable. Typically, these include all the objects referenced from anywhere in the call stack (that is, all local variables and parameters in the functions currently being invoked), and any global variables.
  2. Anything referenced from a reachable object is itself reachable; more formally, reachability is a transitive closure.

The reachability definition of "garbage" is not optimal, insofar as the last time a program uses an object could be long before that object falls out of the environment scope. A distinction is sometimes drawn between syntactic garbage, those objects the program cannot possibly reach, and semantic garbage, those objects the program will in fact never again use. For example:

Objectx=newFoo();Objecty=newBar();x=newQuux();/* At this point, we know that the Foo object  * originally assigned to x will never be * accessed: it is syntactic garbage. *//* In the following block, y *could* be semantic garbage; * but we won't know until x.check_something() returns * some value -- if it returns at all. */if(x.check_something()){x.do_something(y);}System.exit(0);

The problem of precisely identifying semantic garbage can easily be shown to be partially decidable: a program that allocates an object X, runs an arbitrary input program P, and uses X if and only if P finishes would require a semantic garbage collector to solve the halting problem. Although conservative heuristic methods for semantic garbage detection remain an active research area, essentially all practical garbage collectors focus on syntactic garbage.[ citation needed ]

Another complication with this approach is that, in languages with both reference types and unboxed value types, the garbage collector needs to somehow be able to distinguish which variables on the stack or fields in an object are regular values and which are references: in memory, an integer and a reference might look alike. The garbage collector then needs to know whether to treat the element as a reference and follow it, or whether it is a primitive value. One common solution is the use of tagged pointers.

Strong and weak references

The garbage collector can reclaim only objects that have no references pointing to them either directly or indirectly from the root set. However, some programs require weak references, which should be usable for as long as the object exists but should not prolong its lifetime. In discussions about weak references, ordinary references are sometimes called strong references. An object is eligible for garbage collection if there are no strong (i.e. ordinary) references to it, even though there still might be some weak references to it.

A weak reference is not merely just any pointer to the object that a garbage collector does not care about. The term is usually reserved for a properly managed category of special reference objects which are safe to use even after the object disappears because they lapse to a safe value (usually null). An unsafe reference that is not known to the garbage collector will simply remain dangling by continuing to refer to the address where the object previously resided. This is not a weak reference.

In some implementations, weak references are divided into subcategories. For example, the Java Virtual Machine provides three forms of weak references, namely soft references, [1] phantom references, [2] and regular weak references. [3] A softly referenced object is only eligible for reclamation, if the garbage collector decides that the program is low on memory. Unlike a soft reference or a regular weak reference, a phantom reference does not provide access to the object that it references. Instead, a phantom reference is a mechanism that allows the garbage collector to notify the program when the referenced object has become phantom reachable. An object is phantom reachable, if it still resides in memory and it is referenced by a phantom reference, but its finalizer has already executed. Similarly, Microsoft.NET provides two subcategories of weak references, [4] namely long weak references (tracks resurrection) and short weak references.

Weak collections

Data structures can also be devised which have weak tracking features. For instance, weak hash tables are useful. Like a regular hash table, a weak hash table maintains an association between pairs of objects, where each pair is understood to be a key and value. However, the hash table does not actually maintain a strong reference on these objects. Special behavior takes place when either the key or value or both become garbage: the hash table entry is spontaneously deleted. There exist further refinements such as hash tables which have only weak keys (value references are ordinary, strong references) or only weak values (key references are strong).

Weak hash tables are important for maintaining associations between objects, such that the objects engaged in the association can still become garbage if nothing in the program refers to them any longer (other than the associating hash table).

The use of a regular hash table for such a purpose could lead to a "logical memory leak": the accumulation of reachable data which the program does not need and will not use.

Basic algorithm

Tracing collectors are so called because they trace through the working set of memory. These garbage collectors perform collection in cycles. It is common for cycles to be triggered when there is not enough free memory for the memory manager to satisfy an allocation request. But cycles can often be requested by the mutator directly or run on a time schedule. The original method involves a naïve mark-and-sweep in which the entire memory set is touched several times.

Naïve mark-and-sweep

Naive mark-and-sweep in action on a heap containing eight objects. Arrows represent object references. Circles represent the objects themselves. Objects #1, #2, #3, #4, and #6 are strongly referenced from the root set. On the other hand, objects #5, #7, and #8 are not strongly referenced either directly or indirectly from the root set; therefore, they are garbage. Animation of the Naive Mark and Sweep Garbage Collector Algorithm.gif
Naive mark-and-sweep in action on a heap containing eight objects. Arrows represent object references. Circles represent the objects themselves. Objects #1, #2, #3, #4, and #6 are strongly referenced from the root set. On the other hand, objects #5, #7, and #8 are not strongly referenced either directly or indirectly from the root set; therefore, they are garbage.

In the naive mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared, except during the collection cycle.

The first stage is the mark stage which does a tree traversal of the entire 'root set' and marks each object that is pointed to by a root as being 'in-use'. All objects that those objects point to, and so on, are marked as well, so that every object that is reachable via the root set is marked.

In the second stage, the sweep stage, all memory is scanned from start to finish, examining all free or used blocks; those not marked as being 'in-use' are not reachable by any roots, and their memory is freed. For objects which were marked in-use, the in-use flag is cleared, preparing for the next cycle.

This method has several disadvantages, the most notable being that the entire system must be suspended during collection; no mutation of the working set can be allowed. This can cause programs to 'freeze' periodically (and generally unpredictably), making some real-time and time-critical applications impossible. In addition, the entire working memory must be examined, much of it twice, potentially causing problems in paged memory systems.

Tri-color marking

An example of tri-color marking on a heap with 8 objects. White, grey, and black objects are represented by light-grey, yellow, and blue, respectively. Animation of tri-color garbage collection.gif
An example of tri-color marking on a heap with 8 objects. White, grey, and black objects are represented by light-grey, yellow, and blue, respectively.

Because of these performance problems, most modern tracing garbage collectors implement some variant of the tri-color marking abstraction, but simple collectors (such as the mark-and-sweep collector) often do not make this abstraction explicit. Tri-color marking works as described below.

Three sets are created white, black and grey:

In many algorithms, initially the black set starts as empty, the grey set is the set of objects which are directly referenced from roots and the white set includes all other objects. Every object in memory is at all times in exactly one of the three sets. The algorithm proceeds as following:

  1. Pick an object o from the grey set
  2. Move each white object o references to the grey set. This ensures that neither this object nor any object it references can be garbage-collected.
  3. Move o to the black set
  4. Repeat the last three steps until the grey set is empty.

When the grey set is empty, the scan is complete; the black objects are reachable from the roots, while the white objects are not and can be garbage-collected.

Since all objects not immediately reachable from the roots are added to the white set, and objects can only move from white to grey and from grey to black, the algorithm preserves an important invariant  no black objects reference white objects. This ensures that the white objects can be freed once the grey set is empty. This is called the tri-color invariant. Some variations on the algorithm do not preserve this invariant but use a modified form for which all the important properties hold.

The tri-color method has an important advantage  it can be performed "on-the-fly", without halting the system for significant periods of time. This is accomplished by marking objects as they are allocated and during mutation, maintaining the various sets. By monitoring the size of the sets, the system can perform garbage collection periodically, rather than as needed. Also, the need to touch the entire working set on each cycle is avoided.

Implementation strategies

Moving vs. non-moving

Once the unreachable set has been determined, the garbage collector may simply release the unreachable objects and leave everything else as it is, or it may copy some or all of the reachable objects into a new area of memory, updating all references to those objects as needed. These are called "non-moving" and "moving" (or, alternatively, "non-compacting" and "compacting") garbage collectors, respectively.

At first, a moving algorithm may seem inefficient compared to a non-moving one, since much more work would appear to be required on each cycle. But the moving algorithm leads to several performance advantages, both during the garbage collection cycle itself and during program execution:

One disadvantage of a moving garbage collector is that it only allows access through references that are managed by the garbage collected environment, and does not allow pointer arithmetic. This is because any pointers to objects will be invalidated if the garbage collector moves those objects (they become dangling pointers). For interoperability with native code, the garbage collector must copy the object contents to a location outside of the garbage collected region of memory. An alternative approach is to pin the object in memory, preventing the garbage collector from moving it and allowing the memory to be directly shared with native pointers (and possibly allowing pointer arithmetic). [5]

Copying vs. mark-and-sweep vs. mark-and-don't-sweep

Not only do collectors differ in whether they are moving or non-moving, they can also be categorized by how they treat the white, grey and black object sets during a collection cycle.

The most straightforward approach is the semi-space collector, which dates to 1969. In this moving collector, memory is partitioned into an equally sized "from space" and "to space". Initially, objects are allocated in "to space" until it becomes full and a collection cycle is triggered. At the start of the cycle, the "to space" becomes the "from space", and vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects are scanned in turn, and all objects that they point to are copied into "to space", until all reachable objects have been copied into "to space". Once the program continues execution, new objects are once again allocated in the "to space" until it is once again full and the process is repeated.

This approach is very simple, but since only one semi space is used for allocating objects, the memory usage is twice as high compared to other algorithms. The technique is also known as stop-and-copy. Cheney's algorithm is an improvement on the semi-space collector.

A mark and sweep garbage collector keeps a bit or two with each object to record if it is white or black. The grey set is kept as a separate list or using another bit. As the reference tree is traversed during a collection cycle (the "mark" phase), these bits are manipulated by the collector. A final "sweep" of the memory areas then frees white objects. The mark and sweep strategy has the advantage that, once the condemned set is determined, either a moving or non-moving collection strategy can be pursued. This choice of strategy can be made at runtime, as available memory permits. It has the disadvantage of "bloating" objects by a small amount, as in, every object has a small hidden memory cost because of the list/extra bit. This can be somewhat mitigated if the collector also handles allocation, since then it could potentially use unused bits in the allocation data structures. Or, this "hidden memory" can be eliminated by using a Tagged pointer, trading the memory cost for CPU time. However, the mark and sweep is the only strategy that readily cooperates with external allocators in the first place.

A mark and don't sweep garbage collector, like the mark-and-sweep, keeps a bit with each object to record if it is white or black; the grey set is kept as a separate list or using another bit. There are two key differences here. First, black and white mean different things than they do in the mark and sweep collector. In a "mark and don't sweep" collector, all reachable objects are always black. An object is marked black at the time it is allocated, and it will stay black even if it becomes unreachable. A white object is unused memory and may be allocated. Second, the interpretation of the black/white bit can change. Initially, the black/white bit may have the sense of (0=white, 1=black). If an allocation operation ever fails to find any available (white) memory, that means all objects are marked used (black). The sense of the black/white bit is then inverted (for example, 0=black, 1=white). Everything becomes white. This momentarily breaks the invariant that reachable objects are black, but a full marking phase follows immediately, to mark them black again. Once this is done, all unreachable memory is white. No "sweep" phase is necessary.

The mark and don't sweep strategy requires cooperation between the allocator and collector, but is incredibly space efficient since it only requires one bit per allocated pointer (which most allocation algorithms require anyway). However, this upside is somewhat mitigated, since most of the time large portions of memory are wrongfully marked black (used), making it hard to give resources back to the system (for use by other allocators, threads, or processes) in times of low memory usage.

The mark and don't sweep strategy can therefore be seen as a compromise between the upsides and downsides of the mark and sweep and the stop and copy strategies.

Generational GC (ephemeral GC)

It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis). A generational GC (also known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when references cross generations by observing the creation and overwriting of references. When the garbage collector runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection cycles while still reclaiming most unreachable objects.

In order to implement this concept, many generational garbage collectors use separate memory regions for different ages of objects. When a region becomes full, the objects in it are traced, using the references from the older generation(s) as roots. This usually results in most objects in the generation being collected (by the hypothesis), leaving it to be used to allocate new objects. When a collection doesn't collect many objects (the hypothesis doesn't hold, for example because the program has computed a large collection of new objects it does want to retain), some or all of the surviving objects that are referenced from older memory regions are promoted to the next highest region, and the entire region can then be overwritten with fresh objects. This technique permits very fast incremental garbage collection, since the garbage collection of only one region at a time is all that is typically required.

Ungar's classic generation scavenger has two generations. It divides the youngest generation, called "new space", into a large "eden" in which new objects are created and two smaller "survivor spaces", past survivor space and future survivor space. The objects in the older generation that may reference objects in new space are kept in a "remembered set". On each scavenge, the objects in new space are traced from the roots in the remembered set and copied to future survivor space. If future survivor space fills up, the objects that do not fit are promoted to old space, a process called "tenuring". At the end of the scavenge, some objects reside in future survivor space, and eden and past survivor space are empty. Then future survivor space and past survivor space are exchanged and the program continues, allocating objects in eden. In Ungar's original system, eden is 5 times larger than each survivor space.

Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space. In fact, runtime systems for modern programming languages (such as Java and the .NET Framework) usually use some hybrid of the various strategies that have been described thus far; for example, most collection cycles might look only at a few generations, while occasionally a mark-and-sweep is performed, and even more rarely a full copying is performed to combat fragmentation. The terms "minor cycle" and "major cycle" are sometimes used to describe these different levels of collector aggression.

Stop-the-world vs. incremental vs. concurrent

Simple stop-the-world garbage collectors completely halt execution of the program to run a collection cycle, thus guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector is running.

This has the disadvantage that the program can perform no useful work while a collection cycle is running (sometimes called the "embarrassing pause" [6] ). Stop-the-world garbage collection is therefore mainly suitable for non-interactive programs. Its advantage is that it is both simpler to implement and faster than incremental garbage collection.

Incremental and concurrent garbage collectors are designed to reduce this disruption by interleaving their work with activity from the main program. Incremental garbage collectors perform the garbage collection cycle in discrete phases, with program execution permitted between each phase (and sometimes during some phases). Concurrent garbage collectors do not stop program execution at all, except perhaps briefly when the program's execution stack is scanned. However, the sum of the incremental phases takes longer to complete than one batch garbage collection pass, so these garbage collectors may yield lower total throughput.

Careful design is necessary with these techniques to ensure that the main program does not interfere with the garbage collector and vice versa; for example, when the program needs to allocate a new object, the runtime system may either need to suspend it until the collection cycle is complete, or somehow notify the garbage collector that there exists a new, reachable object.

Precise vs. conservative and internal pointers

Some collectors can correctly identify all pointers (references) in an object; these are called precise (also exact or accurate) collectors, the opposite being a conservative or partly conservative collector. Conservative collectors assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated object. Conservative collectors may produce false positives, where unused memory is not released because of improper pointer identification. This is not always a problem in practice unless the program handles a lot of data that could easily be misidentified as a pointer. False positives are generally less problematic on 64-bit systems than on 32-bit systems because the range of valid memory addresses tends to be a tiny fraction of the range of 64-bit values. Thus, an arbitrary 64-bit pattern is unlikely to mimic a valid pointer. A false negative can also happen if pointers are "hidden", for example using an XOR linked list. Whether a precise collector is practical usually depends on the type safety properties of the programming language in question. An example for which a conservative garbage collector would be needed is the C language, which allows typed (non-void) pointers to be type cast into untyped (void) pointers, and vice versa.

A related issue concerns internal pointers, or pointers to fields within an object. If the semantics of a language allow internal pointers, then there may be many different addresses that can refer to parts of the same object, which complicates determining whether an object is garbage or not. An example for this is the C++ language, in which multiple inheritance can cause pointers to base objects to have different addresses. In a tightly optimized program, the corresponding pointer to the object itself may have been overwritten in its register, so such internal pointers need to be scanned.

Performance

Performance of tracing garbage collectors – both latency and throughput – depends significantly on the implementation, workload, and environment. Naive implementations or use in very memory-constrained environments, notably embedded systems, can result in very poor performance compared with other methods, while sophisticated implementations and use in environments with ample memory can result in excellent performance. [ citation needed ]

In terms of throughput, tracing by its nature requires some implicit runtime overhead, though in some cases the amortized cost can be extremely low, in some cases even lower than one instruction per allocation or collection, outperforming stack allocation. [7] Manual memory management requires overhead due to explicit freeing of memory, and reference counting has overhead from incrementing and decrementing reference counts, and checking if the count has overflowed or dropped to zero. [ citation needed ]

In terms of latency, simple stop-the-world garbage collectors pause program execution for garbage collection, which can happen at arbitrary times and take arbitrarily long, making them unusable for real-time computing, notably embedded systems, and a poor fit for interactive use, or any other situation where low latency is a priority. However, incremental garbage collectors can provide hard real-time guarantees, and on systems with frequent idle time and sufficient free memory, such as personal computers, garbage collection can be scheduled for idle times and have minimal impact on interactive performance. Manual memory management (as in C++) and reference counting have a similar issue of arbitrarily long pauses in case of deallocating a large data structure and all its children, though these only occur at fixed times, not depending on garbage collection. [ citation needed ]

Manual heap allocation
Garbage collection

It is difficult to compare the two cases directly, as their behavior depends on the situation. For example, in the best case for a garbage collecting system, allocation just increments a pointer, but in the best case for manual heap allocation, the allocator maintains freelists of specific sizes and allocation only requires following a pointer. However, this size segregation usually cause a large degree of external fragmentation, which can have an adverse impact on cache behaviour. Memory allocation in a garbage collected language may be implemented using heap allocation behind the scenes (rather than simply incrementing a pointer), so the performance advantages listed above don't necessarily apply in this case. In some situations, most notably embedded systems, it is possible to avoid both garbage collection and heap management overhead by preallocating pools of memory and using a custom, lightweight scheme for allocation/deallocation. [8]

The overhead of write barriers is more likely to be noticeable in an imperative-style program which frequently writes pointers into existing data structures than in a functional-style program which constructs data only once and never changes them.

Some advances in garbage collection can be understood as reactions to performance issues. Early collectors were stop-the-world collectors, but the performance of this approach was distracting in interactive applications. Incremental collection avoided this disruption, but at the cost of decreased efficiency due to the need for barriers. Generational collection techniques are used with both stop-the-world and incremental collectors to increase performance; the trade-off is that some garbage is not detected as such for longer than normal.

Determinism

Real-time garbage collection

While garbage collection is generally nondeterministic, it is possible to use it in hard real-time systems. A real-time garbage collector should guarantee that even in the worst case it will dedicate a certain number of computational resources to mutator threads. Constraints imposed on a real-time garbage collector are usually either work based or time based. A time based constraint would look like: within each time window of duration T, mutator threads should be allowed to run at least for Tm time. For work based analysis, MMU (minimal mutator utilization) [9] is usually used as a real-time constraint for the garbage collection algorithm.

One of the first implementations of hard real-time garbage collection for the JVM was based on the Metronome algorithm, [10] whose commercial implementation is available as part of the IBM WebSphere Real Time. [11] Another hard real-time garbage collection algorithm is Staccato, available in the IBM's J9 JVM, which also provides scalability to large multiprocessor architectures, while bringing various advantages over Metronome and other algorithms which, on the contrary, require specialized hardware. [12]

One major challenge for real-time garbage collection on modern multi-core architectures is in designing a non-blocking concurrent garbage collection, not letting the concurrent threads block each other and create unpredictable pauses. A study of algorithms that allow non-blocking real-time concurrent garbage collection appears in a paper by Pizlo et al. in Microsoft Research. [13]

See also

Related Research Articles

<span class="mw-page-title-main">Garbage collection (computer science)</span> Form of automatic memory management

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released. A memory leak may also happen when an object is stored in memory but cannot be accessed by the running code. A memory leak has symptoms similar to a number of other problems and generally can only be diagnosed by a programmer with access to the program's source code.

In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.

Java and C++ are two prominent object-oriented programming languages. By many language popularity metrics, the two languages have dominated object-oriented and high-performance software development for much of the 21st century, and are often directly compared and contrasted. Java's syntax was based on C/C++.

<span class="mw-page-title-main">Memory management</span> Computer memory management methodology

Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when no longer needed. This is critical to any advanced computer system where more than a single process might be underway at any time.

In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Java, Lisp, OCaml, Perl, Python and PHP since the version 7.4.

In object-oriented programming (OOP), the object lifetime of an object is the time between an object's creation and its destruction. Rules for object lifetime vary significantly between languages, in some cases between implementations of a given language, and lifetime of a particular object may vary from one run of the program to another.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In computer programming, unreachable memory is a block of dynamically allocated memory where the program that allocated the memory no longer has any reachable pointer that refers to it. Similarly, an unreachable object is a dynamically allocated object that has no reachable reference to it. Informally, unreachable memory is dynamic memory that the program cannot reach directly, nor get to by starting at an object it can reach directly, and then following a chain of pointer references.

In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific system of storage allocation in use and the particular form of fragmentation. In many cases, fragmentation leads to storage space being "wasted", and in that case the term also refers to the wasted space itself.

In computer science, garbage includes data, objects, or other regions of the memory of a computer system, which will not be used in any future computation by the system, or by a program running on it. Because every computer system has a finite amount of memory, and most software produces garbage, it is frequently necessary to deallocate memory that is occupied by garbage and return it to the heap, or memory pool, for reuse.

In computer science, manual memory management refers to the usage of manual instructions by the programmer to identify and deallocate unused objects, or garbage. Up until the mid-1990s, the majority of programming languages used in industry supported manual memory management, though garbage collection has existed since 1959, when it was introduced with Lisp. Today, however, languages with garbage collection such as Java are increasingly popular and the languages Objective-C and Swift provide similar functionality through Automatic Reference Counting. The main manually managed languages still in widespread use today are C and C++ – see C dynamic memory allocation.

Cheney's algorithm, first described in a 1970 ACM paper by C.J. Cheney, is a stop and copy method of tracing garbage collection in computer software systems. In this scheme, the heap is divided into two equal halves, only one of which is in use at any one time. Garbage collection is performed by copying live objects from one semispace to the other, which then becomes the new heap. The entire old heap is then discarded in one piece. It is an improvement on the previous stop-and-copy technique.

In computer science, a mark–compact algorithm is a type of garbage collection algorithm used to reclaim unreachable memory. Mark–compact algorithms can be regarded as a combination of the mark–sweep algorithm and Cheney's copying algorithm. First, reachable objects are marked, then a compacting step relocates the reachable (marked) objects towards the beginning of the heap area. Compacting garbage collection is used by modern JVMs, Microsoft's Common Language Runtime and by the Glasgow Haskell Compiler.

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.

In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Like stack allocation, regions facilitate allocation and deallocation of memory with low overhead; but they are more flexible, allowing objects to live longer than the stack frame in which they were allocated. In typical implementations, all objects in a region are allocated in a single contiguous range of memory addresses, similarly to how stack frames are typically allocated.

Automatic Reference Counting (ARC) is a memory management feature of the Clang compiler providing automatic reference counting for the Objective-C and Swift programming languages. At compile time, it inserts into the object code messages retain and release which increase and decrease the reference count at run time, marking for deallocation those objects when the number of references to them reaches zero.

Code cleanup refers to the act of writing code so that it cleans up leftover data structures and other unwanted materials from memory and the filesystem. It is sometimes treated as a synonym of refactoring code, which involves making the source code itself easier to understand, maintain, and modify.

References

  1. "Class SoftReference<T>". Java™ Platform Standard Ed. 7. Oracle. Retrieved 25 May 2013.
  2. "Class PhantomReference<T>". Java™ Platform Standard Ed. 7. Oracle. Retrieved 25 May 2013.
  3. "Class WeakReference<T>". Java™ Platform Standard Ed. 7. Oracle. Retrieved 25 May 2013.
  4. "Weak References". .NET Framework 4.5. Microsoft. Retrieved 25 May 2013.
  5. "Copying and Pinning". Microsoft Docs . Retrieved 2022-04-25.
  6. Steele, Guy L. (September 1975). "Multiprocessing Compactifying Garbage Collection". Communications of the ACM. 18 (9): 495–508. doi: 10.1145/361002.361005 . S2CID   29412244.
  7. Appel, Andrew W. (June 17, 1987). "Garbage collection can be faster than stack allocation" (PDF). Information Processing Letters. 25 (4): 275–279. CiteSeerX   10.1.1.49.2537 . doi:10.1016/0020-0190(87)90175-X. S2CID   2575400 . Retrieved 2022-04-25.
  8. Hopwood, David (January 1, 2007). "Memory allocation in embedded systems". cap-talk (Mailing list). EROS. Archived from the original on 2015-09-24.
  9. Cheng, Perry; Blelloch, Guy E. (June 22, 2001). "A Parallel, Real-Time Garbage Collector" (PDF). ACM SIGPLAN Notices. 36 (5): 125–136. doi:10.1145/381694.378823.
  10. Bacon, David F.; Cheng, Perry; Rajan, V. T. (November 2003). "The Metronome: A Simpler Approach to Garbage Collection in Real-Time Systems" (PDF). In Corsaro, Angelo; Cytron, Ron; Santoro, Corrado (eds.). Workshop on Java Technologies for Real-Time and Embedded Systems. JTRES'03. OTM 2003 Workshops. On The Move to Meaningful Internet Systems 2003. LNCS. Vol. 2889. pp. 466–478. CiteSeerX   10.1.1.3.8544 . doi:10.1007/978-3-540-39962-9_52. ISBN   3-540-20494-6. ISSN   0302-9743. S2CID   14565934. Archived from the original (PDF) on 2006-10-26.
  11. Biron, Benjamin; Sciampacone, Ryan (May 2, 2007). "Real-time Java, Part 4: Real-time garbage collection". IBM DeveloperWorks . Archived from the original on 2020-11-09.
  12. McCloskey, Bill; Bacon, David F.; Cheng, Perry; Grove, David (February 22, 2008). Staccato: A Parallel and Concurrent Real-time Compacting Garbage Collector for Multiprocessors (PDF) (Technical report). IBM Research Division. RC24504. Retrieved 2022-04-25.
  13. Pizlo, Phil; Petrank, Erez; Steensgaard, Bjarne (June 2008). Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PDF). PLDI 2008 Conferenece. pp. 33–44. CiteSeerX   10.1.1.3.8544 . doi:10.1145/1375581.1375587. ISBN   9781595938602.