Cheney's algorithm

Last updated

Cheney's algorithm, first described in a 1970 ACM paper by C.J. Cheney, is a stop and copy method of tracing garbage collection in computer software systems. In this scheme, the heap is divided into two equal halves, only one of which is in use at any one time. Garbage collection is performed by copying live objects from one semispace (the from-space) to the other (the to-space), which then becomes the new heap. The entire old heap is then discarded in one piece. It is an improvement on the previous stop-and-copy technique.[ citation needed ]

Contents

Cheney's algorithm reclaims items as follows:

Once all to-space references have been examined and updated, garbage collection is complete.

The algorithm needs no stack and only two pointers outside of the from-space and to-space: a pointer to the beginning of free space in the to-space, and a pointer to the next word in to-space that needs to be examined. The data between the two pointers represents work remaining for it to do (those objects are gray in the tri-color terminology, see later).

The forwarding pointer (sometimes called a "broken heart") is used only during the garbage collection process; when a reference to an object already in to-space (thus having a forwarding pointer in from-space) is found, the reference can be updated quickly simply by updating its pointer to match the forwarding pointer.

Because the strategy is to exhaust all live references, and then all references in referenced objects, this is known as a breadth-first list copying garbage collection scheme.

Sample algorithm

initialize() =     tospace   = N/2     fromspace = 0     allocPtr  = fromspace     scanPtr   = whatever -- only used during collection 
allocate(n) =     If allocPtr + n > fromspace + N/2         collect()     EndIf     If allocPtr + n > fromspace + N/2         fail “insufficient memory”     EndIf     o = allocPtr     allocPtr = allocPtr + n     return o 
collect() =     swap(fromspace, tospace)     allocPtr = fromspace     scanPtr = fromspace      -- scan every root you've got     ForEach root in the stack -- or elsewhere         root = copy(root)     EndForEach          -- scan objects in the to-space (including objects added by this loop)     While scanPtr < allocPtr         ForEach reference r from o (pointed to by scanPtr)             r = copy(r)         EndForEach         scanPtr = scanPtr + o.size() -- points to the next object in the to-space, if any     EndWhile 
copy(o) =     If o has no forwarding address         o' = allocPtr         allocPtr = allocPtr + size(o)         copy the contents of o to o'         forwarding-address(o) = o'     EndIf     return forwarding-address(o) 

Semispace

Cheney based his work on the semispace garbage collector, which was published a year earlier by R.R. Fenichel and J.C. Yochelson.

Equivalence to tri-color abstraction

Cheney's algorithm is an example of a tri-color marking garbage collector. The first member of the gray set is the stack itself. Objects referenced on the stack are copied into the to-space, which contains members of the black and gray sets.

The algorithm moves any white objects (equivalent to objects in the from-space without forwarding pointers) to the gray set by copying them to the to-space. Objects that are between the scanning pointer and the free-space pointer on the to-space area are members of the gray set still to be scanned. Objects below the scanning pointer belong to the black set. Objects are moved to the black set by simply moving the scanning pointer over them.

When the scanning pointer reaches the free-space pointer, the gray set is empty, and the algorithm ends.

Related Research Articles

<span class="mw-page-title-main">Garbage collection (computer science)</span> Form of automatic memory management

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

Java and C++ are two prominent object-oriented programming languages. By many language popularity metrics, the two languages have dominated object-oriented and high-performance software development for much of the 21st century, and are often directly compared and contrasted. Java’s syntax was based on C/C++.

In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Java, Lisp, OCaml, Perl, Python and PHP since the version 7.4.

In computer science, a smart pointer is an abstract data type that simulates a pointer while providing added features, such as automatic memory management or bounds checking. Such features are intended to reduce bugs caused by the misuse of pointers, while retaining efficiency. Smart pointers typically keep track of the memory they point to, and may also be used to manage other resources, such as network connections and file handles. Smart pointers were first popularized in the programming language C++ during the first half of the 1990s as rebuttal to criticisms of C++'s lack of automatic garbage collection.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

In computer programming, tracing garbage collection is a form of automatic memory management that consists of determining which objects should be deallocated by tracing which objects are reachable by a chain of references from certain "root" objects, and considering the rest as "garbage" and collecting them. Tracing garbage collection is the most common type of garbage collection – so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting – and there are a large number of algorithms used in implementation.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

Resource acquisition is initialization (RAII) is a programming idiom used in several object-oriented, statically-typed programming languages to describe a particular language behavior. In RAII, holding a resource is a class invariant, and is tied to object lifetime. Resource allocation is done during object creation, by the constructor, while resource deallocation (release) is done during object destruction, by the destructor. In other words, resource acquisition must succeed for initialization to succeed. Thus the resource is guaranteed to be held between when initialization finishes and finalization starts, and to be held only when the object is alive. Thus if there are no object leaks, there are no resource leaks.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In computer science, garbage includes data, objects, or other regions of the memory of a computer system, which will not be used in any future computation by the system, or by a program running on it. Because every computer system has a finite amount of memory, and most software produces garbage, it is frequently necessary to deallocate memory that is occupied by garbage and return it to the heap, or memory pool, for reuse.

In computer science, manual memory management refers to the usage of manual instructions by the programmer to identify and deallocate unused objects, or garbage. Up until the mid-1990s, the majority of programming languages used in industry supported manual memory management, though garbage collection has existed since 1959, when it was introduced with Lisp. Today, however, languages with garbage collection such as Java are increasingly popular and the languages Objective-C and Swift provide similar functionality through Automatic Reference Counting. The main manually managed languages still in widespread use today are C and C++ – see C dynamic memory allocation.

<span class="mw-page-title-main">Chicken (Scheme implementation)</span> Scheme-based programming language implementation

Chicken is a programming language, specifically a compiler and interpreter which implement a dialect of the programming language Scheme, and which compiles Scheme source code to standard C. It is mostly R5RS compliant and offers many extensions to the standard. The newer R7RS standard is supported through an extension library. Chicken is free and open-source software available under a BSD license. It is implemented mostly in Scheme, with some parts in C for performance or to make embedding into C programs easier.

In compiler optimization, escape analysis is a method for determining the dynamic scope of pointers – where in the program a pointer can be accessed. It is related to pointer analysis and shape analysis.

In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

In computer science, a mark–compact algorithm is a type of garbage collection algorithm used to reclaim unreachable memory. Mark–compact algorithms can be regarded as a combination of the mark–sweep algorithm and Cheney's copying algorithm. First, reachable objects are marked, then a compacting step relocates the reachable (marked) objects towards the beginning of the heap area. Compacting garbage collection is used by modern JVMs, Microsoft's Common Language Runtime and by the Glasgow Haskell Compiler.

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.

In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and the read value being the same twice is used to conclude that nothing has happened in the interim; however, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking nothing has changed even though the second thread did work that violates that assumption.

In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Like stack allocation, regions facilitate allocation and deallocation of memory with low overhead; but they are more flexible, allowing objects to live longer than the stack frame in which they were allocated. In typical implementations, all objects in a region are allocated in a single contiguous range of memory addresses, similarly to how stack frames are typically allocated.

References