Ephemeron

Last updated

An ephemeron is a data structure that solves two related problems in garbage collected systems. On the one hand, an ephemeron provides a notification when some object is about to be collected. On the other hand, an ephemeron allows data to be associated with some object without creating a reference to that object that will prevent the object from being collected. An ephemeron is a key-value pair, where the key is the object that the ephemeron guards, notifying the system when that object is collectable, and the value can be any data associated with the object such as a property list, and which may be empty. Since the elements of the property list may refer back to the key, they may prevent collection of that key. But the ephemeron is treated specially by the garbage collector. The value field is not traced until the key is found to be reachable from the system roots other than through ephemeron keys. The set of ephemerons whose keys are only reachable from ephemeron keys are then holding onto keys that are ready to be collected; these objects are not reachable from the roots except through ephemerons. When the garbage collector detects such a set, the ephemerons are queued for notification and their keys and values are traced. Hence ephemerons both detect objects that are ready for collection and break the cycles that can prevent objects from being collected.

Contents

Description

In computer science, finalization occurs when a garbage collector (GC) informs an application that an object is "almost collectable". It is used to help an application maintain its invariants. Weak references may be used by a garbage collector to determine the objects that are almost collectable. Seen both as key-value pairs, the main difference between weak references and an ephemerons is the way the garbage collector treats them. For weak references, the garbage collector always follows the value in the key-value pair. For ephemerons, instead, the garbage collector doesn't follow the value but queues the ephemeron for further observation at a second stage: after the first tracing phase is done, it runs through the queue looking at each ephemeron and if its key was seen, then it follows its value. This subtle difference impacts in graphs with some kinds of cycles, where weak pairs do not describe correctly that an object ought to be "almost collectable". For example, consider a key-value pair with weak references where the key is an object and the value is a set of properties attached to the object. It is expected that when the object is ready to be collected, the properties will also go away. But if the value, possibly transitively, maps to its own key (the object), then the object will never be collected. If an ephemeron was used instead, the value wouldn't have been followed unless the object was proved alive, solving the cycle. Ephemerons are similar to weak pairs, but an object in an ephemeron's key field may be classed as "almost collectable" even if it is reachable from the ephemeron's value fields. [1]

Uses

An ephemeron is an object which refers strongly to its contents as long as the ephemeron's key is not garbage collected, and weakly from then on. Ephemerons solve a problem which is commonly found when trying to "attach" properties to objects by using a registry. When some property should be attached to an object, the property should (in terms of GC behavior) typically have the life-time that an instance variable of this object would have. However, this is complicated by having an external association between the object and its property such as:

property --------- registry --------- association --------- object

Here, the registry (a third party) will hold onto the association itself which would require manual removal from the registry (instead of automated garbage collection). While this problem can always be solved in any given concrete situation by using one of the various weak association types, choosing the 'right' kind of association depends on a variety of factors some of which can change dynamically.

Ephemerons solve this problem by defining that the 'contents' (value) of an ephemeron will be held strongly until the key is known to be garbage collected. From then on, the contents of the ephemeron will be held weakly. Therefore, the contents of an ephemeron can become eligible for garbage collection if and only if the key is garbage collectable which is the exact behavior which we would observe for an instance variable of the object.

History

Ephemerons were first invented by George Bosworth while he worked at Digitalk. [1] They were used as the finalization mechanism in Visual Smalltalk Enterprise. Today ephemerons are available in most Smalltalk dialects as well as many other languages with automatic garbage collection.

Examples of use

Smalltalk

Several dialects of Smalltalk include ephemerons as built-in features or as additional packages. For example, GNU Smalltalk [2] and Squeak. [3]

Lua

Lua does not contain a separate ephemeron construct, but its table data structures may be set to holds its keys, values, or both in a weak fashion. If the keys are held weakly, but values are held strongly, the table will act like an ephemeron. Lua 5.4 also introduces metatable behavior that helps to construct ephemeron-like data structures. [4]

.NET

Languages such as C#, F#, and VB.NET, as of .NET Framework 4.0, have support in the ConditionalWeakTable class. [5] The underlying ephemeron mechanism (DependentHandle) used to be private until .NET 6.

OCaml

An implementation of an OCaml ephemeron type was presented in 2014 [6] and added to the standard library in release 4.03. [7]

Racket

The Racket dialect of Lisp has support for ephemerons in its runtime system. There, ephemerons are used in combination with weak mappings to allow the garbage collector to free key-value pairs even if the value holds a reference to a key. [8]

Scheme SRFI

A SRFI (Scheme Request for Implementation) defines an API for Ephemerons for the Scheme language. [9] However, not all Scheme implementations support all SRFIs.

Related Research Articles

<span class="mw-page-title-main">Garbage collection (computer science)</span> Form of automatic memory management

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.

In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

Java Platform, Standard Edition is a computing platform for development and deployment of portable code for desktop and server environments. Java SE was formerly known as Java 2 Platform, Standard Edition (J2SE).

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

In computer programming, a reference is a value that enables a program to indirectly access a particular datum, such as a variable's value or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference. A reference is distinct from the datum itself.

In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Java, Lisp, OCaml, Perl, Python and PHP since the version 7.4.

In object-oriented programming (OOP), the object lifetime of an object is the time between an object's creation and its destruction. Rules for object lifetime vary significantly between languages, in some cases between implementations of a given language, and lifetime of a particular object may vary from one run of the program to another.

In computer programming, tracing garbage collection is a form of automatic memory management that consists of determining which objects should be deallocated by tracing which objects are reachable by a chain of references from certain "root" objects, and considering the rest as "garbage" and collecting them. Tracing garbage collection is the most common type of garbage collection – so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting – and there are a large number of algorithms used in implementation.

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time-efficient or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool.

In computer science, a finalizer or finalize method is a special method that performs finalization, generally some form of cleanup. A finalizer is executed during object destruction, prior to the object being deallocated, and is complementary to an initializer, which is executed during object creation, following allocation. Finalizers are strongly discouraged by some, due to difficulty in proper use and the complexity they add, and alternatives are suggested instead, mainly the dispose pattern.

Managed Extensions for C++ or Managed C++ is a deprecated set of language extensions for C++, including grammatical and syntactic extensions, keywords and attributes, to bring the C++ syntax and language to the .NET Framework. These extensions were created by Microsoft to allow C++ code to be targeted to the Common Language Runtime (CLR) in the form of managed code, as well as continue to interoperate with native code.

C++/CLI is a variant of the C++ programming language, modified for Common Language Infrastructure. It has been part of Visual Studio 2005 and later, and provides interoperability with other .NET languages such as C#. Microsoft created C++/CLI to supersede Managed Extensions for C++. In December 2005, Ecma International published C++/CLI specifications as the ECMA-372 standard.

The Boehm–Demers–Weiser garbage collector, often simply known as Boehm GC, is a conservative garbage collector for C and C++ developed by Hans Boehm, Alan Demers, and Mark Weiser.

In computer science, a container is a class or a data structure whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules.

A phantom reference is a kind of reference in Java, where the memory can be reclaimed. The phantom reference is one of the strengths or levels of 'non strong' reference defined in the Java programming language; the others being weak and soft. Phantom reference are the weakest level of reference in Java; in order from strongest to weakest, they are: strong, soft, weak, phantom.

This comparison of programming languages compares how object-oriented programming languages such as C++, Java, Smalltalk, Object Pascal, Perl, Python, and others manipulate data structures.

In object-oriented programming languages with garbage collection, object resurrection occurs when an object becomes reachable during the process of object destruction, as a side effect of a finalizer being executed.

Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTSTEP operating system. Due to Apple macOS’s direct lineage from NeXTSTEP, Objective-C was the standard programming language used, supported, and promoted by Apple for developing macOS and iOS applications until the introduction of the Swift programming language in 2014.

References

  1. 1 2 Barry Hayes (1997). "Ephemerons: A New Finalization Mechanism". Object-Oriented Languages, Programming, Systems, and Applications.
  2. "Special objects - GNU Smalltalk User's Guide" . Retrieved 20 February 2013.
  3. "Ephemerons" . Retrieved 20 February 2013.
  4. "Lua 5.4 Reference Manual, §2.5.3" . Retrieved 30 January 2022.
  5. ".NET 4.0 - System.Runtime.CompilerServices.ConditionalWeakTable". IKVM.NET Weblog. Retrieved 14 October 2013.
  6. Bobot, François. "Ephemerons meet OCaml GC" (PDF). OCaml Users and Developers Workshop 2014. Archived from the original (PDF) on 19 November 2016. Retrieved 5 April 2018.
  7. Minsky, Yaron. "OCaml 4.03: Everything else". Jane Street Tech Blog. Retrieved 5 April 2018.
  8. "15.2 Ephemerons" . Retrieved 20 February 2013.
  9. "SRFI-124: Ephemerons".