Object resurrection

In object-oriented programming languages with garbage collection, object resurrection occurs when an object becomes reachable (in other words, no longer garbage) during the process of object destruction, as a side effect of a finalizer being executed.

Object resurrection causes a number of problems. In particular, the mere possibility of resurrection, even if it never occurs, makes garbage collection significantly more complicated and slower, and is a major reason that finalizers are discouraged. Languages deal with object resurrection in various ways. In rare circumstances it is used deliberately to implement certain design patterns, notably an object pool; [1] in other circumstances it is an unintended bug caused by an error in a finalizer. In general, resurrection is discouraged. [2]

Process

Object resurrection occurs via the following process. First, an object becomes garbage when it is no longer reachable from the program, and may be collected (destroyed and deallocated). Then, during object destruction, before the garbage collector deallocates the object, a finalizer method may be run. Because a finalizer may contain arbitrary code, it may create new references to that object, or to another garbage object reachable from the object with the finalizer, making it reachable again. If this happens, the referenced object (which is not necessarily the finalized object) is no longer garbage and cannot be deallocated; otherwise the references to it would become dangling references and cause errors when used, typically a program crash or unpredictable behavior. Instead, in order to maintain memory safety, the object is returned to life, or resurrected.

In order to detect this, a garbage collector will generally do two-phase collection in the presence of finalizers: first finalize any garbage that has a finalizer, and then re-check all garbage (or all garbage reachable from the objects with finalizers), in case the finalizers have resurrected some garbage. This adds overhead and delays memory reclamation.
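The two-phase scheme can be illustrated with a toy model. The following is only a sketch under stated assumptions: the Obj class, the reachable and collect functions, and the list-based heap are all hypothetical names invented for illustration, and no real collector is implemented this way. It shows garbage being finalized first and then re-checked, so that a resurrected object survives the first collection and is freed by a later one without its finalizer running again.

```python
class Obj:
    """A node in a toy heap (hypothetical model, not a real collector)."""

    def __init__(self, name, finalizer=None):
        self.name = name
        self.refs = []              # outgoing references to other Obj nodes
        self.finalizer = finalizer  # optional callback taking the object
        self.finalized = False      # flag: finalizer has already run


def reachable(roots):
    """Return the set of objects reachable from the roots (mark phase)."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(obj.refs)
    return seen


def collect(heap, roots):
    """Run one two-phase collection; return the objects that were freed."""
    # Phase 1: find garbage and run each pending finalizer at most once.
    live = reachable(roots)
    for obj in [o for o in heap if o not in live]:
        if obj.finalizer is not None and not obj.finalized:
            obj.finalized = True
            obj.finalizer(obj)
    # Phase 2: mark again, since a finalizer may have resurrected garbage.
    live = reachable(roots)
    freed = [o for o in heap if o not in live]
    heap[:] = [o for o in heap if o in live]
    return freed


root = Obj("root")
plain = Obj("plain")
# This finalizer resurrects its object by linking it from a live object.
zombie = Obj("zombie", finalizer=lambda o: root.refs.append(o))
heap = [root, plain, zombie]

first = [o.name for o in collect(heap, [root])]    # zombie is resurrected
root.refs.clear()                                  # zombie is garbage again
second = [o.name for o in collect(heap, [root])]   # finalizer is not rerun
print(first, second)
```

In this model the second marking pass is the extra cost paid on every collection that involves finalizers, whether or not any finalizer actually resurrects anything.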

Resurrected objects

A resurrected object may be treated the same as other objects, or may be treated specially. In many languages, notably C#, Java, and Python (from Python 3.4), objects are only finalized once, to avoid the possibility of an object being repeatedly resurrected or even being indestructible; in C# objects with finalizers by default are only finalized once, but can be re-registered for finalization. In other cases resurrected objects are considered errors, notably in Objective-C; or treated identically to other objects, notably in Python prior to Python 3.4.

A resurrected object is sometimes called a zombie object or zombie, but this term is used for various object states related to object destruction, with usage depending on language and author. A "zombie object" has a specialized meaning in Objective-C, however, which is detailed below. Zombie objects are somewhat analogous to zombie processes, in that they have undergone a termination state change and are close to deallocation, but the details are significantly different.

Variants

In the .NET Framework, notably C# and VB.NET, "object resurrection" instead refers to the state of an object during finalization: the object is brought back to life (from being inaccessible), the finalizer is run, and the object is then returned to being inaccessible (and is no longer registered for future finalization). In .NET, which objects need finalization is not tracked object-by-object, but is instead stored in a finalization "queue", [note 1] so rather than a notion of resurrected objects in the sense of this article, one speaks of objects "queued for finalization". Further, objects can be re-enqueued for finalization via GC.ReRegisterForFinalize, taking care not to enqueue the same object multiple times. [2]

Mechanism

There are two main ways that an object can resurrect itself or another object: by creating a reference to itself in an object that it can reach (garbage is not reachable, but garbage can reference non-garbage objects), or by creating a reference in the environment (global variables, or in some cases static variables or variables in a closure). Python examples of both follow, for an object resurrecting itself. It is also possible for an object to resurrect other objects if both are being collected in a given garbage collection cycle, by the same mechanisms.

Resurrects itself by creating a reference in an object it can reach:

    class Clingy:
        def __init__(self, ref=None) -> None:
            self.ref = ref

        def __del__(self):
            if self.ref:
                self.ref.ref = self
            print("Don't leave me!")

    a = Clingy(Clingy())  # Create a 2-element linked list,
                          # referenced by |a|
    a.ref.ref = a  # Create a cycle
    a.ref = None  # Clearing the reference from the first node
                  # to the second makes the second garbage
    a.ref = None  # Clear it again: the finalizer restored the reference

Resurrects itself by creating a reference in the global environment:

    c = None

    class Immortal:
        def __del__(self):
            global c
            c = self
            print("I'm not dead yet.")

    c = Immortal()
    c = None  # Clearing |c| makes the object garbage
    c = None  # Clear it again: the finalizer restored the reference

In CPython prior to 3.4, the finalizers in the above examples run every time the objects become unreachable, so the objects are never garbage-collected. In CPython 3.4 and later, each finalizer is called only once, and the objects are garbage-collected the second time they become unreachable.
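The once-only behavior can be checked directly by counting finalizer calls. This is a minimal sketch that assumes CPython 3.4 or later, where reference counting runs the finalizer as soon as the last reference is dropped; the registry dictionary and its keys are hypothetical names used in place of a global variable.

```python
# Hypothetical registry standing in for the global environment.
registry = {"keeper": None, "calls": 0}


class Immortal:
    def __del__(self):
        registry["calls"] += 1
        registry["keeper"] = self   # resurrect by storing a live reference


registry["keeper"] = Immortal()
registry["keeper"] = None           # unreachable: __del__ runs and resurrects
resurrected = registry["keeper"] is not None
registry["keeper"] = None           # unreachable again: __del__ is skipped
finalizer_calls = registry["calls"]
print(finalizer_calls, resurrected)
```

Other Python implementations may run finalizers later or not at all, so the count observed here should not be relied on outside CPython.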

Problems

Object resurrection causes several problems.

Complicates garbage collection
The possibility of object resurrection means that the garbage collector must check for resurrected objects after finalization, even if no resurrection actually occurs, which complicates and slows down garbage collection.
Indestructible objects
In some circumstances an object may be indestructible: if an object is resurrected in its own finalizer (or a group of objects resurrect each other as a result of their finalizers), and the finalizer is always called when destroying the object, then the object cannot be destroyed and its memory cannot be reclaimed.
Accidental resurrection and leaks
Object resurrection may be unintentional, and the resulting object may be semantic garbage (reachable but never used again), hence never actually collected, causing a logical memory leak.
Inconsistent state and reinitialization
A resurrected object may be in an inconsistent state, or violate class invariants, due to the finalizer having been executed and causing an irregular state. Thus resurrected objects generally need to be manually reinitialized. [1]
Unique finalization or re-finalization
In some languages (such as Java and Python 3.4+) finalization is guaranteed to happen at most once per object, so resurrected objects will not have their finalizers called again; therefore, resurrected objects must execute any necessary cleanup code outside of the finalizer. In some other languages, the programmer can force finalization to be done repeatedly; notably, C# has GC.ReRegisterForFinalize. [1]
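The inconsistent-state and accidental-resurrection problems can be seen together in a short sketch. It assumes CPython, where deleting the last reference runs the finalizer immediately; the Recorder class and the survivors list are hypothetical names chosen for illustration.

```python
import io

survivors = []   # hypothetical registry that accidentally keeps objects alive


class Recorder:
    def __init__(self):
        self.buffer = io.StringIO()

    def write(self, text):
        self.buffer.write(text)

    def __del__(self):
        self.buffer.close()       # cleanup: release the underlying resource
        survivors.append(self)    # bug: this resurrects the object


rec = Recorder()
rec.write("hello")
del rec                           # CPython runs __del__ immediately here

zombie = survivors[0]             # still reachable, but already cleaned up
try:
    zombie.write("more")
    usable = True
except ValueError:                # writing to a closed StringIO fails
    usable = False
print(usable)
```

The resurrected object is reachable but its buffer is closed, so it would have to be reinitialized before reuse, and as long as it stays in the list it is never collected.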

Solutions

Languages have adopted several different methods for coping with object resurrection, most commonly by having two-phase garbage collection in the presence of finalizers, to prevent dangling references; and by only finalizing objects once, particularly by marking objects as having been finalized (via a flag), to ensure that objects can be destroyed.

Java will not free the object until it has proven that the object is once again unreachable, but will not run the finalizer more than once. [3]

In Python, prior to Python 3.4, the standard CPython implementation would treat resurrected objects identically to other objects (which had never been finalized), making indestructible objects possible. [4] Further, it would not garbage collect cycles that contained an object with a finalizer, to avoid possible problems with object resurrection. Starting in Python 3.4, behavior is largely the same as in Java: [note 2] objects are only finalized once (being marked as "already finalized"), and garbage collection of cycles is in two phases, with the second phase checking for resurrected objects. [5] [6]
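The handling of cycles in Python 3.4 and later can be observed with the standard gc module. This is a sketch assuming CPython 3.4 or later; the Node class and the saved and calls lists are hypothetical names. It only counts finalizer invocations and does not itself prove that the memory was reclaimed.

```python
import gc

saved = []     # live container used by the finalizers to resurrect objects
calls = []     # records each finalizer invocation


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        calls.append(self.name)
        saved.append(self)        # resurrect this node


a, b = Node("a"), Node("b")
a.other, b.other = b, a           # reference cycle: refcounts never hit zero
del a, b                          # the cycle is now unreachable

gc.collect()                      # finalizers run, resurrection is detected
after_first = (sorted(calls), len(saved))

saved.clear()                     # the cycle is unreachable once more
gc.collect()                      # collected without running __del__ again
after_second = sorted(calls)
print(after_first, after_second)
```

Each finalizer is recorded exactly once across both collections, which matches the "already finalized" marking described above.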

Objective-C 2.0 will put resurrected objects into a "zombie" state, where they log all messages sent to them, but do nothing else. [7] See also Automatic Reference Counting: Zeroing Weak References for handling of weak references.

In the .NET Framework, notably C# and VB.NET, object finalization is determined by a finalization "queue", [note 1] which is checked during object destruction. Objects with a finalizer are placed in this queue on creation, and dequeued when the finalizer is called, but can be manually dequeued (prior to finalization) with SuppressFinalize or re-enqueued with ReRegisterForFinalize. Thus by default objects with finalizers are finalized at most once, but this finalization can be suppressed, or objects can be finalized multiple times if they are resurrected (made accessible again) and then re-enqueued for finalization. Further, weak references by default do not track resurrection, meaning a weak reference is not updated if an object is resurrected; these are called short weak references, and weak references that track resurrection are called long weak references. [8]

Applications

Object resurrection can be used to implement an object pool of commonly used objects, but it obscures code and makes it more confusing. [3] It should be used only for objects that are used frequently and whose construction or destruction is time-consuming. An example is an array of random numbers, where a large number of arrays are created and destroyed in a short time but only a small number are in use at any one time. With object resurrection, a pooling technique reduces the unnecessary overhead of creation and destruction. Here, when an object is about to be destroyed, its finalizer stores a reference to the object on the pool manager's stack of unused objects, and the pool manager keeps the object for later reuse. [9]
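The pooling idea can be sketched in Python. The class names loosely follow the .NET example quoted in reference 9 (PoolManager, RandomArray), but the code itself is a hypothetical illustration and assumes CPython 3.4 or later. It also exposes a limitation: because the finalizer runs only once per object, an object can return to the pool through resurrection only once.

```python
import random


class PoolManager:
    """Keeps unused objects on a stack for reuse (hypothetical sketch)."""

    def __init__(self):
        self.pooled_objects = []

    def new_random_array(self):
        if self.pooled_objects:
            return self.pooled_objects.pop()   # reuse a resurrected object
        return RandomArray(self)               # otherwise build a new one


class RandomArray:
    def __init__(self, manager, size=1000):
        self.manager = manager
        self.values = [random.random() for _ in range(size)]  # costly setup

    def __del__(self):
        # Resurrect: hand a reference to the pool instead of dying.
        self.manager.pooled_objects.append(self)


manager = PoolManager()
first = manager.new_random_array()
first_id = id(first)
del first                                  # finalizer parks it in the pool
pooled_after_first_drop = len(manager.pooled_objects)

again = manager.new_random_array()         # served from the pool
reused = id(again) == first_id
del again                                  # already finalized: not re-pooled
pooled_after_second_drop = len(manager.pooled_objects)
print(pooled_after_first_drop, reused, pooled_after_second_drop)
```

In .NET the finalizer could call GC.ReRegisterForFinalize to make the object poolable again; Python has no equivalent, so a practical Python pool would more likely rely on an explicit release method than on resurrection.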

Notes

  1. This is not strictly a queue, as elements can be removed from the middle by GC.SuppressFinalize.
  2. CPython uses reference counts for non-cyclic garbage, with a separate cycle detector, while most implementations of Java use a tracing garbage collector.

References

  1. Goldshtein, Zurbalev & Flatow 2012, p. 129.
  2. Richter 2000.
  3. "What is resurrection (in garbage collection)?". XYZWS. Archived from the original on 2011-11-23. Retrieved 2011-08-01. An object that has been eligible for garbage collection may stop being eligible and return to normal life. Within a finalize() method, you can assign this to a reference variable and prevent that object's collection, an act many developers call resurrection. The finalize() method is never called more than once by the JVM for any given object. The JVM will not invoke finalize() method again after resurrection (as the finalize() method already ran for that object).
  4. Tim Peters's answer to "How many times can `__del__` be called per object in Python?"
  5. What's New In Python 3.4, PEP 442: Safe Object Finalization
  6. Pitrou, Antoine (2013). "PEP 442 -- Safe object finalization".
  7. Implementing a finalize Method
  8. Goldshtein, Zurbalev & Flatow 2012, p. 131.
  9. "Object resurrection" (PDF). Hesab.net. Retrieved 2011-08-01. Object resurrection is an advanced technique that's likely to be useful only in unusual scenarios, such as when you're implementing a pool of objects whose creation and destruction is time-consuming. ... The ObjectPool demo application shows that an object pool manager can improve performance when many objects are frequently created and destroyed. Assume that you have a RandomArray class, which encapsulates an array of random numbers. The main program creates and destroys thousands of RandomArray objects, even though only a few objects are alive at a given moment. Because the class creates the random array in its constructor method (a time-consuming operation), this situation is ideal for a pooling technique. ... The crucial point in the pooling technique is that the PoolManager class contains a reference to unused objects in the pool (in the PooledObjects Stack object), but not to objects being used by the main program. In fact, the latter objects are kept alive only by references in the main program. When the main program sets a RandomArray object to Nothing (or lets it go out of scope) and a garbage collection occurs, the garbage collector invokes the object's Finalize method. The code inside the RandomArray's Finalize method has therefore an occasion to resurrect itself by storing a reference to itself in the PoolManager's PooledObjects structure. So when the NewRandomArray function is called again, the PoolManager object can return a pooled object to the client without going through the time-consuming process of creating a new one.