Finalizer

In computer science, a finalizer or finalize method is a special method that performs finalization, generally some form of cleanup. A finalizer is executed during object destruction, prior to the object being deallocated, and is complementary to an initializer, which is executed during object creation, following allocation. Finalizers are strongly discouraged by some, due to difficulty in proper use and the complexity they add, and alternatives are suggested instead, mainly the dispose pattern [1] (see problems with finalizers).

The term finalizer is mostly used in object-oriented and functional programming languages that use garbage collection, of which the archetype is Smalltalk. This is contrasted with a destructor, which is a method called for finalization in languages with deterministic object lifetimes, archetypically C++. [2] [3] These are generally exclusive: a language will have either finalizers (if it is automatically garbage collected) or destructors (if memory is managed manually), but in rare cases a language may have both, as in C++/CLI and D. In languages that use reference counting instead of tracing garbage collection, terminology varies. In technical use, finalizer may also refer to destructors, as these also perform finalization, and some subtler distinctions are drawn – see terminology. The term final is also used to indicate a class that cannot be inherited; this is unrelated.

Terminology

The terminology of finalizer and finalization versus destructor and destruction varies between authors and is sometimes unclear.

In common use, a destructor is a method called deterministically on object destruction, and the archetype is C++ destructors; while a finalizer is called non-deterministically by the garbage collector, and the archetype is Java finalize methods.

For languages that implement garbage collection via reference counting, terminology varies, with some languages such as Objective-C and Perl using destructor, and other languages such as Python using finalizer (per spec, Python is garbage collected, but the reference CPython implementation since its version 2.0 uses a combination of reference counting and garbage collection). This reflects that reference counting results in semi-deterministic object lifetime: for objects that are not part of a cycle, objects are destroyed deterministically when the reference count drops to zero, but objects that are part of a cycle are destroyed non-deterministically, as part of a separate form of garbage collection.
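The semi-deterministic behavior described above can be observed with a short Python sketch. The names Node, finalized and run_demo are illustrative only, and the timing shown relies on CPython's reference counting; other implementations, such as PyPy, may run the finalizers later.

```python
import gc

finalized = []  # names of objects whose __del__ has run (illustrative log)


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


def run_demo():
    finalized.clear()
    gc.disable()  # keep the cycle collector from running on its own during the demo
    try:
        # Not part of a cycle: finalized as soon as the last reference goes away.
        a = Node("acyclic")
        del a
        after_acyclic = list(finalized)

        # Part of a cycle: the reference counts never drop to zero by themselves.
        b, c = Node("b"), Node("c")
        b.other, c.other = c, b
        del b, c
        after_cycle_dropped = list(finalized)

        gc.collect()  # the separate cycle collector finds and finalizes both
        after_collect = sorted(finalized)
    finally:
        gc.enable()
    return after_acyclic, after_cycle_dropped, after_collect
```

In CPython, the first object is finalized immediately, while the two objects in the cycle are finalized only once the cycle collector runs.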

In certain narrow technical use, constructor and destructor are language-level terms, meaning methods defined in a class, while initializer and finalizer are implementation-level terms, meaning methods called during object creation or destruction. Thus for example the original specification for the C# language referred to "destructors", even though C# is garbage-collected, but the specification for the Common Language Infrastructure (CLI), and the implementation of its runtime environment as the Common Language Runtime (CLR), referred to "finalizers". This is reflected in the C# language committee's notes, which read in part: "The C# compiler compiles destructors to ... [probably] instance finalizer[s]". [4] [5] This terminology is confusing, and thus more recent versions of the C# spec refer to the language-level method as "finalizers". [6]

Another language that does not make this terminology distinction is D. Although D classes are garbage collected, their cleanup functions are called destructors. [7]

Use

Finalization is mostly used for cleanup, to release memory or other resources: to deallocate memory allocated via manual memory management; to clear references if reference counting is used (decrement reference counts); to release resources, particularly in the resource acquisition is initialization (RAII) idiom; or to unregister an object. The amount of finalization varies significantly between languages, from extensive finalization in C++, which has manual memory management, reference counting, and deterministic object lifetimes, to often none at all in Java, which has non-deterministic object lifetimes and is often implemented with a tracing garbage collector. It is also possible for there to be little or no explicit (user-specified) finalization, but significant implicit finalization, performed by the compiler, interpreter, or runtime; this is common in case of automatic reference counting, as in the CPython reference implementation of Python, or in Automatic Reference Counting in Apple's implementation of Objective-C, which both automatically break references during finalization. A finalizer can include arbitrary code; a particularly complex use is to automatically return the object to an object pool.

Memory deallocation during finalization is common in languages like C++ where manual memory management is standard, but also occurs in managed languages when memory has been allocated outside of the managed heap (externally to the language); in Java this occurs with Java Native Interface (JNI) and ByteBuffer objects in New I/O (NIO). This latter can cause problems due to the garbage collector being unable to track these external resources, so they will not be collected aggressively enough, and can cause out-of-memory errors due to exhausting unmanaged memory – this can be avoided by treating native memory as a resource and using the dispose pattern, as discussed below.

Finalizers are generally both much less necessary and much less used than destructors. They are much less necessary because garbage collection automates memory management, and much less used because they are not generally executed deterministically – they may not be called in a timely manner, or even at all, and the execution environment cannot be predicted – and thus any cleanup that must be done in a deterministic way must instead be done by some other method, most frequently manually via the dispose pattern. Notably, neither Java nor Python guarantees that finalizers will ever be called, and thus they cannot be relied on for cleanup.

Due to the lack of programmer control over their execution, it is usually recommended to avoid finalizers for any but the most trivial operations. In particular, operations often performed in destructors are not usually appropriate for finalizers. A common anti-pattern is to write finalizers as if they were destructors, which is both unnecessary and ineffectual, due to differences between finalizers and destructors. This is particularly common among C++ programmers, as destructors are heavily used in idiomatic C++, following the resource acquisition is initialization (RAII) idiom.

Syntax

Programming languages that use finalizers include C++/CLI, C#, Clean, Go, Java, JavaScript and Python. Syntax varies significantly by language.

In Java, a finalizer is a method called finalize, which overrides the Object.finalize method. [8]

In JavaScript, FinalizationRegistry allows you to request a callback when an object is garbage-collected.

In Python, a finalizer is a method called __del__.
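A minimal sketch of the Python syntax follows; the class name Foo and its log attribute are illustrative. It also shows that the del statement only removes a reference and does not call __del__ directly. The immediate finalization on the last del is CPython behavior and is not guaranteed by the language.

```python
class Foo:
    log = []

    def __del__(self):
        # Runs when the object is about to be destroyed, not when `del` executes.
        Foo.log.append("finalized")


first = Foo()
second = first               # a second reference to the same object
del first                    # removes one name; the object is still reachable
still_alive = list(Foo.log)  # nothing has been finalized yet
del second                   # last reference gone; CPython finalizes at this point
```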

In Perl, a finalizer is a method called DESTROY.

Figure: UML class in C# containing a constructor and a finalizer.

In C#, a finalizer is a method whose name is the class name with ~ prefixed, as in ~Foo – the same syntax as a C++ destructor. These methods were originally called "destructors", by analogy with C++, despite having different behavior, but were renamed to "finalizers" in later versions of the standard because of the confusion this caused. [6]

In C++/CLI, which has both destructors and finalizers, a destructor is a method whose name is the class name with ~ prefixed, as in ~Foo (as in C#), and a finalizer is a method whose name is the class name with ! prefixed, as in !Foo.

In Go, finalizers are applied to a single pointer by calling the runtime.SetFinalizer function in the standard library. [9]

Implementation

A finalizer is called when an object is garbage collected – after an object has become garbage (unreachable), but before its memory is deallocated. Finalization occurs non-deterministically, at the discretion of the garbage collector, and might never occur. This contrasts with destructors, which are called deterministically as soon as an object is no longer in use, and are always called, except in case of uncontrolled program termination. Finalizers are most frequently instance methods, due to needing to do object-specific operations.

The garbage collector must also account for the possibility of object resurrection. Most commonly this is done by first executing finalizers, then checking whether any objects have been resurrected, and if so, aborting their destruction. This additional check is potentially expensive – a simple implementation re-checks all garbage if even a single object has a finalizer – and thus both slows down and complicates garbage collection. For this reason, objects with finalizers may be collected less frequently than objects without finalizers (only on certain cycles), exacerbating problems caused by relying on prompt finalization, such as resource leaks.
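The finalize-then-recheck approach can be sketched as a toy model. This is not how any particular runtime is implemented: objects are plain strings, and the names reachable, collect, heap, roots, edges and finalizers are all assumptions made for illustration. The sketch uses the simple strategy of re-tracing everything if even one finalizer ran, and it chooses to run each finalizer at most once.

```python
def reachable(roots, edges):
    """Return every object reachable from `roots` by following `edges`."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(edges.get(obj, ()))
    return seen


def collect(heap, roots, edges, finalizers):
    """Run one toy collection cycle and return the set of freed objects."""
    garbage = heap - reachable(roots, edges)
    ran_finalizer = False
    for obj in sorted(garbage):
        finalizer = finalizers.pop(obj, None)  # each finalizer runs at most once
        if finalizer is not None:
            finalizer(roots, edges)            # arbitrary code: may add references
            ran_finalizer = True
    if ran_finalizer:
        # Simple strategy: re-trace everything if even one finalizer ran.
        garbage = heap - reachable(roots, edges)
    heap.difference_update(garbage)            # deallocate what is still garbage
    return garbage


heap = {"a", "b", "c"}
roots = {"a"}
edges = {}
finalizers = {"b": lambda roots, edges: roots.add("b")}  # "b" resurrects itself

freed_first = collect(heap, roots, edges, finalizers)    # only "c" is freed
roots.discard("b")                                       # "b" becomes garbage again
freed_second = collect(heap, roots, edges, finalizers)   # now "b" is freed
```

In the first cycle the finalizer of "b" makes it reachable again, so its destruction is aborted; in the second cycle it has no finalizer left to run and is freed.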

If an object is resurrected, there is the further question of whether its finalizer is called again, when it is next destroyed – unlike destructors, finalizers are potentially called multiple times. If finalizers are called for resurrected objects, objects may repeatedly resurrect themselves and be indestructible; this occurs in the CPython implementation of Python prior to Python 3.4, and in CLR languages such as C#. To avoid this, in many languages, including Java, Objective-C (at least in recent Apple implementations), and Python from Python 3.4, objects are finalized at most once, which requires tracking if the object has been finalized yet.

In other cases, notably CLR languages like C#, finalization is tracked separately from the objects themselves, and objects can be repeatedly registered or deregistered for finalization.

Problems

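Depending on the implementation, finalizers can cause a significant number of problems, and are thus strongly discouraged by a number of authorities. [10] [11] As discussed in the sections below, these problems include finalizers not being called in a timely manner or at all, finalizers running in no prescribed order and in an unspecified environment, finalization being driven by managed memory pressure rather than resource pressure, and object resurrection. [10]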

Further, finalizers may fail to run due to objects remaining reachable beyond when they are expected to be garbage, either due to programming errors or due to unexpected reachability. For example, when Python catches an exception (or an exception is not caught in interactive mode), it keeps a reference to the stack frame where the exception was raised, which keeps objects referenced from that stack frame alive.
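The Python case can be illustrated with a small sketch in which a stored exception keeps a stack frame, and therefore a local object, alive. The names Tracked, fail, demo and log are illustrative, and the exact moment of finalization after the exception is dropped is CPython behavior.

```python
import gc

log = []


class Tracked:
    def __del__(self):
        log.append("finalized")


def fail():
    local = Tracked()        # referenced only from this stack frame
    raise ValueError("boom")


def demo():
    log.clear()
    saved = None
    try:
        fail()
    except ValueError as exc:
        saved = exc          # exception -> traceback -> frame of fail() -> local
    while_saved = list(log)  # finalizer has not run: the frame is still alive
    saved = None             # drop the exception, and with it the traceback
    gc.collect()             # also clears any reference cycle involving frames
    return while_saved, list(log)
```

As long as the exception object is kept, the Tracked instance remains reachable through the traceback, so its finalizer cannot run.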

In Java, finalizers in a superclass can also slow down garbage collection in a subclass, as the finalizer can potentially refer to fields in the subclass, and thus the field cannot be garbage collected until the following cycle, once the finalizer has run. [10] This can be avoided by using composition over inheritance.

Resource management

A common anti-pattern is to use finalizers to release resources, by analogy with the resource acquisition is initialization (RAII) idiom of C++: acquire a resource in the initializer (constructor), and release it in the finalizer (destructor). This does not work, for a number of reasons. Most fundamentally, finalizers may never be called, and even if called, may not be called in a timely manner – thus using finalizers to release resources will generally cause resource leaks. Further, finalizers are not called in a prescribed order, while resources often need to be released in a specific order, frequently the reverse of the order in which they were acquired. Also, as finalizers are called at the discretion of the garbage collector, they will often only be called under managed memory pressure (when there is little managed memory available), regardless of resource pressure – if scarce resources are held by garbage but there is plenty of managed memory available, garbage collection may not occur, thus not reclaiming these resources.

Thus instead of using finalizers for automatic resource management, in garbage-collected languages one instead must manually manage resources, generally by using the dispose pattern. In this case resources may still be acquired in the initializer, which is called explicitly on object instantiation, but are released in the dispose method. The dispose method may be called explicitly, or implicitly by language constructs such as C#'s using, Java's try-with-resources, or Python's with.
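A minimal sketch of the dispose pattern in Python follows, using the with statement to call the dispose method implicitly. The Resource class, its close method and the released list are hypothetical names chosen for illustration; a real class would acquire and release an actual resource.

```python
class Resource:
    """Hypothetical resource holder following the dispose pattern."""

    released = []  # order in which resources were released (for illustration)

    def __init__(self, name):
        self.name = name       # acquisition would happen here
        self.closed = False

    def close(self):
        """The dispose method: explicit, and safe to call more than once."""
        if not self.closed:
            self.closed = True
            Resource.released.append(self.name)

    # The context-manager protocol lets `with` call close() implicitly.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False           # do not swallow exceptions


with Resource("connection") as conn, Resource("cursor") as cur:
    inside = (conn.closed, cur.closed)  # both resources are still held here
```

On leaving the with block, the resources are released at a known point and in the reverse of the order in which they were acquired, independently of when the objects themselves are garbage collected.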

However, in certain cases both the dispose pattern and finalizers are used for releasing resources. This is mostly found in CLR languages such as C#, where finalization is used as a backup for disposal: when a resource is acquired, the acquiring object is queued for finalization so that the resource is released on object destruction, even if the resource is not released by manual disposal.

Deterministic and non-deterministic object lifetimes

In languages with deterministic object lifetimes, notably C++, resource management is frequently done by tying resource possession lifetime to object lifetime, acquiring resources during initialization and releasing them during finalization; this is known as resource acquisition is initialization (RAII). This ensures that resource possession is a class invariant, and that resources are released promptly when the object is destroyed.

However, in languages with non-deterministic object lifetimes – which include all major languages with garbage collection, such as C#, Java, and Python – this does not work, because finalization may not be timely or may not happen at all, and thus resources may not be released for a long time or even at all, causing resource leaks. In these languages resources are instead generally managed manually via the dispose pattern: resources may still be acquired during initialization, but are released by calling a dispose method. Nevertheless, using finalization for releasing resources in these languages is a common anti-pattern, and forgetting to call dispose will still cause a resource leak.

In some cases both techniques are combined, using an explicit dispose method, but also releasing any still-held resources during finalization as a backup. This is commonly found in C#, and is implemented by registering an object for finalization whenever a resource is acquired, and suppressing finalization whenever a resource is released.
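The combined approach can be sketched in Python with weakref.finalize from the standard library, which plays a role loosely analogous to registering and suppressing finalization; this is an analogy, not the .NET mechanism. The Handle class and the _release and released names are hypothetical.

```python
import gc
import weakref

released = []  # handles that have been released (illustrative)


def _release(handle):
    released.append(handle)


class Handle:
    """Hypothetical wrapper around an external handle."""

    def __init__(self, handle):
        self.handle = handle
        # Register the backup cleanup. The callback must not reference self,
        # otherwise the object could never become garbage.
        self._finalizer = weakref.finalize(self, _release, handle)

    def close(self):
        # Explicit disposal: runs the callback at most once and marks the
        # finalizer dead, so it will not run again on collection.
        self._finalizer()

    @property
    def closed(self):
        return not self._finalizer.alive


h1 = Handle("h1")
h1.close()
h1.close()         # second call is a no-op
del h1

h2 = Handle("h2")  # never closed explicitly
del h2
gc.collect()       # backup path: released when the object is collected
```

Each handle is released exactly once, either by the explicit close call or, as a fallback, when the object is collected.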

Object resurrection

If user-specified finalizers are allowed, it is possible for finalization to cause object resurrection, as the finalizers can run arbitrary code, which may create references from live objects to objects being destroyed. For languages without garbage collection, this is a severe bug, and causes dangling references and memory safety violations; for languages with garbage collection, this is prevented by the garbage collector, most commonly by adding another step to garbage collection (after running all user-specified finalizers, check for resurrection), which complicates and slows down garbage collection.

Further, object resurrection means that an object may not be destroyed, and in pathological cases an object can always resurrect itself during finalization, making itself indestructible. To prevent this, some languages, like Java and Python (from Python 3.4), only finalize objects once, and do not finalize resurrected objects a second time. Concretely, this is done by tracking, object by object, whether an object has been finalized. Objective-C also tracks finalization (at least in recent Apple versions) for similar reasons, treating resurrection as a bug.
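Resurrection and the finalize-once rule can be observed in CPython 3.4 or later with the following sketch. The names Phoenix, graveyard and calls are illustrative; the Python language reference leaves it implementation-dependent whether __del__ is called a second time, so the result shown is specific to CPython.

```python
import gc

graveyard = []  # a live container that a finalizer can store objects in
calls = []


class Phoenix:
    def __del__(self):
        calls.append("__del__")
        graveyard.append(self)  # new reference from a live object: resurrection


p = Phoenix()
del p                           # finalizer runs and resurrects the object
resurrected = len(graveyard) == 1
graveyard.clear()               # unreachable again; __del__ is not run a second time
gc.collect()
```

Because the object is marked as already finalized, dropping the last reference a second time destroys it without running the finalizer again, so it cannot keep itself alive indefinitely.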

A different approach is used in the .NET Framework, notably in C# and Visual Basic .NET, where finalization is tracked by a "queue", rather than by object. In this case, if a user-specified finalizer is provided, by default the object is only finalized once (it is queued for finalization on creation, and dequeued once it is finalized), but this can be changed by calling methods of the GC class. Finalization can be prevented by calling GC.SuppressFinalize, which dequeues the object, or reactivated by calling GC.ReRegisterForFinalize, which enqueues the object. These are particularly used when using finalization for resource management as a supplement to the dispose pattern, or when implementing an object pool.

Contrast with initialization

Finalization is formally complementary to initialization – initialization occurs at the start of lifetime, finalization at the end – but differs significantly in practice. Both variables and objects are initialized, mostly to assign values, but in general only objects are finalized, and in general there is no need to clear values – the memory can simply be deallocated and reclaimed by the operating system.

Beyond assigning initial values, initialization is mostly used to acquire resources or to register an object with some service (like an event handler). These actions have symmetric release or unregister actions, and these can symmetrically be handled in a finalizer, which is done in RAII. However, in many languages, notably those with garbage collection, object lifetime is asymmetric: object creation happens deterministically at some explicit point in the code, but object destruction happens non-deterministically, in some unspecified environment, at the discretion of the garbage collector. This asymmetry means that finalization cannot be effectively used as the complement of initialization, because it does not happen in a timely manner, in a specified order, or in a specified environment. The symmetry is partially restored by also disposing of the object at an explicit point, but in this case disposal and destruction do not happen at the same point, and an object may be in a "disposed but still alive" state, which weakens the class invariants and complicates use.
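The "disposed but still alive" state can be seen with an in-memory text stream from Python's standard library: after disposal the object still exists and can be referenced, but it no longer satisfies the invariant that it holds a usable stream. The variable names are illustrative.

```python
import io

buffer = io.StringIO("payload")
buffer.close()       # disposed at an explicit point; the object is still alive

try:
    buffer.read()    # every use must now cope with the closed state
    outcome = "read succeeded"
except ValueError:
    outcome = "rejected: stream already closed"
```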

Variables are generally initialized at the start of their lifetime, but not finalized at the end of their lifetime – though if a variable has an object as its value, the object may be finalized. In some cases variables are also finalized: GCC extensions allow finalization of variables.

Connection with finally

As reflected in the naming, "finalization" and the finally construct fulfill similar purposes: performing some final action, generally cleaning up, after something else has finished. They differ in when they occur. A finally clause is executed when program execution leaves the body of the associated try clause; this occurs during stack unwinding, and there is thus a stack of pending finally clauses, executed in order. Finalization, by contrast, occurs when an object is destroyed, which depends on the memory management method; in general there is simply a set of objects awaiting finalization, often on the heap, which need not be finalized in any specific order.

However, in some cases these coincide. In C++, object destruction is deterministic, and the behavior of a finally clause can be produced by having a local variable with an object as its value, whose scope is a block that corresponds to the body of a try clause – the object is finalized (destructed) when execution exits this scope, exactly as if there were a finally clause. For this reason, C++ does not have a finally construct; the difference is that the cleanup is defined in the class definition as the destructor method, rather than at the call site in a finally clause.

Conversely, in the case of a finally clause in a coroutine, like in a Python generator, the coroutine may never terminate – only ever yielding – and thus in ordinary execution the finally clause is never executed. If one interprets instances of a coroutine as objects, then the finally clause can be considered a finalizer of the object, and thus can be executed when the instance is garbage collected. In Python terminology, the definition of a coroutine is a generator function, while an instance of it is a generator iterator, and thus a finally clause in a generator function becomes a finalizer in generator iterators instantiated from this function.
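A short sketch of a finally clause acting as a finalizer of a generator iterator follows. The names ticker and log are illustrative, and the cleanup running as soon as the last reference is deleted is CPython behavior; other implementations may run it later.

```python
import gc

log = []


def ticker():
    """Generator function: yields forever, so the loop never ends by itself."""
    n = 0
    try:
        while True:
            yield n
            n += 1
    finally:
        log.append("cleanup")  # runs when the generator iterator is closed


gen = ticker()         # generator iterator (an instance of the coroutine)
first = next(gen)      # start it, so execution is suspended inside the try body
before = list(log)     # the finally clause has not run
del gen                # on collection the iterator is closed, raising GeneratorExit
gc.collect()
```

The generator must have been started for this to apply: an iterator that was never advanced has not entered the try body, so there is no pending finally clause to run.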

History

The notion of finalization as a separate step in object destruction dates to Montgomery (1994), [13] by analogy with the earlier distinction of initialization in object construction in Martin & Odell (1992). [14] Literature prior to this point used "destruction" for this process, not distinguishing finalization and deallocation, and programming languages dating to this period, like C++ and Perl, use the term "destruction". The terms "finalize" and "finalization" are also used in the influential book Design Patterns (1994). [note 1] [15] The introduction of Java in 1995 contained finalize methods, which popularized the term and associated it with garbage collection, and languages from this point generally make this distinction and use the term "finalization", particularly in the context of garbage collection.

Notes

  1. Published 1994, with a 1995 copyright.

References

  1. Jagger, Perry & Sestoft 2007, p. 542, "In C++, a destructor is called in a determinate manner, whereas, in C#, a finalizer is not. To get determinate behavior from C#, one should use Dispose."
  2. Boehm, Hans-J. (2002). Destructors, Finalizers, and Synchronization. Symposium on Principles of Programming Languages (POPL).
  3. Jagger, Perry & Sestoft 2007, p. 542, "C++ destructors versus C# finalizers: C++ destructors are determinate in the sense that they are run at known points in time, in a known order, and from a known thread. They are thus semantically very different from C# finalizers, which are run at unknown points in time, in an unknown order, from an unknown thread, and at the discretion of the garbage collector."
  4. In full: "We're going to use the term "destructor" for the member which executes when an instance is reclaimed. Classes can have destructors; structs can't. Unlike in C++, a destructor cannot be called explicitly. Destruction is non-deterministic – you can't reliably know when the destructor will execute, except to say that it executes at some point after all references to the object have been released. The destructors in an inheritance chain are called in order, from most descendant to least descendant. There is no need (and no way) for the derived class to explicitly call the base destructor. The C# compiler compiles destructors to the appropriate CLR representation. For this version that probably means an instance finalizer that is distinguished in metadata. CLR may provide static finalizers in the future; we do not see any barrier to C# using static finalizers.", May 12th, 1999.
  5. What’s the difference between a destructor and a finalizer?, Eric Lippert, Eric Lippert’s Blog: Fabulous Adventures In Coding, 21 Jan 2010
  6. Jagger, Perry & Sestoft 2007, p. 542, "In the previous version of this standard, what is now referred to as a “finalizer” was called a “destructor”. Experience has shown that the term “destructor” caused confusion and often resulted in incorrect expectations, especially to programmers knowing C++. In C++, a destructor is called in a determinate manner, whereas, in C#, a finalizer is not. To get determinate behavior from C#, one should use Dispose."
  7. Class destructors Class destructors in D
  8. java.lang, Class Object: finalize
  9. "Runtime package - runtime - PKG.go.dev".
  10. "MET12-J. Do not use finalizers", Dhruv Mohindra, The CERT Oracle Secure Coding Standard for Java, 05. Methods (MET). Archived 2014-05-04 at the Wayback Machine.
  11. object.__del__(self), The Python Language Reference, 3. Data model: "... __del__() methods should do the absolute minimum needed to maintain external invariants."
  12. Hans-J. Boehm, Finalization, Threads, and the Java™ Technology-Based Memory Model, JavaOne Conference, 2005.
  13. Montgomery 1994, p.  120, "As with object instantiation, design for object termination can benefit from implementation of two operations for each class—a finalize and a terminate operation. A finalize operation breaks associations with other objects, ensuring data structure integrity."
  14. Montgomery 1994, p.  119, "Consider implementing class instantiation as a create and initialize operation, as suggested by Martin and Odell. The first allocates storage for new objects, and the second constructs the object to adhere to specifications and constraints."
  15. "Every new class has a fixed implementation overhead (initialization, finalization, etc.).", "destructor In C++, an operation that is automatically invoked to finalize an object that is about to be deleted."

Further reading