Smart pointer

Last updated

In computer science, a smart pointer is an abstract data type that simulates a pointer while providing added features, such as automatic memory management or bounds checking. Such features are intended to reduce bugs caused by the misuse of pointers, while retaining efficiency. Smart pointers typically keep track of the memory they point to, and may also be used to manage other resources, such as network connections and file handles. Smart pointers were first popularized in the programming language C++ during the first half of the 1990s as rebuttal to criticisms of C++'s lack of automatic garbage collection. [1] [2]

Contents

Pointer misuse can be a major source of bugs. Smart pointers prevent most situations of memory leaks by making the memory deallocation automatic. More generally, they make object destruction automatic: an object controlled by a smart pointer is automatically destroyed (finalized and then deallocated) when the last (or only) owner of an object is destroyed, for example because the owner is a local variable, and execution leaves the variable's scope. Smart pointers also eliminate dangling pointers by postponing destruction until an object is no longer in use.

If a language supports automatic garbage collection (for example, Java or C#), then smart pointers are unneeded for reclaiming and safety aspects of memory management, yet are useful for other purposes, such as cache data structure residence management and resource management of objects such as file handles or network sockets.

Several types of smart pointers exist. Some work with reference counting, others by assigning ownership of an object to one pointer.

History

Even though C++ popularized the concept of smart pointers, especially the reference-counted variety, [3] the immediate predecessor of one of the languages that inspired C++'s design had reference-counted references built into the language. C++ was inspired in part by Simula67. [4] Simula67's ancestor was Simula I. Insofar as Simula I's element is analogous to C++'s pointer without null, and insofar as Simula I's process with a dummy-statement as its activity body is analogous to C++'s struct (which itself is analogous to C. A. R. Hoare's record in then-contemporary 1960s work), Simula I had reference counted elements (i.e., pointer-expressions that house indirection) to processes (i.e., records) no later than September 1965, as shown in the quoted paragraphs below. [5]

Processes can be referenced individually. Physically, a process reference is a pointer to an area of memory containing the data local to the process and some additional information defining its current state of execution. However, for reasons stated in the Section 2.2 process references are always indirect, through items called elements. Formally a reference to a process is the value of an expression of type element.

element values can be stored and retrieved by assignments and references to element variables and by other means.
The language contains a mechanism for making the attributes of a process accessible from the outside, i.e., from within other processes. This is called remote access- ing. A process is thus a referenceable data structure.

It is worth noticing the similarity between a process whose activity body is a dummy statement, and the record concept recently proposed by C. A. R. Hoare and N. Wirth

Because C++ borrowed Simula's approach to memory allocationthe new keyword when allocating a process/record to obtain a fresh element to that process/recordit is not surprising that C++ eventually resurrected Simula's reference-counted smart-pointer mechanism within element as well.

Features

In C++, a smart pointer is implemented as a template class that mimics, by means of operator overloading, the behaviors of a traditional (raw) pointer, (e.g. dereferencing, assignment) while providing additional memory management features.

Smart pointers can facilitate intentional programming by expressing, in the type, how the memory of the referent of the pointer will be managed. For example, if a C++ function returns a pointer, there is no way to know whether the caller should delete the memory of the referent when the caller is finished with the information.

SomeType*AmbiguousFunction();// What should be done with the result?

Traditionally, naming conventions have been used to resolve the ambiguity, [6] which is an error-prone, labor-intensive approach. C++11 introduced a way to ensure correct memory management in this case by declaring the function to return a unique_ptr,

std::unique_ptr<SomeType>ObviousFunction();

The declaration of the function return type as a unique_ptr makes explicit the fact that the caller takes ownership of the result, and the C++ runtime ensures that the memory will be reclaimed automatically. Before C++11, unique_ptr can be replaced with auto_ptr, which is now deprecated.

Creating new objects

To ease the allocation of a

std::shared_ptr<SomeType>

C++11 introduced:

autos=std::make_shared<SomeType>(constructor,parameters,here);

and similarly

std::unique_ptr<some_type>

Since C++14 one can use:

autou=std::make_unique<SomeType>(constructor,parameters,here);

It is preferred, in almost all circumstances, to use these facilities over the new keyword. [7]

unique_ptr

C++11 introduces std::unique_ptr, defined in the header <memory>. [8]

A unique_ptr is a container for a raw pointer, which the unique_ptr is said to own. A unique_ptr explicitly prevents copying of its contained pointer (as would happen with normal assignment), but the std::move function can be used to transfer ownership of the contained pointer to another unique_ptr. A unique_ptr cannot be copied because its copy constructor and assignment operators are explicitly deleted.

std::unique_ptr<int>p1(newint(5));std::unique_ptr<int>p2=p1;// Compile error.std::unique_ptr<int>p3=std::move(p1);// Transfers ownership. p3 now owns the memory and p1 is set to nullptr.p3.reset();// Deletes the memory.p1.reset();// Does nothing.

std::auto_ptr is deprecated under C++11 and completely removed from C++17. The copy constructor and assignment operators of auto_ptr do not actually copy the stored pointer. Instead, they transfer it, leaving the prior auto_ptr object empty. This was one way to implement strict ownership, so that only one auto_ptr object can own the pointer at any given time. This means that auto_ptr should not be used where copy semantics are needed. [9] [ citation needed ] Since auto_ptr already existed with its copy semantics, it could not be upgraded to be a move-only pointer without breaking backward compatibility with existing code.

shared_ptr and weak_ptr

C++11 introduces std::shared_ptr and std::weak_ptr, defined in the header <memory>. [8] C++11 also introduces std::make_shared (std::make_unique was introduced in C++14) to safely allocate dynamic memory in the RAII paradigm. [10]

A shared_ptr is a container for a raw pointer. It maintains reference counting ownership of its contained pointer in cooperation with all copies of the shared_ptr. An object referenced by the contained raw pointer will be destroyed when and only when all copies of the shared_ptr have been destroyed.

std::shared_ptr<int>p0(newint(5));// Valid, allocates 1 integer and initialize it with value 5.std::shared_ptr<int[]>p1(newint[5]);// Valid, allocates 5 integers.std::shared_ptr<int[]>p2=p1;// Both now own the memory.p1.reset();// Memory still exists, due to p2.p2.reset();// Frees the memory, since no one else owns the memory.

A weak_ptr is a container for a raw pointer. It is created as a copy of a shared_ptr. The existence or destruction of weak_ptr copies of a shared_ptr have no effect on the shared_ptr or its other copies. After all copies of a shared_ptr have been destroyed, all weak_ptr copies become empty.

std::shared_ptr<int>p1=std::make_shared<int>(5);std::weak_ptr<int>wp1{p1};// p1 owns the memory.{std::shared_ptr<int>p2=wp1.lock();// Now p1 and p2 own the memory.// p2 is initialized from a weak pointer, so you have to check if the// memory still exists!if(p2){DoSomethingWith(p2);}}// p2 is destroyed. Memory is owned by p1.p1.reset();// Free the memory.std::shared_ptr<int>p3=wp1.lock();// Memory is gone, so we get an empty shared_ptr.if(p3){// code will not executeActionThatNeedsALivePointer(p3);}

Because the implementation of shared_ptr uses reference counting, circular references are potentially a problem. A circular shared_ptr chain can be broken by changing the code so that one of the references is a weak_ptr.

Multiple threads can safely simultaneously access different shared_ptr and weak_ptr objects that point to the same object. [11]

The referenced object must be protected separately to ensure thread safety.

shared_ptr and weak_ptr are based on versions used by the Boost libraries.[ citation needed ] C++ Technical Report 1 (TR1) first introduced them to the standard, as general utilities, but C++11 adds more functions, in line with the Boost version.

Other types of smart pointers

There are other types of smart pointers (which are not in the C++ standard) implemented on popular C++ libraries or custom STL, some examples include hazard pointer [12] and intrusive pointer. [13] [14]

See also

Related Research Articles

In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

In software engineering, the composite pattern is a partitioning design pattern. The composite pattern describes a group of objects that are treated the same way as a single instance of the same type of object. The intent of a composite is to "compose" objects into tree structures to represent part-whole hierarchies. Implementing the composite pattern lets clients treat individual objects and compositions uniformly.

In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Java, Lisp, OCaml, Perl, Python and PHP since the version 7.4.

<span class="mw-page-title-main">D (programming language)</span> Multi-paradigm system programming language

D, also known as dlang, is a multi-paradigm system programming language created by Walter Bright at Digital Mars and released in 2001. Andrei Alexandrescu joined the design and development effort in 2007. Though it originated as a re-engineering of C++, D is a profoundly different language — features of D can be considered streamlined and expanded-upon ideas from C++, however D also draws inspiration from other high-level programming languages, notably Java, Python, Ruby, C#, and Eiffel.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In the C++ programming language, a copy constructor is a special constructor for creating a new object as a copy of an existing object. Copy constructors are the standard way of copying objects in C++, as opposed to cloning, and have C++-specific nuances.

In computer programming, run-time type information or run-time type identification (RTTI) is a feature of some programming languages that exposes information about an object's data type at runtime. Run-time type information may be available for all types or only to types that explicitly have it. Run-time type information is a specialization of a more general concept called type introspection.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

Resource acquisition is initialization (RAII) is a programming idiom used in several object-oriented, statically typed programming languages to describe a particular language behavior. In RAII, holding a resource is a class invariant, and is tied to object lifetime. Resource allocation is done during object creation, by the constructor, while resource deallocation (release) is done during object destruction, by the destructor. In other words, resource acquisition must succeed for initialization to succeed. Thus the resource is guaranteed to be held between when initialization finishes and finalization starts, and to be held only when the object is alive. Thus if there are no object leaks, there are no resource leaks.

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In some programming languages, const is a type qualifier that indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in being part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time on each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

In computer programming, an opaque pointer is a special case of an opaque data type, a data type declared to be a pointer to a record or data structure of some unspecified type.

C++ Technical Report 1 (TR1) is the common name for ISO/IEC TR 19768, C++ Library Extensions, which is a document that proposed additions to the C++ standard library for the C++03 language standard. The additions include regular expressions, smart pointers, hash tables, and random number generators. TR1 was not a standard itself, but rather a draft document. However, most of its proposals became part of the later official standard, C++11. Before C++11 was standardized, vendors used this document as a guide to create extensions. The report's goal was "to build more widespread existing practice for an expanded C++ standard library".

In the C++ programming language, auto_ptr is an obsolete smart pointer class template that was available in previous versions of the C++ standard library, which provides some basic RAII features for C++ raw pointers. It has been replaced by the unique_ptr class.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and the read value being the same twice is used to conclude that nothing has happened in the interim; however, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking nothing has changed even though the second thread did work that violates that assumption.

C++14 is a version of the ISO/IEC 14882 standard for the C++ programming language. It is intended to be a small extension over C++11, featuring mainly bug fixes and small improvements, and was replaced by C++17. Its approval was announced on August 18, 2014. C++14 was published as ISO/IEC 14882:2014 in December 2014.

References

  1. Kline, Marshall (September 1997). "C++ FAQs Lite's sections on reference-counted smart pointers and copy-on-write reference semantics in the freestore management FAQs". cis.usouthal.edu. Retrieved 2018-04-06.
  2. Colvin, Gregory (1994). "proposal to standardize counted_ptr in the C++ standard library" (PDF). open-std.org. Retrieved 2018-04-06.
  3. Klabnik, Steve; Nichols, Carol (2023) [2018]. "15. Smart Pointers". The Rust Programming Language (2 ed.). San Francisco, California, USA: No Starch Press, Inc. pp. 315–351. ISBN   978-1-7185-0310-6. (xxix+1+527+3 pages)
  4. Stroustrup, Bjarne. "A history of C++: 1979–1991" (PDF). Retrieved 2018-04-06.
  5. Dahl, Ole-Johan; Nygaard, Kristen (September 1966). "SIMULA—An ALGOL-based simulation language" (PDF). folk.uio.no. Retrieved 2018-04-06.
  6. "Taligent's Guide to Designing Programs, section Use special names for copy, create, and adopt routines".
  7. Sutter, Herb (2013-04-20). "Trip Report: ISO C++ Spring 2013 Meeting". isocpp.org. Retrieved 2013-06-14.
  8. 1 2 ISO 14882:2011 20.7.1
  9. CERT C++ Secure Coding Standard
  10. ISO 14882:2014 20.7.1
  11. "boost::shared_ptr thread safety". (NB. Does not formally cover std::shared_ptr, but is believed to have the same threading limitations.)
  12. "folly/Hazptr.h at main · facebook/folly". github.com.
  13. "Boost.SmartPtr: The Smart Pointer Library - 1.81.0". boost.org.
  14. "EASTL/intrusive_ptr.h at master · electronicarts/EASTL". github.com.

Further reading