Double-checked locking

Last updated

In software engineering, double-checked locking (also known as "double-checked locking optimization" [1] ) is a software design pattern used to reduce the overhead of acquiring a lock by testing the locking criterion (the "lock hint") before acquiring the lock. Locking occurs only if the locking criterion check indicates that locking is required.

Contents

The original form of the pattern, appearing in Pattern Languages of Program Design 3, [2] has data races, depending on the memory model in use, and it is hard to get right. Some consider it to be an anti-pattern. [3] There are valid forms of the pattern, including the use of the volatile keyword in Java and explicit memory barriers in C++. [4]

The pattern is typically used to reduce locking overhead when implementing "lazy initialization" in a multi-threaded environment, especially as part of the Singleton pattern. Lazy initialization avoids initializing a value until the first time it is accessed.

Motivation and original pattern

Consider, for example, this code segment in the Java programming language: [4]

// Single-threaded versionclassFoo{privatestaticHelperhelper;publicHelpergetHelper(){if(helper==null){helper=newHelper();}returnhelper;}// other functions and members...}

The problem is that this does not work when using multiple threads. A lock must be obtained in case two threads call getHelper() simultaneously. Otherwise, either they may both try to create the object at the same time, or one may wind up getting a reference to an incompletely initialized object.

Synchronizing with a lock can fix this, as is shown in the following example:

// Correct but possibly expensive multithreaded versionclassFoo{privateHelperhelper;publicsynchronizedHelpergetHelper(){if(helper==null){helper=newHelper();}returnhelper;}// other functions and members...}

This is correct and will most likely have sufficient performance. However, the first call to getHelper() will create the object and only the few threads trying to access it during that time need to be synchronized; after that all calls just get a reference to the member variable. Since synchronizing a method could in some extreme cases decrease performance by a factor of 100 or higher, [5] the overhead of acquiring and releasing a lock every time this method is called seems unnecessary: once the initialization has been completed, acquiring and releasing the locks would appear unnecessary. Many programmers, including the authors of the double-checked locking design pattern, have attempted to optimize this situation in the following manner:

  1. Check that the variable is initialized (without obtaining the lock). If it is initialized, return it immediately.
  2. Obtain the lock.
  3. Double-check whether the variable has already been initialized: if another thread acquired the lock first, it may have already done the initialization. If so, return the initialized variable.
  4. Otherwise, initialize and return the variable.
// Broken multithreaded version// original "Double-Checked Locking" idiomclassFoo{privateHelperhelper;publicHelpergetHelper(){if(helper==null){synchronized(this){if(helper==null){helper=newHelper();}}}returnhelper;}// other functions and members...}

Intuitively, this algorithm is an efficient solution to the problem. But if the pattern is not written carefully, it will have a data race. For example, consider the following sequence of events:

  1. Thread A notices that the value is not initialized, so it obtains the lock and begins to initialize the value.
  2. Due to the semantics of some programming languages, the code generated by the compiler is allowed to update the shared variable to point to a partially constructed object before A has finished performing the initialization. For example, in Java if a call to a constructor has been inlined then the shared variable may immediately be updated once the storage has been allocated but before the inlined constructor initializes the object. [6]
  3. Thread B notices that the shared variable has been initialized (or so it appears), and returns its value. Because thread B believes the value is already initialized, it does not acquire the lock. If B uses the object before all of the initialization done by A is seen by B (either because A has not finished initializing it or because some of the initialized values in the object have not yet percolated to the memory B uses (cache coherence)), the program will likely crash.

Most runtimes have memory barriers or other methods for managing memory visibility across execution units. Without a detailed understanding of the language's behavior in this area, the algorithm is difficult to implement correctly. One of the dangers of using double-checked locking is that even a naive implementation will appear to work most of the time: it is not easy to distinguish between a correct implementation of the technique and one that has subtle problems. Depending on the compiler, the interleaving of threads by the scheduler and the nature of other concurrent system activity, failures resulting from an incorrect implementation of double-checked locking may only occur intermittently. Reproducing the failures can be difficult.

Usage in C++11

For the singleton pattern, double-checked locking is not needed:

If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.

§ 6.7 [stmt.dcl] p4
Singleton&GetInstance(){staticSingletons;returns;}

C++11 and beyond also provide a built-in double-checked locking pattern in the form of std::once_flag and std::call_once:

#include<mutex>#include<optional> // Since C++17// Singleton.hclassSingleton{public:staticSingleton*GetInstance();private:Singleton()=default;staticstd::optional<Singleton>s_instance;staticstd::once_flags_flag;};// Singleton.cppstd::optional<Singleton>Singleton::s_instance;std::once_flagSingleton::s_flag{};Singleton*Singleton::GetInstance(){std::call_once(Singleton::s_flag,[](){s_instance.emplace(Singleton{});});return&*s_instance;}

If one truly wishes to use the double-checked idiom instead of the trivially working example above (for instance because Visual Studio before the 2015 release did not implement the C++11 standard's language about concurrent initialization quoted above [7] ), one needs to use acquire and release fences: [8]

#include<atomic>#include<mutex>classSingleton{public:staticSingleton*GetInstance();private:Singleton()=default;staticstd::atomic<Singleton*>s_instance;staticstd::mutexs_mutex;};Singleton*Singleton::GetInstance(){Singleton*p=s_instance.load(std::memory_order_acquire);if(p==nullptr){// 1st checkstd::lock_guard<std::mutex>lock(s_mutex);p=s_instance.load(std::memory_order_relaxed);if(p==nullptr){// 2nd (double) checkp=newSingleton();s_instance.store(p,std::memory_order_release);}}returnp;}

Usage in POSIX

pthread_once() must be used to initialize library (or sub-module) code when its API does not have a dedicated initialization procedure required to be called in single-threaded mode.

Usage in Go

packagemainimport"sync"vararrOncesync.Oncevararr[]int// getArr retrieves arr, lazily initializing on first call. Double-checked// locking is implemented with the sync.Once library function. The first// goroutine to win the race to call Do() will initialize the array, while// others will block until Do() has completed. After Do has run, only a// single atomic comparison will be required to get the array.funcgetArr()[]int{arrOnce.Do(func(){arr=[]int{0,1,2}})returnarr}funcmain(){// thanks to double-checked locking, two goroutines attempting to getArr()// will not cause double-initializationgogetArr()gogetArr()}

Usage in Java

As of J2SE 5.0, the volatile keyword is defined to create a memory barrier. This allows a solution that ensures that multiple threads handle the singleton instance correctly. This new idiom is described in and .

// Works with acquire/release semantics for volatile in Java 1.5 and later// Broken under Java 1.4 and earlier semantics for volatileclassFoo{privatevolatileHelperhelper;publicHelpergetHelper(){HelperlocalRef=helper;if(localRef==null){synchronized(this){localRef=helper;if(localRef==null){helper=localRef=newHelper();}}}returnlocalRef;}// other functions and members...}

Note the local variable "localRef", which seems unnecessary. The effect of this is that in cases where helper is already initialized (i.e., most of the time), the volatile field is only accessed once (due to "return localRef;" instead of "return helper;"), which can improve the method's overall performance by as much as 40 percent. [9]

Java 9 introduced the VarHandle class, which allows use of relaxed atomics to access fields, giving somewhat faster reads on machines with weak memory models, at the cost of more difficult mechanics and loss of sequential consistency (field accesses no longer participate in the synchronization order, the global order of accesses to volatile fields). [10]

// Works with acquire/release semantics for VarHandles introduced in Java 9classFoo{privatevolatileHelperhelper;publicHelpergetHelper(){HelperlocalRef=getHelperAcquire();if(localRef==null){synchronized(this){localRef=getHelperAcquire();if(localRef==null){localRef=newHelper();setHelperRelease(localRef);}}}returnlocalRef;}privatestaticfinalVarHandleHELPER;privateHelpergetHelperAcquire(){return(Helper)HELPER.getAcquire(this);}privatevoidsetHelperRelease(Helpervalue){HELPER.setRelease(this,value);}static{try{MethodHandles.Lookuplookup=MethodHandles.lookup();HELPER=lookup.findVarHandle(Foo.class,"helper",Helper.class);}catch(ReflectiveOperationExceptione){thrownewExceptionInInitializerError(e);}}// other functions and members...}

If the helper object is static (one per class loader), an alternative is the initialization-on-demand holder idiom [11] (See Listing 16.6 [12] from the previously cited text.)

// Correct lazy initialization in JavaclassFoo{privatestaticclassHelperHolder{publicstaticfinalHelperhelper=newHelper();}publicstaticHelpergetHelper(){returnHelperHolder.helper;}}

This relies on the fact that nested classes are not loaded until they are referenced.

Semantics of final field in Java 5 can be employed to safely publish the helper object without using volatile: [13]

publicclassFinalWrapper<T>{publicfinalTvalue;publicFinalWrapper(Tvalue){this.value=value;}}publicclassFoo{privateFinalWrapper<Helper>helperWrapper;publicHelpergetHelper(){FinalWrapper<Helper>tempWrapper=helperWrapper;if(tempWrapper==null){synchronized(this){if(helperWrapper==null){helperWrapper=newFinalWrapper<Helper>(newHelper());}tempWrapper=helperWrapper;}}returntempWrapper.value;}}

The local variable tempWrapper is required for correctness: simply using helperWrapper for both null checks and the return statement could fail due to read reordering allowed under the Java Memory Model. [14] Performance of this implementation is not necessarily better than the volatile implementation.

Usage in C#

In .NET Framework 4.0, the Lazy<T> class was introduced, which internally uses double-checked locking by default (ExecutionAndPublication mode) to store either the exception that was thrown during construction, or the result of the function that was passed to Lazy<T>: [15]

publicclassMySingleton{privatestaticreadonlyLazy<MySingleton>_mySingleton=newLazy<MySingleton>(()=>newMySingleton());privateMySingleton(){}publicstaticMySingletonInstance=>_mySingleton.Value;}

See also

Related Research Articles

Thread safety is a computer programming concept applicable to multi-threaded code. Thread-safe code only manipulates shared data structures in a manner that ensures that all threads behave properly and fulfill their design specifications without unintended interaction. There are various strategies for making thread-safe data structures.

<span class="mw-page-title-main">Singleton pattern</span> Design pattern in object-oriented software development

In software engineering, the singleton pattern is a software design pattern that restricts the instantiation of a class to a singular instance. One of the well-known "Gang of Four" design patterns, which describe how to solve recurring problems in object-oriented software, the pattern is useful when exactly one object is needed to coordinate actions across a system.

In computer programming, lazy initialization is the tactic of delaying the creation of an object, the calculation of a value, or some other expensive process until the first time it is needed. It is a kind of lazy evaluation that refers specifically to the instantiation of objects or other resources.

In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

This article compares two programming languages: C# with Java. While the focus of this article is mainly the languages and their features, such a comparison will necessarily also consider some features of platforms and libraries. For a more detailed comparison of the platforms, see Comparison of the Java and .NET platforms.

Resource acquisition is initialization (RAII) is a programming idiom used in several object-oriented, statically-typed programming languages to describe a particular language behavior. In RAII, holding a resource is a class invariant, and is tied to object lifetime. Resource allocation is done during object creation, by the constructor, while resource deallocation (release) is done during object destruction, by the destructor. In other words, resource acquisition must succeed for initialization to succeed. Thus the resource is guaranteed to be held between when initialization finishes and finalization starts, and to be held only when the object is alive. Thus if there are no object leaks, there are no resource leaks.

In computer programming, volatile means that a value is prone to change over time, outside the control of some code. Volatility has implications within function calling conventions, and also impacts how variables are stored, accessed and cached.

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object.

<span class="mw-page-title-main">Java syntax</span> Set of rules defining correctly structured program

The syntax of Java is the set of rules defining how a Java program is written and interpreted.

In computer programming, thread-local storage (TLS) is a memory management method that uses static or global memory local to a thread.

<span class="mw-page-title-main">Multiton pattern</span> Software engineering design pattern

In software engineering, the multiton pattern is a design pattern which generalizes the singleton pattern. Whereas the singleton allows only one instance of a class to be created, the multiton pattern allows for the controlled creation of multiple instances, which it manages through the use of a map.

In computer science, a readers–writer is a synchronization primitive that solves one of the readers–writers problems. An RW lock allows concurrent access for read-only operations, whereas write operations require exclusive access. This means that multiple threads can read the data in parallel but an exclusive lock is needed for writing or modifying data. When a writer is writing the data, all other writers and readers will be blocked until the writer is finished writing. A common use might be to control access to a data structure in memory that cannot be updated atomically and is invalid until the update is complete.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In software engineering, the initialization-on-demand holder idiom is a lazy-loaded singleton. In all versions of Java, the idiom enables a safe, highly concurrent lazy initialization of static fields with good performance.

The Java programming language and the Java virtual machine (JVM) is designed to support concurrent programming. All execution takes place in the context of threads. Objects and resources can be accessed by many separate threads. Each thread has its own path of execution, but can potentially access any object in the program. The programmer must ensure read and write access to objects is properly coordinated between threads. Thread synchronization ensures that objects are modified by only one thread at a time and prevents threads from accessing partially updated objects during modification by another thread. The Java language has built-in constructs to support this coordination.

In computing, the producer-consumer problem is a family of problems described by Edsger W. Dijkstra since 1965.

This article describes the syntax of the C# programming language. The features described are compatible with .NET Framework and Mono.

Wrapper libraries consist of a thin layer of code which translates a library's existing interface into a compatible interface. This is done for several reasons:

The Java programming language's Java Collections Framework version 1.5 and later defines and implements the original regular single-threaded Maps, and also new thread-safe Maps implementing the java.util.concurrent.ConcurrentMap interface among other concurrent interfaces. In Java 1.6, the java.util.NavigableMap interface was added, extending java.util.SortedMap, and the java.util.concurrent.ConcurrentNavigableMap interface was added as a subinterface combination.

References

  1. Schmidt, D et al. Pattern-Oriented Software Architecture Vol 2, 2000 pp353-363
  2. Pattern languages of program design. 3 (PDF) (Nachdr. ed.). Reading, Mass: Addison-Wesley. 1998. ISBN   978-0201310115.
  3. Gregoire, Marc (24 February 2021). Professional C++. John Wiley & Sons. ISBN   978-1-119-69545-5.
  4. 1 2 David Bacon et al. The "Double-Checked Locking is Broken" Declaration.
  5. Boehm, Hans-J (Jun 2005). "Threads cannot be implemented as a library" (PDF). ACM SIGPLAN Notices. 40 (6): 261–268. doi:10.1145/1064978.1065042. Archived from the original (PDF) on 2017-05-30. Retrieved 2014-08-12.
  6. Haggar, Peter (1 May 2002). "Double-checked locking and the Singleton pattern". IBM. Archived from the original on 2017-10-27. Retrieved 2022-05-19.
  7. "Support for C++11-14-17 Features (Modern C++)".
  8. Double-Checked Locking is Fixed In C++11
  9. Bloch, Joshua (2018). Effective Java (Third ed.). Addison-Wesley. p. 335. ISBN   978-0-13-468599-1. On my machine, the method above is about 1.4 times as fast as the obvious version without a local variable.
  10. "Chapter 17. Threads and Locks". docs.oracle.com. Retrieved 2018-07-28.
  11. Brian Goetz et al. Java Concurrency in Practice, 2006 pp348
  12. Goetz, Brian; et al. "Java Concurrency in Practice – listings on website" . Retrieved 21 October 2014.
  13. Javamemorymodel-discussion mailing list
  14. Manson, Jeremy (2008-12-14). "Date-Race-Ful Lazy Initialization for Performance – Java Concurrency (&c)" . Retrieved 3 December 2016.
  15. Albahari, Joseph (2010). "Threading in C#: Using Threads". C# 4.0 in a Nutshell. O'Reilly Media. ISBN   978-0-596-80095-6. Lazy<T> actually implements […] double-checked locking. Double-checked locking performs an additional volatile read to avoid the cost of obtaining a lock if the object is already initialized.