Synchronization (computer science)

Last updated

In computer science, synchronization is the task of coordinating multiple of processes to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action.

Contents

Motivation

The need for synchronization does not arise merely in multi-processor systems but for any kind of concurrent processes; even in single processor systems. Mentioned below are some of the main needs for synchronization:

Forks and Joins: When a job arrives at a fork point, it is split into N sub-jobs which are then serviced by n tasks. After being serviced, each sub-job waits until all other sub-jobs are done processing. Then, they are joined again and leave the system. Thus, parallel programming requires synchronization as all the parallel processes wait for several other processes to occur.

Producer-Consumer: In a producer-consumer relationship, the consumer process is dependent on the producer process until the necessary data has been produced.

Exclusive use resources: When multiple processes are dependent on a resource and they need to access it at the same time, the operating system needs to ensure that only one processor accesses it at a given point in time. This reduces concurrency.

Requirements

Figure 1: Three processes accessing a shared resource (critical section) simultaneously. Multiple Processes Accessing the shared resource.png
Figure 1: Three processes accessing a shared resource (critical section) simultaneously.

Thread synchronization is defined as a mechanism which ensures that two or more concurrent processes or threads do not simultaneously execute some particular program segment known as critical section. Processes' access to critical section is controlled by using synchronization techniques. When one thread starts executing the critical section (serialized segment of the program) the other thread should wait until the first thread finishes. If proper synchronization techniques [1] are not applied, it may cause a race condition where the values of variables may be unpredictable and vary depending on the timings of context switches of the processes or threads.

For example, suppose that there are three processes, namely 1, 2, and 3. All three of them are concurrently executing, and they need to share a common resource (critical section) as shown in Figure 1. Synchronization should be used here to avoid any conflicts for accessing this shared resource. Hence, when Process 1 and 2 both try to access that resource, it should be assigned to only one process at a time. If it is assigned to Process 1, the other process (Process 2) needs to wait until Process 1 frees that resource (as shown in Figure 2).

Figure 2: A process accessing a shared resource if available, based on some synchronization technique. Shared Resource access in synchronization environment.png
Figure 2: A process accessing a shared resource if available, based on some synchronization technique.

Another synchronization requirement which needs to be considered is the order in which particular processes or threads should be executed. For example, one cannot board a plane before buying a ticket. Similarly, one cannot check e-mails before validating the appropriate credentials (for example, user name and password). In the same way, an ATM will not provide any service until it receives a correct PIN.

Other than mutual exclusion, synchronization also deals with the following:

Minimization

One of the challenges for exascale algorithm design is to minimize or reduce synchronization. Synchronization takes more time than computation, especially in distributed computing. Reducing synchronization drew attention from computer scientists for decades. Whereas it becomes an increasingly significant problem recently as the gap between the improvement of computing and latency increases. Experiments have shown that (global) communications due to synchronization on distributed computers takes a dominated share in a sparse iterative solver. [2] This problem is receiving increasing attention after the emergence of a new benchmark metric, the High Performance Conjugate Gradient(HPCG), [3] for ranking the top 500 supercomputers.

Classic problems

The following are some classic problems of synchronization:

These problems are used to test nearly every newly proposed synchronization scheme or primitive.

Hardware synchronization

Many systems provide hardware support for critical section code.

A single processor or uniprocessor system could disable interrupts by executing currently running code without preemption, which is very inefficient on multiprocessor systems. [4] "The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location. Without such a capability, the cost of building basic synchronization primitives will be too high and will increase as the processor count increases. There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically. These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers. In general, architects do not expect users to employ the basic hardware primitives, but instead expect that the primitives will be used by system programmers to build a synchronization library, a process that is often complex and tricky." [5] Many modern pieces of hardware provide such atomic instructions, two common examples being: test-and-set, which operates on a single memory word, and compare-and-swap, which swaps the contents of two memory words.

Support in programming languages

In Java, one way to prevent thread interference and memory consistency errors, is by prefixing a method signature with the synchronized keyword, in which case the lock of the declaring object is used to enforce synchronization. A second way is to wrap a block of code in a synchronized(someObject){...} section, which offers finer-grain control. This forces any thread to acquire the lock of someObject before it can execute the contained block. The lock is automatically released when the thread which acquired the lock leaves this block or enters a waiting state within the block. Any variable updates made by a thread in a synchronized block become visible to other threads when they similarly acquire the lock and execute the block. For either implementation, any object may be used to provide a lock because all Java objects have an intrinsic lock or monitor lock associated with them when instantiated. [6]

Java synchronized blocks, in addition to enabling mutual exclusion and memory consistency, enable signaling—i.e. sending events from threads which have acquired the lock and are executing the code block to those which are waiting for the lock within the block. Java synchronized sections, therefore, combine the functionality of both mutexes and events to ensure synchronization. Such a construct is known as a synchronization monitor.

The .NET Framework also uses synchronization primitives. [7] "Synchronization is designed to be cooperative, demanding that every thread follow the synchronization mechanism before accessing protected resources for consistent results. Locking, signaling, lightweight synchronization types, spinwait and interlocked operations are mechanisms related to synchronization in .NET." [8]

Many programming languages support synchronization and entire specialized languages have been written for embedded application development where strictly deterministic synchronization is paramount.

Implementation

Spinlock

Another effective way of implementing synchronization is by using spinlocks. Before accessing any shared resource or piece of code, every processor checks a flag. If the flag is reset, then the processor sets the flag and continues executing the thread. But, if the flag is set (locked), the threads would keep spinning in a loop and keep checking if the flag is set or not. But, spinlocks are effective only if the flag is reset for lower cycles otherwise it can lead to performance issues as it wastes many processor cycles waiting. [9]

Barriers

Barriers are simple to implement and provide good responsiveness. They are based on the concept of implementing wait cycles to provide synchronization. Consider three threads running simultaneously, starting from barrier 1. After time t, thread1 reaches barrier 2 but it still has to wait for thread 2 and 3 to reach barrier2 as it does not have the correct data. Once all the threads reach barrier 2 they all start again. After time t, thread 1 reaches barrier3 but it will have to wait for threads 2 and 3 and the correct data again.

Thus, in barrier synchronization of multiple threads there will always be a few threads that will end up waiting for other threads as in the above example thread 1 keeps waiting for thread 2 and 3. This results in severe degradation of the process performance. [10]

The barrier synchronization wait function for ith thread can be represented as:

(Wbarrier)i=f ((Tbarrier)i, (Rthread)i)

Where Wbarrier is the wait time for a thread, Tbarrier is the number of threads has arrived, and Rthread is the arrival rate of threads. [11]

Experiments show that 34% of the total execution time is spent in waiting for other slower threads. [10]

Semaphores

Semaphores are signalling mechanisms which can allow one or more threads/processors to access a section. A Semaphore has a flag which has a certain fixed value associated with it and each time a thread wishes to access the section, it decrements the flag. Similarly, when the thread leaves the section, the flag is incremented. If the flag is zero, the thread cannot access the section and gets blocked if it chooses to wait.

Some semaphores would allow only one thread or process in the code section. Such Semaphores are called binary semaphore and are very similar to Mutex. Here, if the value of semaphore is 1, the thread is allowed to access and if the value is 0, the access is denied. [12]

Mathematical foundations

Synchronization was originally a process-based concept whereby a lock could be obtained on an object. Its primary usage was in databases. There are two types of (file) lock; read-only and read–write. Read-only locks may be obtained by many processes or threads. Readers–writer locks are exclusive, as they may only be used by a single process/thread at a time.

Although locks were derived for file databases, data is also shared in memory between processes and threads. Sometimes more than one object (or file) is locked at a time. If they are not locked simultaneously they can overlap, causing a deadlock exception.

Java and Ada only have exclusive locks because they are thread based and rely on the compare-and-swap processor instruction.

An abstract mathematical foundation for synchronization primitives is given by the history monoid. There are also many higher-level theoretical devices, such as process calculi and Petri nets, which can be built on top of the history monoid.

Examples

Following are some synchronization examples with respect to different platforms. [13]

In Windows

Windows provides:

In Linux

Linux provides:

Enabling and disabling of kernel preemption replaced spinlocks on uniprocessor systems. Prior to kernel version 2.6, Linux disabled interrupt to implement short critical sections. Since version 2.6 and later, Linux is fully preemptive.

In Solaris

Solaris provides:

In Pthreads

Pthreads is a platform-independent API that provides:

See also

Related Research Articles

<span class="mw-page-title-main">Mutual exclusion</span> In computing, restricting data to be accessible by one thread at a time

In computer science, mutual exclusion is a property of concurrency control, which is instituted for the purpose of preventing race conditions. It is the requirement that one thread of execution never enters a critical section while a concurrent thread of execution is already accessing said critical section, which refers to an interval of time during which a thread of execution accesses a shared resource or shared memory.

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

<span class="mw-page-title-main">Semaphore (programming)</span> Variable used in a concurrent system

In computer science, a semaphore is a variable or abstract data type used to control access to a common resource by multiple threads and avoid critical section problems in a concurrent system such as a multitasking operating system. Semaphores are a type of synchronization primitive. A trivial semaphore is a plain variable that is changed depending on programmer-defined conditions.

In computer science, a lock or mutex is a synchronization primitive that prevents state from being modified or accessed by multiple threads of execution at once. Locks enforce mutual exclusion concurrency control policies, and with a variety of possible methods there exist multiple unique implementations for different applications.

In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on blocks or "goes to sleep".

In computer science, read-copy-update (RCU) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structures.

Peterson's algorithm is a concurrent programming algorithm for mutual exclusion that allows two or more processes to share a single-use resource without conflict, using only shared memory for communication. It was formulated by Gary L. Peterson in 1981. While Peterson's original formulation worked with only two processes, the algorithm can be generalized for more than two.

In computer science, the test-and-set instruction is an instruction used to write (set) 1 to a memory location and return its old value as a single atomic operation. The caller can then "test" the result to see if the state was changed by the call. If multiple processes may access the same memory location, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process's test-and-set is finished. A central processing unit (CPU) may use a test-and-set instruction offered by another electronic component, such as dual-port RAM; a CPU itself may also offer a test-and-set instruction.

In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress. "Non-blocking" was used as a synonym for "lock-free" in the literature until the introduction of obstruction-freedom in 2003.

In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response, or by returning the value read from the memory location.

In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior. Thus, the parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. One way to do so is known as a critical section or critical region. This protected section cannot be entered by more than one process or thread at a time; others are suspended until the first leaves the critical section. Typically, the critical section accesses a shared resource, such as a data structure, peripheral device, or network connection, that would not operate correctly in the context of multiple concurrent accesses.

In computer science, serializing tokens are a concept in concurrency control arising from the ongoing development of DragonFly BSD. According to Matthew Dillon, they are most akin to SPLs, except a token works across multiple CPUs while SPLs only work within a single CPU's domain.

In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.

In concurrent programming, a monitor is a synchronization construct that prevents threads from concurrently accessing a shared object's state and allows them to wait for the state to change. They provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task. A monitor consists of a mutex (lock) and at least one condition variable. A condition variable is explicitly 'signalled' when the object's state is modified, temporarily passing the mutex to another thread 'waiting' on the conditional variable.

In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock that uses "tickets" to control which thread of execution is allowed to enter a critical section.

In computer science, the fetch-and-add (FAA) CPU instruction atomically increments the contents of a memory location by a specified value.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

In computer science, a readers–writer is a synchronization primitive that solves one of the readers–writers problems. An RW lock allows concurrent access for read-only operations, whereas write operations require exclusive access. This means that multiple threads can read the data in parallel but an exclusive lock is needed for writing or modifying data. When a writer is writing the data, all other writers and readers will be blocked until the writer is finished writing. A common use might be to control access to a data structure in memory that cannot be updated atomically and is invalid until the update is complete.

The Java programming language and the Java virtual machine (JVM) is designed to support concurrent programming. All execution takes place in the context of threads. Objects and resources can be accessed by many separate threads. Each thread has its own path of execution, but can potentially access any object in the program. The programmer must ensure read and write access to objects is properly coordinated between threads. Thread synchronization ensures that objects are modified by only one thread at a time and prevents threads from accessing partially updated objects during modification by another thread. The Java language has built-in constructs to support this coordination.

In computer science, a concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads on a computer.

References

  1. Gramoli, V. (2015). More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms (PDF). Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM. pp. 1–10.
  2. Shengxin, Zhu and Tongxiang Gu and Xingping Liu (2014). "Minimizing synchronizations in sparse iterative solvers for distributed supercomputers". Computers & Mathematics with Applications. 67 (1): 199–209. doi: 10.1016/j.camwa.2013.11.008 .
  3. "HPCG Benchmark".
  4. Silberschatz, Abraham; Gagne, Greg; Galvin, Peter Baer (July 11, 2008). "Chapter 6: Process Synchronization". Operating System Concepts (Eighth ed.). John Wiley & Sons. ISBN   978-0-470-12872-5.
  5. Hennessy, John L.; Patterson, David A. (September 30, 2011). "Chapter 5: Thread-Level Parallelism". Computer Architecture: A Quantitative Approach (Fifth ed.). Morgan Kaufmann. ISBN   978-0-123-83872-8.
  6. "Intrinsic Locks and Synchronization". The Java Tutorials. Oracle. Retrieved 10 November 2023.
  7. "Overview of synchronization primitives". Microsoft Learn. Microsoft. Retrieved 10 November 2023.
  8. Rouse, Margaret. "Synchronization". Techopedia. Retrieved 10 November 2023.
  9. Massa, Anthony (2003). Embedded Software Development with ECos. Pearson Education Inc. ISBN   0-13-035473-2.
  10. 1 2 Meng, Chen, Pan, Yao, Wu, Jinglei, Tianzhou, Ping, Jun. Minghui (2014). "A speculative mechanism for barrier sychronization". 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS).{{cite journal}}: CS1 maint: multiple names: authors list (link)
  11. Rahman, Mohammed Mahmudur (2012). "Process synchronization in multiprocessor and multi-core processor". 2012 International Conference on Informatics, Electronics & Vision (ICIEV). pp. 554–559. doi:10.1109/ICIEV.2012.6317471. ISBN   978-1-4673-1154-0. S2CID   8134329.
  12. Li, Yao, Qing, Carolyn (2003). Real-Time Concepts for Embedded Systems. CMP Books. ISBN   978-1578201242.{{cite book}}: CS1 maint: multiple names: authors list (link)
  13. Silberschatz, Abraham; Gagne, Greg; Galvin, Peter Baer (December 7, 2012). "Chapter 5: Process Synchronization". Operating System Concepts (Ninth ed.). John Wiley & Sons. ISBN   978-1-118-06333-0.
  14. "What is RCU, Fundamentally? [LWN.net]". lwn.net.
  15. "Adaptive Lock Probes". Oracle Docs.
  16. Mauro, Jim. "Turnstiles and priority inheritance - SunWorld - August 1999". sunsite.uakom.sk.