Ticket lock

Last updated

In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock that uses "tickets" to control which thread of execution is allowed to enter a critical section.

Contents

Overview

Example of a ticket and "Now Serving" sign used in the Ticket Queue Management System. Take a ticket.jpg
Example of a ticket and "Now Serving" sign used in the Ticket Queue Management System.

The basic concept of a ticket lock is similar to the ticket queue management system. This is the method that many bakeries and delis use to serve customers in the order that they arrive, without making them stand in a line. Generally, there is some type of dispenser from which customers pull sequentially numbered tickets upon arrival. The dispenser usually has a sign above or near it stating something like "Please take a number". There is also typically a dynamic sign, usually digital, that displays the ticket number that is now being served. Each time the next ticket number (customer) is ready to be served, the "Now Serving" sign is incremented and the number called out. This allows all of the waiting customers to know how many people are still ahead of them in the queue or line.

Like this system, a ticket lock is a first in first out (FIFO) queue-based mechanism. It adds the benefit of fairness of lock acquisition and works as follows; there are two integer values which begin at 0. The first value is the queue ticket, the second is the dequeue ticket. The queue ticket is the thread's position in the queue, and the dequeue ticket is the ticket, or queue position, that now has the lock (Now Serving).

When a thread arrives, it atomically obtains and then increments the queue ticket. The atomicity of this operation is required to prevent two threads from simultaneously being able to obtain the same ticket number. It then compares its ticket value, before the increment, with the dequeue ticket's value. If they are the same, the thread is permitted to enter the critical section. If they are not the same, then another thread must already be in the critical section and this thread must busy-wait or yield. When a thread leaves the critical section controlled by the lock, it atomically increments the dequeue ticket. This permits the next waiting thread, the one with the next sequential ticket number, to enter the critical section. [1]

Fairness of lock acquisition

The notion of fairness in lock acquisition applies to the order in which threads acquire a lock successfully. [2] If some type of fairness is implemented, it prevents a thread from being starved out of execution for a long time due to inability to acquire a lock in favor of other threads. With no fairness guarantees, a situation can arise where a thread (or multiple threads) can take a disproportionately long time to execute as compared to others. A simple example will now be presented to show how a thread could be excessively delayed due to a lack of fairness in lock acquisition.

Assume a case where three threads, each executing on one of three processors, are executing the following pseudocode that uses a lock with no consideration for fairness.

while(1){lock{criticalsection}unlock}

Now further assume the physical arrangement of the three processors, P1, P2, and P3, results in a non-uniform memory access time to the location of the shared lock variable. The order of increasing access time to the lock variable for the three processors is P1 < P2 < P3. So P1 is always the most advantaged at acquiring the lock, followed by P2, with P3 being most disadvantaged. How this situation leads to thread starvation in the absence of a fairness guarantee is shown in the following illustration of the execution of the above pseudocode by these three processors.

Starvation of P3
TimeP1P2P3
1lock attempt (success)lock attempt (failed)lock attempt (failed)
2critical sectionspinspin
3release locklock attempt (success)lock attempt (failed)
4...critical sectionspin
5lock attempt (failed)......
6lock attempt (success)release locklock attempt (failed)
7critical sectionspinspin
............

Initially, the lock is free and all three processors attempt to acquire the lock simultaneously (Time 1). Due to P1 having the fastest access time to the lock, it acquires it first and enters the critical section. P2 and P3 now spin while P1 is in the critical section (Time 2). Upon exiting the critical section (Time 3), P1 executes an unlock, releasing the lock. Since P2 has faster access to the lock than P3, it acquires the lock next and enters the critical section (Time 4). While P2 is in the critical section, P1 once again attempts to acquire the lock but can’t (Time 5), forcing it to spin wait along with P3. Once P2 finishes the critical section and issues an unlock, both P1 and P3 simultaneously attempt to acquire it once again (Time 6). But P1, with its faster access time wins again, thus entering the critical section (Time 7). This pattern of P3 being unable to obtain the lock will continue indefinitely until either P1 or P2 stops attempting to acquire it.

This illustrates the need to ensure some level of fairness in lock acquisition in certain circumstances. Not all locks have mechanisms that ensure any level of fairness, leaving the potential for situations similar to that illustrated above. See the Comparison of locks section below for examples of locks that don't implement any fairness guarantees.

Implementation of ticket lock

In a Non-Uniform Memory Architecture (NUMA) system it is important to have a lock implementation that guarantees some level of fairness of lock acquisition. The ticket lock is an implementation of spinlock that adds this desired attribute. The following pseudocode [1] [3] shows the operations for initializing the lock, acquiring the lock, and releasing the lock. A call to ticketLock_acquire would precede the critical section of the code and ticketLock_release would follow it. Each processor will keep track of its turn via the value of each processor's my_ticket.

Yan Solihin's pseudocode example is listed in the diagram below. [1]

 1 ticketLock_init(int*next_ticket,int*now_serving) 2 { 3 *now_serving=*next_ticket=0; 4 } 5  6 ticketLock_acquire(int*next_ticket,int*now_serving) 7 { 8 my_ticket=fetch_and_inc(next_ticket); 9 while(*now_serving!=my_ticket){}10 }11 12 ticketLock_release(int*now_serving)13 {14 ++*now_serving;15 }

Following along with the pseudocode above we can see that each time a processor tries to acquire a lock with ticketLock_acquire(), fetch_and_inc is called, returning the current value of next_ticket into the thread private my_ticket and incrementing the shared next_ticket. It is important to note that the fetch and increment is done atomically, thereby not allowing any other concurrent attempts at access. Once my_ticket has been received, each thread will spin in the while loop while now_serving isn't equal to its my_ticket. Once now_serving becomes equal to a given thread's my_ticket they are allowed to return from ticketLock_acquire() and enter the critical section of code. After the critical section of the code, the thread performs ticketLock_release() which increments now_serving. This allows the thread with the next sequential my_ticket to exit from ticketLock_acquire() and enter the critical section. [3] Since the my_ticket values are acquired in the order of thread arrival at the lock, subsequent acquisition of the lock is guaranteed to also be in this same order. Thus, fairness of lock acquisition is ensured, enforcing a FIFO ordering.

The following table shows an example of ticket lock in action in a system with four processors (P1, P2, P3, P4) competing for access to the critical section. Following along with the "Action" column, the outcome based on the above pseudocode can be observed. For each row, the variable values shown are those after the indicated action(s) have completed. The key point to note from the example is that the initial attempts by all four processors to acquire the lock results in only the first to arrive actually getting the lock. All subsequent attempts, while the first still holds the lock, serves to form the queue of processors waiting their turn in the critical section. This is followed by each getting the lock in turn, allowing the next in line to acquire it as the previous holder leaves. Also note that another processor can arrive at any time during the sequence of lock acquire/releases by other processors, and simply waits its turn.

Four Processor Ticket Lock Example
RowActionnext_ticketnow_servingP1 my_ticketP2 my_ticketP3 my_ticketP4 my_ticket
1Initialized to 000----
2P1 tries to acquire lock (succeed)100---
3P3 tries to acquire lock (fail + wait)200-1-
4P2 tries to acquire lock (fail + wait)30021-
5P1 releases lock, P3 acquires lock31021-
6P3 releases lock, P2 acquires lock32021-
7P4 tries to acquire lock (fail + wait)420213
8P2 releases lock, P4 acquires lock430213
9P4 releases lock440213
10...440213

The first step, prior to use of the lock, is initialization of all lock variables (Row 1). Having next_ticket and now_serving initialized to 0 ensures that the first thread that attempts to get the lock will get ticket 0, thus acquiring the lock due to its ticket matching now_serving. So when P1 tries to acquire the lock it immediately succeeds and next_ticket is incremented to 1 (Row 2). When P3 tries to acquire the lock it gets 1 for its my_ticket, next ticket is incremented to 2, and it must wait since now_serving is still 0 (Row 3). Next, when P2 attempts to acquire the lock it gets 2 for its my_ticket, next_ticket is incremented to 3, and it must also wait due to now_serving still being 0 (Row 4). P1 now releases the lock by incrementing now_serving to 1, thus allowing P3 to acquire it due its my_ticket value of 1 (Row 5). Now P3 releases the lock, incrementing now_serving to 2, allowing P2 to acquire it (Row 6). While P2 has the lock, P4 attempts to acquire it, gets a my_ticket value of 3, increments next_ticket to 4, and must wait since now_serving is still 2 (Row 7). When P2 releases the lock, it increments now_serving to 3, allowing P4 to get it (Row 8). Finally, P4 releases the lock, incrementing now_serving to 4 (Row 9). No currently waiting threads have this ticket number, so the next thread to arrive will get 4 for its ticket and immediately acquire the lock.

Comparison of locks

The Linux kernel implementation can have lower latency than the simpler test-and-set or exchange based spinlock algorithms on modern machines. Consider the table below when comparing various types of spin based locks. The more basic locking mechanisms have lower uncontended latency than the advanced locking mechanisms. [1]

Comparing Performance of Different Locking Mechanisms [1]
Criteria test-and-set Test and Test-and-set Load-link/store-conditional Ticket ABQL
Uncontended latencyLowestLowerLowerHigherHigher
1 Release max trafficӨ(p)Ө(p)Ө(p)Ө(p)Ө(1)
Wait trafficHigh----
StorageӨ(1)Ө(1)Ө(1)Ө(1)Ө(p)
Fairness guaranteeNoNoNoYesYes

Advantages

Disadvantages

History

The ticket lock was introduced by Mellor-Crummey and Scott in 1991. [3] This algorithm was introduced into the Linux kernel in 2008 due to its advantages, [5] but was omitted in paravirtualized environments where it had disadvantages. [6] As of July 2010, work is in progress to enable the use of ticket locks in paravirtualization. [7] As of March 2015 this type of locking scheme has been reemployed by Red Hat Enterprise Linux in their system. [8]

See also

Related Research Articles

Mutual exclusion property of concurrency control, which is instituted for the purpose of preventing race conditions

In computer science, mutual exclusion is a property of concurrency control, which is instituted for the purpose of preventing race conditions. It is the requirement that one thread of execution never enters its critical section at the same time that another concurrent thread of execution enters its own critical section, which refers to an interval of time during which a thread of execution accesses a shared resource, such as shared memory.

In computer science, a semaphore is a variable or abstract data type used to control access to a common resource by multiple processes in a concurrent system such as a multitasking operating system. A semaphore is simply a variable. This variable is used to solve critical section problems and to achieve process synchronization in the multi processing environment. A trivial semaphore is a plain variable that is changed depending on programmer-defined conditions.

In computing, scheduling is the method by which work is assigned to resources that complete the work. The work may be virtual computation elements such as threads, processes or data flows, which are in turn scheduled onto hardware resources such as processors, network links or expansion cards.

In computer science, a lock or mutex is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy.

In software engineering, a spinlock is a lock which causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking if the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on blocks, or "goes to sleep".

In computer science, read-copy-update (RCU) is a synchronization mechanism based on mutual exclusion. It is used when performance of reads is crucial and is an example of space–time tradeoff, enabling fast operations at the cost of more space.

Peterson's algorithm is a concurrent programming algorithm for mutual exclusion that allows two or more processes to share a single-use resource without conflict, using only shared memory for communication. It was formulated by Gary L. Peterson in 1981. While Peterson's original formulation worked with only two processes, the algorithm can be generalized for more than two.

In computer science, the test-and-set instruction is an instruction used to write 1 (set) to a memory location and return its old value as a single atomic operation. If multiple processes may access the same memory location, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process's test-and-set is finished. A CPU may use a test-and-set instruction offered by another electronic component, such as dual-port RAM; a CPU itself may also offer a test-and-set instruction.

In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress.

In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response, or by returning the value read from the memory location.

In computer science and software engineering, busy-waiting, busy-looping or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available. Spinning can also be used to generate an arbitrary time delay, a technique that was necessary on systems that lacked a method of waiting a specific length of time. Processor speeds vary greatly from computer to computer, especially as some processors are designed to dynamically adjust speed based on external factors, such as the load on the operating system. Consequently spinning as a time-delay technique can produce unpredictable or even inconsistent results on different systems unless code is included to determine the time a processor takes to execute a "do nothing" loop, or the looping code explicitly checks a real-time clock.

Deficit Round Robin (DRR), also Deficit Weighted Round Robin (DWRR), is a scheduling algorithm for the network scheduler. DRR is, like weighted fair queuing (WFQ), a packet-based implementation of the ideal Generalized Processor Sharing (GPS) policy. It was proposed by M. Shreedhar and G. Varghese in 1995 as an efficient and fair algorithm.

Linearizability

In concurrent programming, an operation is linearizable if it consists of an ordered list of invocation and response events (callbacks), that may be extended by adding response events such that:

  1. The extended list can be re-expressed as a sequential history, and
  2. That sequential history is a subset of the original unextended list.

In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become false. Monitors also have a mechanism for signaling other threads that their condition has been met. A monitor consists of a mutex (lock) object and condition variables. A condition variable essentially is a container of threads that are waiting for a certain condition. Monitors provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task.

In computer science, the fetch-and-add CPU instruction (FAA) atomically increments the contents of a memory location by a specified value.

Lamport's bakery algorithm is a computer algorithm devised by computer scientist Leslie Lamport, which is intended to improve the safety in the usage of shared resources among multiple threads by means of mutual exclusion.

In computer science, a readers–writer is a synchronization primitive that solves one of the readers–writers problems. An RW lock allows concurrent access for read-only operations, while write operations require exclusive access. This means that multiple threads can read the data in parallel but an exclusive lock is needed for writing or modifying data. When a writer is writing the data, all other writers or readers will be blocked until the writer is finished writing. A common use might be to control access to a data structure in memory that cannot be updated atomically and is invalid until the update is complete.

In computer science, the readers-writers problems are examples of a common computing problem in concurrency. There are at least three variations of the problems, which deal with situations in which many threads try to access the same shared resource at one time. Some threads may read and some may write, with the constraint that no thread may access the shared resource for either reading or writing while another thread is in the act of writing to it.. A readers-writer lock is a data structure that solves one or more of the readers-writers problems.

In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity. Process synchronization primitives are commonly used to implement data synchronization.

Array-Based Queuing Lock (ABQL) is an advanced lock algorithm that ensures that threads spin on unique memory locations thus ensuring fairness of lock acquisition coupled with improved scalability.

References

  1. 1 2 3 4 5 6 7 8 Solihin, Yan (2009). Fundamentals of parallel computer architecture : multichip and multicore systems. Solihin Pub. pp. 262–269. ISBN   978-0-9841630-0-7.
  2. Sottile, Matthew J.; Mattson, Timothy G.; Rasmussen, Craig E (Sep 28, 2009). Introduction to Concurrency in Programming Languages. Boca Raton, FL, USA: CRC Press. p. 56. ISBN   978-1-4200-7214-3.
  3. 1 2 3 4 John M. Mellor-Crummey and Michael L. Scott; et al. (February 1991). "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors". ACM TOCS.
  4. Boyd-Wickizer, Silas, et al. "Non-scalable locks are dangerous." Proceedings of the Linux Symposium. 2012. http://pdos.csail.mit.edu/papers/linux:lock.pdf
  5. Jonathan Corbet, Ticket spinlocks, Linux Weekly News, February 6, 2008. Retrieved 23. July, 2010.
  6. Jeremy Fitzhardinge, paravirt: introduce a "lock-byte" spinlock implementation, Linux kernel, July 7, 2008
  7. Revert "Merge branch 'xen/pvticketlock' into xen/next"
  8. "Ticket Spinlocks".
  9. Michael L. Scott and William N. Scherer III. Scalable Queue-Based Spin Locks with Timeout. Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, pp. 44-52, 2001.