Busy waiting

Last updated

In computer science and software engineering, busy-waiting, busy-looping or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available. Spinning can also be used to generate an arbitrary time delay, a technique that was necessary on systems that lacked a method of waiting a specific length of time. Processor speeds vary greatly from computer to computer, especially as some processors are designed to dynamically adjust speed based on current workload. [1] Consequently, spinning as a time-delay technique can produce unpredictable or even inconsistent results on different systems unless code is included to determine the time a processor takes to execute a "do nothing" loop, or the looping code explicitly checks a real-time clock.

Contents

In most cases spinning is considered an anti-pattern and should be avoided, [2] as processor time that could be used to execute a different task is instead wasted on useless activity. Spinning can be a valid strategy in certain circumstances, most notably in the implementation of spinlocks within operating systems designed to run on SMP systems.

Example C code

The following C code examples illustrate two threads that share a global integer i. The first thread uses busy-waiting to check for a change in the value of i:

#include<pthread.h>#include<stdatomic.h>#include<stdio.h>#include<stdlib.h>#include<unistd.h>/* i is global, so it is visible to all functions. It makes use of the special * type atomic_int, which allows atomic memory accesses. */atomic_inti=0;/* f1 uses a spinlock to wait for i to change from 0. */staticvoid*f1(void*p){intlocal_i;/* Atomically load current value of i into local_i and check if that value       is zero */while((local_i=atomic_load(&i))==0){/* do nothing - just keep checking over and over */}printf("i's value has changed to %d.\n",local_i);returnNULL;}staticvoid*f2(void*p){intlocal_i=99;sleep(10);/* sleep for 10 seconds */atomic_store(&i,local_i);printf("t2 has changed the value of i to %d.\n",local_i);returnNULL;}intmain(){intrc;pthread_tt1,t2;rc=pthread_create(&t1,NULL,f1,NULL);if(rc!=0){fprintf(stderr,"pthread f1 failed\n");returnEXIT_FAILURE;}rc=pthread_create(&t2,NULL,f2,NULL);if(rc!=0){fprintf(stderr,"pthread f2 failed\n");returnEXIT_FAILURE;}pthread_join(t1,NULL);pthread_join(t2,NULL);puts("All pthreads finished.");return0;}

In a use case like this, one can consider using C11's condition variables.

Alternatives

Most operating systems and threading libraries provide a variety of system calls that will block the process on an event, such as lock acquisition, timer changes, I/O availability or signals. Using such calls generally produces the simplest, most efficient, fair, and race-free result. A single call checks, informs the scheduler of the event it is waiting for, inserts a memory barrier where applicable, and may perform a requested I/O operation before returning. Other processes can use the CPU while the caller is blocked. The scheduler is given the information needed to implement priority inheritance or other mechanisms to avoid starvation.

Busy-waiting itself can be made much less wasteful by using a delay function (e.g., sleep() ) found in most operating systems. This puts a thread to sleep for a specified time, during which the thread will waste no CPU time. If the loop is checking something simple then it will spend most of its time asleep and will waste very little CPU time.

In programs that never end (such as operating systems), infinite busy waiting can be implemented by using unconditional jumps as shown by this NASM syntax: jmp $. The CPU will unconditionally jump to its own position forever. A busy wait like this can be replaced with:

sleep:hltjmpsleep

For more information, see HLT (x86 instruction).

Appropriate usage

In low-level programming, busy-waits may actually be desirable. It may not be desirable or practical to implement interrupt-driven processing for every hardware device, particularly those that are seldom accessed. Sometimes it is necessary to write some sort of control data to hardware and then fetch device status resulting from the write operation, status that may not become valid until a number of machine cycles have elapsed following the write. The programmer could call an operating system delay function, but doing so may consume more time than would be expended in spinning for a few clock cycles waiting for the device to return its status.

See also

Related Research Articles

<span class="mw-page-title-main">Mutual exclusion</span> In computing, restricting data to be accessible by one thread at a time

In computer science, mutual exclusion is a property of concurrency control, which is instituted for the purpose of preventing race conditions. It is the requirement that one thread of execution never enters a critical section while a concurrent thread of execution is already accessing said critical section, which refers to an interval of time during which a thread of execution accesses a shared resource or shared memory.

Thread safety is a computer programming concept applicable to multi-threaded code. Thread-safe code only manipulates shared data structures in a manner that ensures that all threads behave properly and fulfill their design specifications without unintended interaction. There are various strategies for making thread-safe data structures.

Processor affinity, or CPU pinning or "cache affinity", enables the binding and unbinding of a process or a thread to a central processing unit (CPU) or a range of CPUs, so that the process or thread will execute only on the designated CPU or CPUs rather than any CPU. This can be viewed as a modification of the native central queue scheduling algorithm in a symmetric multiprocessing operating system. Each item in the queue has a tag indicating its kin processor. At the time of resource allocation, each task is allocated to its kin processor in preference to others.

In computer science, a lock or mutex is a synchronization primitive: a mechanism that enforces limits on access to a resource when there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy, and with a variety of possible methods there exists multiple unique implementations for different applications.

In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on blocks or "goes to sleep".

In computer science, read-copy-update (RCU) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structures.

<span class="mw-page-title-main">Dining philosophers problem</span> Problem used to illustrate synchronization issues and techniques for resolving them

In computer science, the dining philosophers problem is an example problem often used in concurrent algorithm design to illustrate synchronization issues and techniques for resolving them.

In computer science, the test-and-set instruction is an instruction used to write (set) 1 to a memory location and return its old value as a single atomic operation. The caller can then "test" the result to see if the state was changed by the call. If multiple processes may access the same memory location, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process's test-and-set is finished. A central processing unit (CPU) may use a test-and-set instruction offered by another electronic component, such as dual-port RAM; a CPU itself may also offer a test-and-set instruction.

In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response, or by returning the value read from the memory location.

In computing, POSIX Threads, commonly known as pthreads, is an execution model that exists independently from a programming language, as well as a parallel execution model. It allows a program to control multiple different flows of work that overlap in time. Each flow of work is referred to as a thread, and creation and control over these flows is achieved by making calls to the POSIX Threads API. POSIX Threads is an API defined by the Institute of Electrical and Electronics Engineers (IEEE) standard POSIX.1c, Threads extensions .

In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become false. Monitors also have a mechanism for signaling other threads that their condition has been met. A monitor consists of a mutex (lock) object and condition variables. A condition variable is essentially a container of threads that are waiting for a certain condition. Monitors provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task.

In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock that uses "tickets" to control which thread of execution is allowed to enter a critical section.

In computing, sigaction is a function API defined by POSIX to give the programmer access to what should be a program's behavior when receiving specific OS signals.

Process Contention Scope is one of the two basic ways of scheduling threads. Both of them being: process local scheduling and system global scheduling. These scheduling classes are known as the scheduling contention scope, and are defined only in POSIX. Process contention scope scheduling means that all of the scheduling mechanism for the thread is local to the process—the thread's library has full control over which thread will be scheduled on an LWP. This also implies the use of either the Many- to-One or Many-to-Many model.

In computer science, synchronization is the task of coordinating multiple of processes to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action.

<span class="mw-page-title-main">UltraSPARC T2</span> Microprocessor by Sun Microsystems

Sun Microsystems' UltraSPARC T2 microprocessor is a multithreading, multi-core CPU. It is a member of the SPARC family, and the successor to the UltraSPARC T1. The chip is sometimes referred to by its codename, Niagara 2. Sun started selling servers with the T2 processor in October 2007.

In computer science, a fiber is a particularly lightweight thread of execution.

<span class="mw-page-title-main">CPU time</span> Time used by a computer

CPU time is the amount of time for which a central processing unit (CPU) was used for processing instructions of a computer program or operating system, as opposed to elapsed time, which includes for example, waiting for input/output (I/O) operations or entering low-power (idle) mode. The CPU time is measured in clock ticks or seconds. Often, it is useful to measure CPU time as a percentage of the CPU's capacity, which is called the CPU usage. CPU time and CPU usage have two main uses.

In computing, the producer-consumer problem is a family of problems described by Edsger W. Dijkstra since 1965.

Logical Domains is the server virtualization and partitioning technology for SPARC V9 processors. It was first released by Sun Microsystems in April 2007. After the Oracle acquisition of Sun in January 2010, the product has been re-branded as Oracle VM Server for SPARC from version 2.0 onwards.

References

  1. "Intel Turbo Boost Technology".
  2. "Why the 'volatile' type class should not be used". Archived from the original on 2017-10-04. Retrieved 2013-06-10.