Timeout Detection and Recovery

Timeout Detection and Recovery (TDR) is a feature of the Windows operating system (OS) introduced in Windows Vista. It detects when a graphics card (GPU) stops responding, and if a timeout occurs, the OS attempts to reset the card to restore a functional and responsive desktop environment. If the attempt fails, the result is a Blue Screen of Death (BSOD). The recovery is meant to spare the end user from needlessly rebooting a device that has become unresponsive. [1]

Timeline

When the GPU takes more than the allotted time to process a request, the system's GPU scheduler picks up the anomaly and tries to preempt the offending task. The preempt operation is subject to the TDR timeout, which is 2 seconds by default. [1] [2]
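The timeout check described above can be illustrated with a minimal Python sketch. It is an illustration only, not code from the cited sources. The helper names `read_tdr_delay` and `preempt_timed_out` are hypothetical. The registry location and the `TdrDelay` value name are not mentioned in this article; they follow Microsoft's TDR registry documentation, which describes these settings as intended for testing and debugging.

```python
import sys

DEFAULT_TDR_DELAY_SECONDS = 2  # default timeout stated in the article

# Registry location documented by Microsoft for TDR settings (assumption:
# not mentioned in this article; intended for testing and debugging only).
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_delay():
    """Return the configured TdrDelay in seconds, or the default if unset."""
    if sys.platform != "win32":
        return DEFAULT_TDR_DELAY_SECONDS  # no Windows registry available
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except OSError:
        return DEFAULT_TDR_DELAY_SECONDS  # key or value absent: default applies


def preempt_timed_out(elapsed_seconds, tdr_delay=DEFAULT_TDR_DELAY_SECONDS):
    """Hypothetical helper: True if a preempt request exceeded the timeout."""
    return elapsed_seconds > tdr_delay
```

For example, `preempt_timed_out(2.5)` reports a timeout under the default setting, while `preempt_timed_out(1.0)` does not.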

If the task has neither completed nor been preempted when the timeout expires, the kernel determines that the GPU is frozen and informs the respective driver about the detected timeout. It is then the driver's responsibility to properly reset and reinitialize the underlying GPU. [1] [2]

The OS then performs the remaining recovery steps needed for the system to regain responsiveness. If the entire operation succeeds, the end user may see some visual artefacts, and an on-screen message describes what happened ("Display driver stopped responding and has recovered."). Otherwise, a BSOD may ensue. [1] [2]
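The three stages of the timeline can be summarized as a toy decision model. The Python sketch below is a simplification for illustration, not how Windows implements TDR; the names `Outcome` and `run_tdr` and both parameters are hypothetical.

```python
from enum import Enum

TDR_TIMEOUT_SECONDS = 2.0  # default timeout stated in the article


class Outcome(Enum):
    COMPLETED = "task completed or preempted in time"
    RECOVERED = "Display driver stopped responding and has recovered."
    BUGCHECK = "recovery failed (BSOD)"


def run_tdr(preempt_seconds, driver_reset_ok, timeout=TDR_TIMEOUT_SECONDS):
    """Toy model of the timeline: detect, reset, then recover or bug-check.

    preempt_seconds: how long the GPU takes to honour the preempt request.
    driver_reset_ok: whether the driver's reset and reinitialization succeeds.
    """
    # Stage 1: the preempt request finishes within the timeout, so no TDR.
    if preempt_seconds <= timeout:
        return Outcome.COMPLETED
    # Stage 2: timeout detected; the driver is asked to reset the GPU.
    if driver_reset_ok:
        # Stage 3: the OS finishes recovery and notifies the user.
        return Outcome.RECOVERED
    # The reset failed, so the system cannot regain responsiveness.
    return Outcome.BUGCHECK
```

In this model, `run_tdr(3.0, True)` yields `Outcome.RECOVERED`, whose value is the message shown to the user, while `run_tdr(3.0, False)` yields `Outcome.BUGCHECK`.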

Possible causes

A failed recovery, which ends in a BSOD, can have several possible causes: [2] [3]

BSOD stop codes

Possible BSOD stop codes emitted if the attempted recovery failed:

References

  1. Microsoft. "Timeout detection and recovery (TDR) - Windows drivers". Retrieved 2022-03-23.
  2. Microsoft. "Bug Check 0x116 VIDEO_TDR_FAILURE - Windows drivers | Microsoft Learn". Retrieved 2022-03-23.
  3. AMD. "How to Troubleshoot Timeout Detection and Recovery Errors | AMD". Retrieved 2023-03-23.
  4. "Blue Screen of Death Windows 11 and 10 Error Codes List [BSOD]". 11 February 2020. Retrieved 2022-03-23.
