Timeout Detection and Recovery

Last updated

Timeout Detection and Recovery or TDR is a feature of the Windows operating system (OS) introduced in Windows Vista. It detects response problems from a graphics card (GPU), and if a timeout occurs, the OS will attempt a card reset to recover a functional and responsive desktop environment. However, if the attempt was unsuccessful, it results in the Blue Screen of Death (BSOD). The recovery tries to mitigate the scenario where an end user superfluously reboots their device should it become unresponsive. [1]

Contents

Timeline

When the GPU takes more than the allotted time to process a request, the system's GPU scheduler will pick up the anomaly. It then tries to preempt the particular task, this operation has the TDR timeout which is 2 seconds by default. [1] [2]

Once the timeout is up and the task is not completed or preempted, the kernel determines that the GPU is frozen and proceeds to inform the respective driver about the detected timeout. It is then the driver's responsibility to properly reset and reinitialize the underlying GPU. [1] [2]

The OS will then do a bunch of other recovery steps needed for the system to regain responsiveness. If the entire operation was successful, the end user might see some visual artefacts and a message will be shown on the screen describing what had happened ("Display driver stopped responding and has recovered."), else a BSOD might ensue. [1] [2]

Possible causes

There are multiple probable causes should a recovery fail, causing an inevitable BSOD: [2] [3]

BSOD stop codes

Possible BSOD stop codes emitted if the attempted recovery failed:

See also

Related Research Articles

<span class="mw-page-title-main">Kernel panic</span> Fatal error condition associated with Unix-like computer operating systems

A kernel panic is a safety measure taken by an operating system's kernel upon detecting an internal fatal error in which either it is unable to safely recover or continuing to run the system would have a higher risk of major data loss. The term is largely specific to Unix and Unix-like systems. The equivalent on Microsoft Windows operating systems is a stop error, often called a "blue screen of death".

<span class="mw-page-title-main">Crash (computing)</span> When a computer program stops functioning properly and self-terminates

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

NTLDR is the boot loader for all releases of Windows NT operating system from 1993 with the release of Windows NT 3.1 up until Windows XP and Windows Server 2003. From Windows Vista onwards it was replaced by the BOOTMGR bootloader. NTLDR is typically run from the primary storage device, but it can also run from portable storage devices such as a CD-ROM, USB flash drive, or floppy disk. NTLDR can also load a non NT-based operating system given the appropriate boot sector in a file.

Radeon is a brand of computer products, including graphics processing units, random-access memory, RAM disk software, and solid-state drives, produced by Radeon Technologies Group, a division of AMD. The brand was launched in 2000 by ATI Technologies, which was acquired by AMD in 2006 for US$5.4 billion.

CONFIG.SYS is the primary configuration file for the DOS and OS/2 operating systems. It is a special ASCII text file that contains user-accessible setup or configuration directives evaluated by the operating system's DOS BIOS during boot. CONFIG.SYS was introduced with DOS 2.0.

<span class="mw-page-title-main">Watchdog timer</span> Electronic timer used to detect and recover from computer malfunctions

A watchdog timer, sometimes called a computer operating properly timer, is an electronic or software timer that is used to detect and recover from computer malfunctions. Watchdog timers are widely used in computers to facilitate automatic correction of temporary hardware faults, and to prevent errant or malevolent software from disrupting system operation.

The Advanced Host Controller Interface (AHCI) is a technical standard defined by Intel that specifies the register-level interface of Serial ATA (SATA) host controllers in a non-implementation-specific manner in its motherboard chipsets.

<span class="mw-page-title-main">CHKDSK</span> System tool in DOS, OS/2 and Windows

In computing, CHKDSK is a system tool and command in DOS, Digital Research FlexOS, IBM/Toshiba 4690 OS, IBM OS/2, Microsoft Windows and related operating systems. It verifies the file system integrity of a volume and attempts to fix logical file system errors. It is similar to the fsck command in Unix and similar to Microsoft ScanDisk, which co-existed with CHKDSK in Windows 9x and MS-DOS 6.x.

<span class="mw-page-title-main">Free and open-source graphics device driver</span> Software that controls computer-graphics hardware

A free and open-source graphics device driver is a software stack which controls computer-graphics hardware and supports graphics-rendering application programming interfaces (APIs) and is released under a free and open-source software license. Graphics device drivers are written for specific hardware to work within a specific operating system kernel and to support a range of APIs used by applications to access the graphics hardware. They may also control output to the display if the display driver is part of the graphics hardware. Most free and open-source graphics device drivers are developed by the Mesa project. The driver is made up of a compiler, a rendering API, and software which manages access to the graphics hardware.

<span class="mw-page-title-main">Error message</span> Computer message indicating an error

An error message is the information displayed when an unforeseen problem occurs, usually on a computer or other device. Modern operating systems with graphical user interfaces, often display error messages using dialog boxes. Error messages are used when user intervention is required, to indicate that a desired operation has failed, or to relay important warnings. Error messages are seen widely throughout computing, and are part of every operating system or computer hardware device. The proper design of error messages is an important topic in usability and other fields of human–computer interaction.

A machine check exception (MCE) is a type of computer error that occurs when a problem involving the computer's hardware is detected. With most mass-market personal computers, an MCE indicates faulty or misconfigured hardware.

Resolution independence is where elements on a computer screen are rendered at sizes independent from the pixel grid, resulting in a graphical user interface that is displayed at a consistent physical size, regardless of the resolution of the screen.

Windows Display Driver Model (WDDM) is the graphic driver architecture for video card drivers running Microsoft Windows versions beginning with Windows Vista.

<span class="mw-page-title-main">Screen of death</span> Fatal error displays in operating systems

In computing, a screen of death, colloquially referred to as a blue screen of death, is an informal term for a type of a computer operating system error message displayed onscreen when the system has experienced a fatal system error. The fatal error typically results in unsaved work being lost and often indicates serious problems with the system's hardware or software. These error screens are usually the result of a kernel panic, although the terms are frequently used interchangeably. Most screens of death are displayed on an even background color with a message advising the user to restart the computer.

<span class="mw-page-title-main">AMD Radeon Software</span> Device driver and utility software package for AMD GPUs and APUs

AMD Radeon Software is a device driver and utility software package for AMD's Radeon graphics cards and APUs. Its graphical user interface is built with Qt and is compatible with 64-bit Windows and Linux distributions.

<span class="mw-page-title-main">Blue screen of death</span> Error screen displayed after a fatal system error on a computer running Microsoft Windows or ReactOS

The Blue Screen of Death (BSoD), Blue screen error, Blue Screen, fatal error, or bugcheck, and officially known as a Stop error, is a critical error screen displayed by the Microsoft Windows and ReactOS operating systems in the event of a fatal system error.

In computing, a hang or freeze occurs when either a process or system ceases to respond to inputs. A typical example is when computer's graphical user interface no longer responds to the user typing on the keyboard or moving the mouse. The term covers a wide range of behaviors in both clients and servers, and is not limited to graphical user interface issues.

<span class="mw-page-title-main">Driver Verifier</span>

Driver Verifier is a tool included in Microsoft Windows that replaces the default operating system subroutines with ones that are specifically developed to catch device driver bugs. Once enabled, it monitors and stresses drivers to detect illegal function calls or actions that may be causing system corruption. It acts within the kernel mode and can target specific device drivers for continual checking or make driver verifier functionality multithreaded, so that several device drivers can be stressed at the same time. It can simulate certain conditions such as low memory, I/O verification, pool tracking, IRQL checking, deadlock detection, DMA checks, IRP logging, etc. The verifier works by forcing drivers to work with minimal resources, making potential errors that might happen only rarely in a working system manifest immediately. Typically fatal system errors are generated by the stressed drivers in the test environment, producing core dumps that can be analysed and debugged immediately; without stressing, intermittent faults would occur in the field, without proper troubleshooting facilities or personnel.

<span class="mw-page-title-main">GPU switching</span> Mechanism for computers with multiple graphic controllers

GPU switching is a mechanism used on computers with multiple graphic controllers. This mechanism allows the user to either maximize the graphic performance or prolong battery life by switching between the graphic cards. It is mostly used on gaming laptops which usually have an integrated graphic device and a discrete video card.

In computing, rebooting is the process by which a running computer system is restarted, either intentionally or unintentionally. Reboots can be either a cold reboot in which the power to the system is physically turned off and back on again ; or a warm reboot in which the system restarts while still powered up. The term restart is used to refer to a reboot when the operating system closes all programs and finalizes all pending input and output operations before initiating a soft reboot.

References

  1. 1 2 3 4 Microsoft. "Timeout detection and recovery (TDR) - Windows drivers" . Retrieved 2022-03-23.
  2. 1 2 3 4 5 Microsoft. "Bug Check 0x116 VIDEO_TDR_FAILURE - Windows drivers | Microsoft Learn" . Retrieved 2022-03-23.
  3. AMD. "How to Troubleshoot Timeout Detection and Recovery Errors | AMD" . Retrieved 2023-03-23.
  4. "Blue Screen of Death Windows 11 and 10 Error Codes List [BSOD]" . Retrieved 2022-03-23.

Further reading