Stress testing

Stress testing (sometimes called torture testing) is a form of deliberately intense or thorough testing used to determine the stability of a given system or entity. It involves testing beyond normal operational capacity, often to a breaking point, in order to observe the results. Reasons can include: to determine breaking points and safe usage limits; to confirm that the intended specifications are being met; to determine modes of failure; and to test stable operation of a part or system outside standard usage.

Failure causes are defects in design, process, quality, or part application that are the underlying cause of a failure or that initiate a process leading to failure. Where failure depends on the user of the product or process, human error must also be considered.

Reliability engineers often test items under expected stress or even under accelerated stress in order to determine the operating life of the item or to determine modes of failure.[1]

Reliability engineering is a sub-discipline of systems engineering that emphasizes dependability in the lifecycle management of a product. Dependability, or reliability, describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.

The term "stress" may have a more specific meaning in certain industries, such as material sciences, and therefore stress testing may sometimes have a technical meaning – one example is in fatigue testing for materials.

Stress (mechanics): physical quantity that expresses internal forces in a continuous material

In continuum mechanics, stress is a physical quantity that expresses the internal forces that neighbouring particles of a continuous material exert on each other, while strain is the measure of the deformation of the material. For example, when a solid vertical bar is supporting an overhead weight, each particle in the bar pushes on the particles immediately below it. When a liquid is in a closed container under pressure, each particle gets pushed against by all the surrounding particles, and the container walls and the pressure-inducing surface push against them in (Newtonian) reaction. These macroscopic forces are actually the net result of a very large number of intermolecular forces and collisions between the particles of the material. Stress is frequently represented by a lowercase Greek letter sigma (σ).
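
In the simplest loading case this has a compact quantitative form; as a point of orientation only (the elementary definition, not specific to stress testing), the average normal stress over a cross-section is the internal force divided by the area that carries it:

    % average normal stress: internal force F acting over cross-sectional area A
    \sigma = \frac{F}{A}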

In materials science, fatigue is the weakening of a material caused by repeatedly applied loads. It is the progressive and localized structural damage that occurs when a material is subjected to cyclic loading. The nominal maximum stress values that cause such damage may be much less than the strength of the material, typically quoted as the ultimate tensile stress limit or the yield stress limit.

Computing

Hardware

Stress testing, in general, should put computer hardware under exaggerated levels of stress in order to ensure stability when used in a normal environment. These can include extremes of workload, type of task, memory use, thermal load (heat), clock speed, or voltages. Memory and CPU are two components that are commonly stress tested in this way.

There is considerable overlap between stress testing software and benchmarking software, since both seek to assess and measure maximum performance. Of the two, stress testing software aims to test stability by trying to force a system to fail; benchmarking aims to measure and assess the maximum performance possible at a given task or function.

When modifying the operating parameters of a CPU, such as temperature, overclocking, underclocking, overvolting, and undervolting, it may be necessary to verify if the new parameters (usually CPU core voltage and frequency) are suitable for heavy CPU loads. This is done by running a CPU-intensive program for extended periods of time, to test whether the computer hangs or crashes. CPU stress testing is also referred to as torture testing. Software that is suitable for torture testing should typically run instructions that utilise the entire chip rather than only a few of its units. Stress testing a CPU over the course of 24 hours at 100% load is, in most cases, sufficient to determine that the CPU will function correctly in normal usage scenarios such as in a desktop computer, where CPU usage typically fluctuates at low levels (50% and under).
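
As a minimal illustration (a sketch only, not any of the torture-testing tools named in this article; the duration and the arithmetic kernel are arbitrary), a script can pin every logical core at full load and leave the machine to prove itself stable:

    # cpu_burn.py -- minimal sketch of a CPU load generator (hypothetical
    # script). It only exercises simple floating-point arithmetic; real
    # torture tests such as Prime95 also stress caches, SIMD units, and
    # the memory subsystem.
    import multiprocessing as mp
    import time

    def burn(seconds: float) -> None:
        """Busy-loop on floating-point work until the deadline passes."""
        deadline = time.monotonic() + seconds
        x = 1.0001
        while time.monotonic() < deadline:
            x = (x * x) % 1e9  # arbitrary arithmetic to keep the core busy

    if __name__ == "__main__":
        duration = 60.0  # seconds; a serious stability run would use hours
        workers = [mp.Process(target=burn, args=(duration,))
                   for _ in range(mp.cpu_count())]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print("load finished; check temperatures and logs for throttling or errors")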

Central processing unit: electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions

A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, control, and input/output (I/O) operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s. Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry.

Temperature: physical property of matter that quantitatively expresses the common notions of hot and cold

Temperature is a physical quantity expressing hot and cold. It is measured with a thermometer calibrated in one or more temperature scales. The most commonly used scales are the Celsius scale, Fahrenheit scale, and Kelvin scale. The kelvin is the unit of temperature in the International System of Units (SI), in which temperature is one of the seven fundamental base quantities. The Kelvin scale is widely used in science and technology.

Overclocking: the action of increasing a component's clock rate

Overclocking in the context of computing devices refers to making them "run faster" than originally intended. More specifically, it is the configuration of computer hardware components to operate faster than certified by the original manufacturer, with "faster" specified as clock frequency in megahertz (MHz) or gigahertz (GHz). Commonly, operating voltage is also increased to maintain a component's operational stability at accelerated speeds. Semiconductor devices operated at higher frequencies and voltages consume more power and generate more heat. An overclocked device may be unreliable or fail completely if the additional heat load is not removed or if power delivery components cannot meet increased power demands. Many device warranties state that overclocking and/or over-specification voids any warranty.

Hardware stress testing and stability are subjective and may vary according to how the system will be used. A stress test for a system running 24/7, or one that will perform error-sensitive tasks such as distributed computing or "folding" projects, may differ from one that needs to be able to run a single game with reasonable reliability. For example, a comprehensive guide on overclocking Sandy Bridge found that:[2]

Distributed computing is a field of computer science that studies distributed systems. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.

Folding@home is a distributed computing project for disease research that simulates protein folding, computational drug design, and other types of molecular dynamics. The project uses the idle processing resources of thousands of personal computers owned by volunteers who have installed the software on their systems. Its main purpose is to determine the mechanisms of protein folding, which is the process by which proteins reach their final three-dimensional structure, and to examine the causes of protein misfolding. This is of significant academic interest with major implications for medical research into Alzheimer's disease, Huntington's disease, and many forms of cancer, among other diseases. To a lesser extent, Folding@home also tries to predict a protein's final structure and determine how other molecules may interact with it, which has applications in drug design. Folding@home is developed and operated by the Pande Laboratory at Stanford University, under the direction of Prof. Vijay Pande, and is shared by various scientific institutions and research laboratories across the world.

Sandy Bridge: Intel processor microarchitecture

Sandy Bridge is the codename for the microarchitecture used in the "second generation" of the Intel Core processors; it is the successor to the Nehalem microarchitecture. Intel demonstrated a Sandy Bridge processor in 2009 and released the first products based on the architecture in January 2011 under the Core brand. The microarchitecture was developed primarily by the Israeli branch of Intel and was originally codenamed "Gesher".

Even though in the past IntelBurnTest was just as good, it seems that something in the SB uArch [Sandy Bridge microarchitecture] is more heavily stressed with Prime95 ... IBT really does pull more power [make greater thermal demands]. But ... Prime95 failed first every time, and it failed when IBT would pass. So same as Sandy Bridge, Prime95 is a better stability tester for Sandy Bridge-E than IBT/LinX.

Stability is subjective; some might call stability enough to run their game, others like folders [folding projects] might need something that is just as stable as it was at stock, and ... would need to run Prime95 for at least 12 hours to a day or two to deem that stable ... There are [bench testers] who really don't care for stability like that and will just say if it can [complete] a benchmark it is stable enough. No one is wrong and no one is right. Stability is subjective. [But] 24/7 stability is not subjective.

An engineer at ASUS advised, in a 2012 article on overclocking an Intel X79 system, that it is important to choose testing software carefully in order to obtain useful results:[3]

Intel: American semiconductor company

Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, in Silicon Valley. It is the world's second-largest and second-highest-valued semiconductor chip manufacturer by revenue, after being overtaken by Samsung, and is the inventor of the x86 series of microprocessors, the processors found in most personal computers (PCs). Intel ranked No. 46 in the 2018 Fortune 500 list of the largest United States corporations by total revenue.

Unvalidated stress tests are not advised (such as Prime95 or LinX or other comparable applications). For high grade CPU/IMC and System Bus testing Aida64 is recommended along with general applications usage like PC Mark 7. Aida has an advantage as its stability test has been designed for the Sandy Bridge E architecture and test specific functions like AES, AVX and other instruction sets that prime and like synthetics do not touch. As such not only does it load the CPU 100% but will also test other parts of CPU not used under applications like Prime 95. Other applications to consider are SiSoft 2012 or Passmark BurnIn. Be advised validation has not been completed using Prime 95 version 26 and LinX (10.3.7.012) and OCCT 4.1.0 beta 1 but once we have internally tested to ensure at least limited support and operation.

Software commonly used in stress testing

Tools named in the sources above include Prime95, IntelBurnTest and LinX (both Linpack-based), OCCT, AIDA64, SiSoftware Sandra, and PassMark BurnInTest.

Software

In software testing, a system stress test refers to tests that put a greater emphasis on robustness, availability, and error handling under a heavy load, rather than on what would be considered correct behavior under normal circumstances. In particular, the goals of such tests may be to ensure the software does not crash in conditions of insufficient computational resources (such as memory or disk space), unusually high concurrency, or denial of service attacks.
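
As an illustrative sketch only (the URL, request count, and concurrency level below are hypothetical placeholders), a crude stress test of this kind counts failures under deliberately high concurrency rather than measuring speed:

    # stress_http.py -- crude concurrency stress sketch using only the
    # standard library; the target URL and the load figures are placeholders.
    import concurrent.futures
    import urllib.request

    URL = "http://localhost:8080/health"  # hypothetical endpoint
    REQUESTS = 1000
    CONCURRENCY = 100

    def hit(url: str) -> bool:
        """Return True if one request completes with a 2xx status."""
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return 200 <= resp.status < 300
        except Exception:
            # unlike a benchmark, we only care *whether* it fails under load
            return False

    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
            results = list(pool.map(hit, [URL] * REQUESTS))
        failures = results.count(False)
        print(f"{failures}/{REQUESTS} requests failed under load")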

Examples: a web server may be stress tested using scripts, bots, and various denial-of-service tools to observe the performance of the site during peak loads.

Stress testing may be contrasted with load testing: load testing generally exercises a system at or around its expected maximum design load, whereas stress testing deliberately goes beyond that load, often to the point of failure.

Related Research Articles

In software quality assurance, performance testing is, in general, a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate, or verify other quality attributes of the system, such as scalability, reliability, and resource usage.

The clock rate typically refers to the frequency at which a chip such as a central processing unit (CPU), or one core of a multi-core processor, is running, and is used as an indicator of the processor's speed. It is measured in clock cycles per second or its equivalent, the SI unit hertz (Hz). The clock rate of the first generation of computers was measured in hertz or kilohertz (kHz), but in the 21st century the speed of modern CPUs is commonly advertised in gigahertz (GHz). This metric is most useful when comparing processors within the same family, holding constant other features that may affect performance. Video card and CPU manufacturers commonly select their highest-performing units from a manufacturing batch and set their maximum clock rate higher, fetching a higher price.

Prime95: software

Prime95 is a freeware application written by George Woltman that is used by GIMPS, a distributed computing project dedicated to finding new Mersenne prime numbers. More specifically, Prime95 refers to the Windows and macOS versions of the software.

Northbridge (computing): chip on a computer motherboard

A northbridge or host bridge is one of the two chips in the core logic chipset architecture on a PC motherboard, the other being the southbridge. Unlike the southbridge, the northbridge is connected directly to the CPU via the front-side bus (FSB) and is thus responsible for tasks that require the highest performance. The northbridge is usually paired with a southbridge, also known as the I/O controller hub. In systems where they are included, these two chips manage communications between the CPU and other parts of the motherboard, and constitute the core logic chipset of the PC motherboard.

A power virus is a computer program that executes specific machine code to reach the maximum CPU power dissipation. Computer cooling apparatus are designed to dissipate power up to the thermal design power, rather than maximum power, and a power virus could cause the system to overheat if it does not have logic to stop the processor. This may cause permanent physical damage. Power viruses can be malicious, but are often suites of test software used for integration testing and thermal testing of computer components during the design phase of a product, or for product benchmarking.

Super PI

Super PI is a computer program that calculates pi to a specified number of digits after the decimal point, up to a maximum of 32 million. It uses the Gauss–Legendre algorithm and is a Windows port of the program used by Yasumasa Kanada in 1995 to compute pi to 2³² digits.
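
The algorithm itself is compact; here is a minimal sketch (unrelated to Super PI's actual implementation, and using Python's decimal module rather than Super PI's optimized arithmetic; the digit count is arbitrary):

    # gauss_legendre_pi.py -- sketch of the Gauss-Legendre (AGM) iteration.
    from decimal import Decimal, getcontext

    def pi_gauss_legendre(digits: int) -> Decimal:
        getcontext().prec = digits + 10              # extra guard digits
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        for _ in range(digits.bit_length()):         # digits roughly double per step
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        return (a + b) ** 2 / (4 * t)

    print(pi_gauss_legendre(50))  # 3.14159265358979323846...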

SuperPrime is a computer program used for testing the primality of a large set of positive natural numbers. Because of its multi-threaded nature and dynamic load scheduling, it scales excellently when using more than one thread. It is commonly used as an overclocking benchmark to test the speed and stability of a system.
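
The idea (not SuperPrime's own code) can be illustrated with a sketch: a process pool hands candidate numbers to workers as they free up, a simple form of dynamic load scheduling (the range and chunk size below are arbitrary):

    # prime_scaling.py -- sketch of parallel primality testing with dynamic
    # scheduling; this illustrates the concept, not SuperPrime's implementation.
    import multiprocessing as mp

    def is_prime(n: int) -> bool:
        """Trial division; adequate for a load-generation sketch."""
        if n < 2:
            return False
        if n % 2 == 0:
            return n == 2
        i = 3
        while i * i <= n:
            if n % i == 0:
                return False
            i += 2
        return True

    if __name__ == "__main__":
        numbers = range(10_000_000, 10_000_200)  # arbitrary candidate range
        with mp.Pool() as pool:
            # a small chunksize lets the pool rebalance work across processes
            flags = pool.map(is_prime, numbers, chunksize=1)
        print(sum(flags), "primes among", len(numbers), "candidates")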

Dynamic frequency scaling is a technique in computer architecture whereby the frequency of a microprocessor can be automatically adjusted "on the fly" depending on the actual needs, to conserve power and reduce the amount of heat generated by the chip. Dynamic frequency scaling helps preserve battery on mobile devices and decrease cooling cost and noise on quiet computing settings, or can be useful as a security measure for overheated systems. Dynamic frequency scaling is used in all ranges of computing systems, ranging from mobile systems to data centers to reduce the power at the times of low workload.
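
On Linux, the frequency currently chosen by the scaling governor can be observed through the standard cpufreq sysfs files; a small sketch (these paths exist only where the cpufreq subsystem is present, and availability varies by driver and permissions):

    # read_cpufreq.py -- sketch: observe per-core frequencies under dynamic
    # frequency scaling via Linux sysfs; not portable beyond cpufreq systems.
    import glob

    paths = sorted(glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"))
    for path in paths:
        with open(path) as f:
            khz = int(f.read().strip())      # sysfs reports kHz
        core = path.split("/")[5]            # e.g. "cpu0"
        print(f"{core}: {khz / 1000:.0f} MHz")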

Haswell (microarchitecture): Intel processor microarchitecture

Haswell is the codename for a processor microarchitecture developed by Intel as the "fourth-generation core" successor to the Ivy Bridge microarchitecture. Intel officially announced CPUs based on this microarchitecture on June 4, 2013, at Computex Taipei 2013, while a working Haswell chip was demonstrated at the 2011 Intel Developer Forum. With Haswell, which uses a 22 nm process, Intel also introduced low-power processors designed for convertible or "hybrid" ultrabooks, designated by the "Y" suffix.

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD which was proposed by Intel in March 2008 and made available in the Intel Westmere processors announced in early 2010.

LGA 2011

LGA 2011, also called Socket R, is a CPU socket by Intel. Released on November 14, 2011, it replaces Intel's LGA 1366 and LGA 1567 in the performance and high-end desktop and server platforms. The socket has 2011 protruding pins that touch contact points on the underside of the processor.

LGA 1155: Intel microprocessor compatible socket

LGA 1155, also called Socket H2, is a socket used for Intel microprocessors based on Sandy Bridge and Ivy Bridge microarchitectures.

The Intel X79 is a Platform Controller Hub (PCH) designed and manufactured by Intel for their LGA 2011 and LGA 2011-1 sockets.

Skylake (microarchitecture): Intel processor microarchitecture

Skylake is the codename used by Intel for a processor microarchitecture that was launched in August 2015, succeeding the Broadwell microarchitecture. Skylake is a microarchitecture redesign using the same 14 nm manufacturing process technology as its predecessor, serving as a "tock" in Intel's "tick–tock" manufacturing and design model. According to Intel, the redesign brings greater CPU and GPU performance and reduced power consumption. Skylake shares its microarchitecture with the Kaby Lake, Coffee Lake, and Cannon Lake CPUs.

Intel Quick Sync Video is Intel's brand for its dedicated video encoding and decoding hardware core. Quick Sync was introduced with the Sandy Bridge CPU microarchitecture on 9 January 2011, and has been found on the die of Intel products ever since.

Ivy Bridge (microarchitecture): Intel processor family

Ivy Bridge is the codename for the "third generation" of the Intel Core processors. Ivy Bridge is a die shrink of the 32-nanometer Sandy Bridge to a 22-nanometer manufacturing process (see tick–tock model). The name is also applied more broadly to the 22 nm die shrink of the Sandy Bridge microarchitecture based on FinFET ("3D") tri-gate transistors, which is also used in the Xeon and Core i7 Ivy Bridge-EX (Ivytown), Ivy Bridge-EP, and Ivy Bridge-E microprocessors released in 2013.

WPrime is a computer program that calculates a set number of square roots using Newton's method for estimating functions, verifying the results by squaring them and then comparing them with the original numbers.
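
A sketch of the kind of computation described (Newton's iteration for square roots plus the squaring check; the count and tolerances below are illustrative and far smaller than WPrime's actual workloads):

    # newton_sqrt.py -- sketch of a WPrime-style workload: Newton's-method
    # square roots, each verified by squaring the result.
    def newton_sqrt(n: float, tol: float = 1e-12) -> float:
        """Approximate sqrt(n) via the iteration x <- (x + n/x) / 2."""
        x = n if n > 1 else 1.0              # crude initial guess
        while abs(x * x - n) > tol * n:
            x = (x + n / x) / 2
        return x

    if __name__ == "__main__":
        mismatches = 0
        for i in range(1, 1_000_001):        # WPrime's presets use far more
            r = newton_sqrt(float(i))
            if abs(r * r - i) > 1e-6 * i:    # verification: square and compare
                mismatches += 1
        print("mismatches:", mismatches)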

References

  1. Nelson, Wayne B. (2004). Accelerated Testing: Statistical Models, Test Plans, and Data Analysis. New York: John Wiley & Sons. ISBN 0-471-69736-2.
  2. Sin0822 (2011-12-24). "Sandy Bridge E Overclocking Guide: Walk through, Explanations, and Support for all X79". overclock.net. Retrieved 2 February 2013. (Some text condensed.)
  3. Juan Jose Guerrero III, ASUS (2012-03-29). "Intel X79 Motherboard Overclocking Guide". benchmarkreviews.com. Retrieved 2 February 2013.