In software engineering and hardware engineering, serviceability (also known as supportability) is one of the -ilities or aspects (from IBM's RAS(U) (Reliability, Availability, Serviceability, and Usability)). It refers to the ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults to root cause analysis, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product into service. Incorporating serviceability facilitating features typically results in more efficient product maintenance and reduces operational costs and maintains business continuity.
Examples of features that facilitate serviceability include:
Serviceability engineering may also incorporate some routine system maintenance related features (see: Operations, Administration and Maintenance (OA&M.))
A service tool is defined as a facility or feature, closely tied to a product, that provides capabilities and data so as to service (analyze, monitor, debug, repair, etc.) that product. Service tools can provide broad ranges of capabilities. Regarding diagnosis, a proposed taxonomy of service tools is as follows:
As a rough rule of thumb for these taxonomies, there are multiple ‘orders of magnitude’ of diagnostic data in level 1 vs. level 2 vs. level 3 service tools.
Additional characteristics and capabilities that have been observed in service tools:
Excellent example of Serviceability Feature Requirements:
Multiple Virtual Storage, more commonly called MVS, was the most commonly used operating system on the System/370 and System/390 IBM mainframe computers. IBM developed MVS, along with OS/VS1 and SVS, as a successor to OS/360. It is unrelated to IBM's other mainframe operating system lines, e.g., VSE, VM, TPF.
An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts. Because an embedded system typically controls physical operations of the machine that it is embedded within, it often has real-time computing constraints. Embedded systems control many devices in common use today. In 2009, it was estimated that ninety-eight percent of all microprocessors manufactured were used in embedded systems.
In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.
A debugger or debugging tool is a computer program used to test and debug other programs. The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its execution and monitor changes in computer resources that may indicate malfunctioning code. Typical debugging facilities include the ability to run or halt the target program at specific points, display the contents of memory, CPU registers or storage devices, and modify memory or register contents in order to enter selected test data that might be a cause of faulty program execution.
JTAG is an industry standard for verifying designs and testing printed circuit boards after manufacture.
Technical support is a call centre type customer service provided by companies to advise and assist registered users with issues concerning their technical products. Traditionally done on the phone, technical support can now be conducted online or through chat. At present, most large and mid-size companies have outsourced their tech support operations. Many companies provide discussion boards for users of their products to interact; such forums allow companies to reduce their support costs without losing the benefit of customer feedback.
In software development, a breakpoint is an intentional stopping or pausing place in a program, put in place for debugging purposes. It is also sometimes simply referred to as a pause.
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
Design for testing or design for testability (DFT) consists of IC design techniques that add testability features to a hardware product design. The added features make it easier to develop and apply manufacturing tests to the designed hardware. The purpose of manufacturing tests is to validate that the product hardware contains no manufacturing defects that could adversely affect the product's correct functioning.
A bus analyzer is a type of a protocol analysis tool, used for capturing and analyzing communication data across a specific interface bus, usually embedded in a hardware system. The bus analyzer functionality helps design, test and validation engineers to check, test, debug and validate their designs throughout the design cycles of a hardware-based product. It also helps in later phases of a product life cycle, in examining communication interoperability between systems and between components, and clarifying hardware support concerns.
In integrated circuit design, hardware emulation is the process of imitating the behavior of one or more pieces of hardware with another piece of hardware, typically a special purpose emulation system. The emulation model is usually based on a hardware description language source code, which is compiled into the format used by emulation system. The goal is normally debugging and functional verification of the system being designed. Often an emulator is fast enough to be plugged into a working target system in place of a yet-to-be-built chip, so the whole system can be debugged with live data. This is a specific case of in-circuit emulation.
Windows Error Reporting (WER) is a crash reporting technology introduced by Microsoft with Windows XP and included in later Windows versions and Windows Mobile 5.0 and 6.0. Not to be confused with the Dr. Watson debugging tool which left the memory dump on the user's local machine, Windows Error Reporting collects and offers to send post-error debug information using the Internet to Microsoft when an application crashes or stops responding on a user's desktop. No data is sent without the user's consent. When a crash dump reaches the Microsoft server, it is analyzed, and information about a solution is sent back to the user if available. Solutions are served using Windows Error Reporting Responses. Windows Error Reporting runs as a Windows service. Kinshuman is the original architect of WER. WER was also included in the ACM hall of fame for its impact on the computing industry.
In computer science, fault injection is a testing technique for understanding how computing systems behave when stressed in unusual ways. This can be achieved using physical- or software-based means, or using a hybrid approach. Widely studied physical fault injections include the application of high voltages, extreme temperatures and electromagnetic pulses on electronic components, such as computer memory and central processing units. By exposing components to conditions beyond their intended operating limits, computing systems can be coerced into mis-executing instructions and corrupting critical data.
In the context of computer programming, instrumentation refers to the measure of a product's performance, in order to diagnose errors and to write trace information. Instrumentation can be of two types: source instrumentation and binary instrumentation.
Eclipse OpenJ9 is a high performance, scalable, Java virtual machine (JVM) implementation that is fully compliant with the Java Virtual Machine Specification.
ELinOS is a commercial development environment for embedded Linux. It consists of a Linux distribution for the target embedded system and development tools for a development host computer. The development host computer usually is a standard desktop computer running Linux or Windows. The Linux system and the application software for the target device are both created on the development host.
In computer programming and software development, debugging is the process of finding and resolving bugs within computer programs, software, or systems.
Process Control Daemon (PCD) is an open source, light-weight system level process manager/controller for Embedded Linux based projects.
An intelligent maintenance system (IMS) is a system that utilizes collected data from machinery in order to predict and prevent potential failures in them. The occurrence of failures in machinery can be costly and even catastrophic. In order to avoid failures, there needs to be a system which analyzes the behavior of the machine and provides alarms and instructions for preventive maintenance. Analyzing the behavior of the machines has become possible by means of advanced sensors, data collection systems, data storage/transfer capabilities and data analysis tools. These are the same set of tools developed for prognostics. The aggregation of data collection, storage, transformation, analysis and decision making for smart maintenance is called an intelligent maintenance system (IMS).