Computation offloading is the transfer of resource intensive computational tasks to a separate processor, such as a hardware accelerator, or an external platform, such as a cluster, grid, or a cloud. Offloading to a coprocessor can be used to accelerate applications including: image rendering and mathematical calculations. Offloading computing to an external platform over a network can provide computing power and overcome hardware limitations of a device, such as limited computational power, storage, and energy.
The first concepts of stored-program computers were developed in the design of the ENIAC, the first general-purpose digital computer. The ENIAC was limited in performance to single tasks which led to the development of the EDVAC which would become the first computer designed to perform instructions of various types. Developing computing technologies facilitated the increase in performance of computers, and subsequently has led to a variety of configurations and architecture.
The first instances of computation offloading were the use of simple sub-processors to handle Input/output processing through a separate system called Channel I/O. This concept improved overall system performance as the mainframe only needed to set parameters for the operations while the channel processors carried out the I/O formatting and processing. During the 1970s, coprocessors began being used to accelerate floating-point arithmetic faster than earlier 8-bit and 16-bit processors which used software. As a result, math coprocessors became common for scientific and engineering calculations. Another form of coprocessor was the graphics coprocessor. As image processing became more popular, specialized graphics chips began being used to offload the creation of images from the CPU. Coprocessors were common in most computers, however, declined in usage due to the development in microprocessor technologies which integrated many coprocessor functions. Dedicated graphics processing units, however, are still widely used for their effectiveness in many tasks including; image processing, machine learning, parallel computing, computer vision, and physical simulation.
The concept of time-sharing, the sharing of computing resources, was first implemented by John McCarthy. At the time, mainframe computing was not practical due to the costs associated with purchasing and maintaining mainframe computers. Time-sharing was a viable solution for this problem as computing time could be available to smaller companies. In the 1990s telecommunications companies started offering virtual private network (VPN) services. This allowed companies to balance traffic on servers which resulted in an effective use of bandwidth. The cloud symbol became synonymous with the interaction between providers and users. This computing extended past network servers and allowed computing power to be available to users through time-sharing. The availability of virtual computers allowed users to offload tasks from a local processor. [1]
In 1997 distributed.net sought to obtain volunteer computing to solve computational intensive tasks by using the performance of networked PCs. This concept known as grid computing was associated with cloud computing systems.
The first concept of linking large mainframes to provide an effective form of parallelism was developed in the 1960s by IBM. Cluster computing was used by IBM was to increase hardware, operating system, and software performance while allowing users to run existing applications. This concept gained momentum during the 1980s as high-performance microprocessors and high-speed networks, tools for high performance distributed computing, emerged. Clusters could efficiently split and offload computation to individual nodes for increased performance while also gaining scalability. [2]
Computational tasks are handled by a central processor which executes instructions by carrying out rudimentary arithmetic, control logic, and input/output operations. The efficiency of computational tasks is dependent on the instructions per seconds that a CPU can perform which vary with different types of processors. [3] Certain application processes can be accelerated by offloading tasks from the main processor to a coprocessor while other processes might require an external processing platform.
Hardware offers greater possible performance for certain tasks when compared to software. The use of specialized hardware can perform functions faster than software processed on a CPU. Hardware has the benefit of customization which allows for dedicated technologies to be used for distinct functions. For example, a graphics processing unit (GPU), which consists of numerous low-performance cores, is more efficient at graphical computation than a CPU which features fewer, high power, cores. [4] Hardware accelerators, however, are less versatile when compared to a CPU.
Cloud Computing refers to both the applications transported over the Internet and the hardware and software in the data centers that provide services; which include data storage and computing. [5] This form of computing is reliant on high-speed internet access and infrastructure investment. [6] Through network access, a computer can migrate part of its computing to the cloud. This process involves sending data to a network of data centers that have access to computing power needed for computation.
Cluster computing is a type of parallel processing system which combines interconnected stand-alone computers to work as a single computing resource. [7] Clusters employ a parallel programming model which requires fast interconnection technologies to support high-bandwidth and low latency for communication between nodes. [2] In a shared memory model, parallel processes have access to all memory as a global address space. Multiple processors have the ability to operate independently but they share the same memory, therefore, changes in memory by one processor are reflected on all other processors. [7]
Grid computing is a group of networked computers that work together as a virtual supercomputer to perform intensive computational tasks, such as analyzing huge sets of data. Through the cloud, it is possible to create and use computer grids for purposes and specific periods. Splitting computational tasks over multiple machines significantly reduces processing time to increase efficiency and minimize wasted resources. Unlike with parallel computing, grid computing tasks usually have no time dependency associated with them, but instead, they use computers that are part of the grid only when idle, and users can perform tasks unrelated to the grid at any time. [8]
There are several advantages in offloading computation to an external processor:
There are several limitations in offloading computation to an external processor:
Cloud services can be described by 3 main cloud service models: SaaS, PaaS, and IaaS. Software as a service (SaaS) is an externally hosted service which can be easily accessed by a consumer through a web browser. Platform as a service (PaaS) is a development environment where software can be built, tested, and deployed without the user having to focus on building and maintaining computational infrastructure. Infrastructure as a service (IaaS) is access to an infrastructure's resources, network technology, and security compliance which can be used by enterprises to build software. Cloud computing services provide users with access to large amounts of computational power and storage not viable on local computers without having significant expenditure. [6]
Mobile devices, such as smartphones and wearable devices, are limited in terms of computational power, storage, and energy. Despite the constant development in key components including; CPU, GPU, memory, and wireless access technologies; mobile devices need to be portable and energy efficient. Mobile cloud computing is the combination of cloud computing and mobile computing, in which mobile devices perform computation offloading to balance the power of the cloud for accelerating application execution and saving energy consumption. In this computation offloading, a mobile device migrates part of its computing to the cloud. This process involves application partitioning, offloading decision, and distributed task execution. [11] [12]
Video games are electronic games that involve input, an interaction with a user interface, and generate output, usually visual feedback on a video display device. These input/output operations rely on a computer and its components, including the CPU, GPU, RAM, and storage. Game files are stored in a form of secondary memory which is then loaded into the main memory when executed. The CPU is responsible for processing input from the user and passing information to the GPU. The GPU does not have access to a computer's main memory so instead, graphical assets have to be loaded into VRAM which is the GPU's memory. The CPU is responsible for instructing the GPU while the GPU uses the information to render an image on to an output device. CPU's are able to run games without a GPU through software rendering, however, offloading rendering to a GPU which has specialized hardware results in improved performance. [13]
In computer networking, a thin client, sometimes called slim client or lean client, is a simple (low-performance) computer that has been optimized for establishing a remote connection with a server-based computing environment. They are sometimes known as network computers, or in their simplest form as zero clients. The server does most of the work, which can include launching software programs, performing calculations, and storing data. This contrasts with a rich client or a conventional personal computer; the former is also intended for working in a client–server model but has significant local processing power, while the latter aims to perform its function mostly locally.
A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, supercomputers have existed which can perform over 1017 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (1011) to tens of teraFLOPS (1013). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.
A workstation is a special computer designed for technical or scientific applications. Intended primarily to be used by a single user, they are commonly connected to a local area network and run multi-user operating systems. The term workstation has been used loosely to refer to everything from a mainframe computer terminal to a PC connected to a network, but the most common form refers to the class of hardware offered by several current and defunct companies such as Sun Microsystems, Silicon Graphics, Apollo Computer, DEC, HP, NeXT, and IBM which powered the 3D computer graphics revolution of the late 1990s.
A system on a chip or system-on-chip is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include on-chip central processing unit (CPU), memory interfaces, input/output devices and interfaces, and secondary storage interfaces, often alongside other components such as radio modems and a graphics processing unit (GPU) – all on a single substrate or microchip. SoCs may contain digital and also analog, mixed-signal and often radio frequency signal processing functions.
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.
A coprocessor is a computer processor used to supplement the functions of the primary processor. Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.
A graphics processing unit (GPU) is a specialized electronic circuit initially designed to accelerate computer graphics and image processing. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.
General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.
Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.
Processor may refer to:
In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.
The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that is based on comprehensive advanced computing resources and supports services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high performance computing, scientific visualization, data analysis & storage systems, software, research & development and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.
Computer hardware includes the physical parts of a computer, such as the central processing unit (CPU), random access memory (RAM), motherboard, computer data storage, graphics card, sound card, and computer case. It includes external devices such as a monitor, mouse, keyboard, and speakers.
This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.
Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories.
Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
An AI accelerator, deep learning processor, or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.