This article may be too technical for most readers to understand.(May 2009) |
Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.
The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's paper proposed the concept of a computer made of a standard processor and an array of "reconfigurable" hardware. [1] [2] The main processor would control the behavior of the reconfigurable hardware. The latter would then be tailored to perform a specific task, such as image processing or pattern matching, as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to do some other task. This resulted in a hybrid computer structure combining the flexibility of software with the speed of hardware.
In the 1980s and 1990s there was a renaissance in this area of research with many proposed reconfigurable architectures developed in industry and academia, [3] such as: Copacobana, Matrix, GARP, [4] Elixent, NGEN, [5] Polyp, [6] MereGen, [7] PACT XPP, Silicon Hive, Montium, Pleiades, Morphosys, and PiCoGA. [8] Such designs were feasible due to the constant progress of silicon technology that let complex designs be implemented on one chip. Some of these massively parallel reconfigurable computers were built primarily for special subdomains such as molecular evolution, neural or image processing. The world's first commercial reconfigurable computer, the Algotronix CHS2X4, was completed in 1991. It was not a commercial success, but was promising enough that Xilinx (the inventor of the Field-Programmable Gate Array, FPGA) bought the technology and hired the Algotronix staff. [9] Later machines enabled first demonstrations of scientific principles, such as the spontaneous spatial self-organisation of genetic coding with MereGen. [10]
Early Historic Computers: | |
Programming Source | |
---|---|
Resources fixed | none |
Algorithms fixed | none |
von Neumann Computer: | |
Programming Source | |
Resources fixed | none |
Algorithms variable | Software (instruction streams) |
Reconfigurable Computing Systems: | |
Programming Source | |
Resources variable | Configware (configuration) |
Algorithms variable | Flowware (data streams) |
The fundamental model of the reconfigurable computing machine paradigm, the data-stream-based anti machine is well illustrated by the differences to other machine paradigms that were introduced earlier, as shown by Nick Tredennick's following classification scheme of computing paradigms (see "Table 1: Nick Tredennick's Paradigm Classification Scheme"). [11]
Computer scientist Reiner Hartenstein describes reconfigurable computing in terms of an anti-machine that, according to him, represents a fundamental paradigm shift away from the more conventional von Neumann machine. [12] Hartenstein calls it Reconfigurable Computing Paradox, that software-to-configware (software-to-FPGA) migration results in reported speed-up factors of up to more than four orders of magnitude, as well as a reduction in electricity consumption by up to almost four orders of magnitude—although the technological parameters of FPGAs are behind the Gordon Moore curve by about four orders of magnitude, and the clock frequency is substantially lower than that of microprocessors. This paradox is partly explained by the Von Neumann syndrome.
High-Performance Reconfigurable Computing (HPRC) is a computer architecture combining reconfigurable computing-based accelerators like field-programmable gate array with CPUs or multi-core processors.
The increase of logic in an FPGA has enabled larger and more complex algorithms to be programmed into the FPGA. The attachment of such an FPGA to a modern CPU over a high speed bus, like PCI express, has enabled the configurable logic to act more like a coprocessor rather than a peripheral. This has brought reconfigurable computing into the high-performance computing sphere.
Furthermore, by replicating an algorithm on an FPGA or the use of a multiplicity of FPGAs has enabled reconfigurable SIMD systems to be produced where several computational devices can concurrently operate on different data, which is highly parallel computing.
This heterogeneous systems technique is used in computing research and especially in supercomputing. [13] A 2008 paper reported speed-up factors of more than 4 orders of magnitude and energy saving factors by up to almost 4 orders of magnitude. [14] Some supercomputer firms offer heterogeneous processing blocks including FPGAs as accelerators.[ citation needed ] One research area is the twin-paradigm programming tool flow productivity obtained for such heterogeneous systems. [15]
The US National Science Foundation has a center for high-performance reconfigurable computing (CHREC). [16] In April 2011 the fourth Many-core and Reconfigurable Supercomputing Conference was held in Europe. [17]
Commercial high-performance reconfigurable computing systems are beginning to emerge with the announcement of IBM integrating FPGAs with its IBM Power microprocessors. [18]
Partial re-configuration is the process of changing a portion of reconfigurable hardware circuitry while the other portion keeps its former configuration. Field programmable gate arrays are often used as a support to partial reconfiguration.
Electronic hardware, like software, can be designed modularly, by creating subcomponents and then higher-level components to instantiate them. In many cases it is useful to be able to swap out one or several of these subcomponents while the FPGA is still operating.
Normally, reconfiguring an FPGA requires it to be held in reset while an external controller reloads a design onto it. Partial reconfiguration allows for critical parts of the design to continue operating while a controller either on the FPGA or off of it loads a partial design into a reconfigurable module. Partial reconfiguration also can be used to save space for multiple designs by only storing the partial designs that change between designs. [19]
A common example for when partial reconfiguration would be useful is the case of a communication device. If the device is controlling multiple connections, some of which require encryption, it would be useful to be able to load different encryption cores without bringing the whole controller down.
Partial reconfiguration is not supported on all FPGAs. A special software flow with emphasis on modular design is required. Typically the design modules are built along well defined boundaries inside the FPGA that require the design to be specially mapped to the internal hardware.
From the functionality of the design, partial reconfiguration can be divided into two groups: [20]
With the advent of affordable FPGA boards, students' and hobbyists' projects seek to recreate vintage computers or implement more novel architectures. [21] [22] [23] Such projects are built with reconfigurable hardware (FPGAs), and some devices support emulation of multiple vintage computers using a single reconfigurable hardware (C-One).
A fully FPGA-based computer is the COPACOBANA, the Cost Optimized Codebreaker and Analyzer and its successor RIVYERA. A spin-off company SciEngines GmbH of the COPACOBANA-Project of the Universities of Bochum and Kiel in Germany continues the development of fully FPGA-based computers.
Mitrionics has developed a SDK that enables software written using a single assignment language to be compiled and executed on FPGA-based computers. The Mitrion-C software language and Mitrion processor enable software developers to write and execute applications on FPGA-based computers in the same manner as with other computing technologies, such as graphical processing units ("GPUs"), cell-based processors, parallel processing units ("PPUs"), multi-core CPUs, and traditional single-core CPU clusters. (out of business)
National Instruments have developed a hybrid embedded computing system called CompactRIO. It consists of reconfigurable chassis housing the user-programmable FPGA, hot swappable I/O modules, real-time controller for deterministic communication and processing, and graphical LabVIEW software for rapid RT and FPGA programming.
Xilinx has developed two styles of partial reconfiguration of FPGA devices: module-based and difference-based. Module-based partial reconfiguration permits to reconfigure distinct modular parts of the design, while difference-based partial reconfiguration can be used when a small change is made to a design.
Intel [24] supports partial reconfiguration of their FPGA devices on 28 nm devices such as Stratix V, [25] and on the 20 nm Arria 10 devices. [26] The Intel FPGA partial reconfiguration flow for Arria 10 is based on the hierarchical design methodology in the Quartus Prime Pro software where users create physical partitions of the FPGA that can be reconfigured [27] at runtime while the remainder of the design continues to operate. The Quartus Prime Pro software also support hierarchical partial reconfiguration and simulation of partial reconfiguration.
This section needs additional citations for verification .(January 2015) |
This section possibly contains original research .(January 2015) |
As an emerging field, classifications of reconfigurable architectures are still being developed and refined as new architectures are developed; no unifying taxonomy has been suggested to date. However, several recurring parameters can be used to classify these systems.
The granularity of the reconfigurable logic is defined as the size of the smallest functional unit (configurable logic block, CLB) that is addressed by the mapping tools. High granularity, which can also be known as fine-grained, often implies a greater flexibility when implementing algorithms into the hardware. However, there is a penalty associated with this in terms of increased power, area and delay due to greater quantity of routing required per computation. Fine-grained architectures work at the bit-level manipulation level; whilst coarse grained processing elements (reconfigurable datapath unit, rDPU) are better optimised for standard data path applications. One of the drawbacks of coarse grained architectures are that they tend to lose some of their utilisation and performance if they need to perform smaller computations than their granularity provides, for example for a one bit add on a four bit wide functional unit would waste three bits. This problem can be solved by having a coarse grain array (reconfigurable datapath array, rDPA) and a FPGA on the same chip.
Coarse-grained architectures (rDPA) are intended for the implementation for algorithms needing word-width data paths (rDPU). As their functional blocks are optimized for large computations and typically comprise word wide arithmetic logic units (ALU), they will perform these computations more quickly and with more power efficiency than a set of interconnected smaller functional units; this is due to the connecting wires being shorter, resulting in less wire capacitance and hence faster and lower power designs. A potential undesirable consequence of having larger computational blocks is that when the size of operands may not match the algorithm an inefficient utilisation of resources can result. Often the type of applications to be run are known in advance allowing the logic, memory and routing resources to be tailored to enhance the performance of the device whilst still providing a certain level of flexibility for future adaptation. Examples of this are domain specific arrays aimed at gaining better performance in terms of power, area, throughput than their more generic finer grained FPGA cousins by reducing their flexibility.
Configuration of these reconfigurable systems can happen at deployment time, between execution phases or during execution. In a typical reconfigurable system, a bit stream is used to program the device at deployment time. Fine grained systems by their own nature require greater configuration time than more coarse-grained architectures due to more elements needing to be addressed and programmed. Therefore, more coarse-grained architectures gain from potential lower energy requirements, as less information is transferred and utilised. Intuitively, the slower the rate of reconfiguration the smaller the power consumption as the associated energy cost of reconfiguration are amortised over a longer period of time. Partial re-configuration aims to allow part of the device to be reprogrammed while another part is still performing active computation. Partial re-configuration allows smaller reconfigurable bit streams thus not wasting energy on transmitting redundant information in the bit stream. Compression of the bit stream is possible but careful analysis is to be carried out to ensure that the energy saved by using smaller bit streams is not outweighed by the computation needed to decompress the data.
Often the reconfigurable array is used as a processing accelerator attached to a host processor. The level of coupling determines the type of data transfers, latency, power, throughput and overheads involved when utilising the reconfigurable logic. Some of the most intuitive designs use a peripheral bus to provide a coprocessor like arrangement for the reconfigurable array. However, there have also been implementations where the reconfigurable fabric is much closer to the processor, some are even implemented into the data path, utilising the processor registers. The job of the host processor is to perform the control functions, configure the logic, schedule data and to provide external interfacing.
The flexibility in reconfigurable devices mainly comes from their routing interconnect. One style of interconnect made popular by FPGAs vendors, Xilinx and Altera are the island style layout, where blocks are arranged in an array with vertical and horizontal routing. A layout with inadequate routing may suffer from poor flexibility and resource utilisation, therefore providing limited performance. If too much interconnect is provided this requires more transistors than necessary and thus more silicon area, longer wires and more power consumption.
One of the key challenges for reconfigurable computing is to enable higher design productivity and provide an easier way to use reconfigurable computing systems for users that are unfamiliar with the underlying concepts. One way of doing this is to provide standardization and abstraction, usually supported and enforced by an operating system. [28]
One of the major tasks of an operating system is to hide the hardware and present programs (and their programmers) with nice, clean, elegant, and consistent abstractions to work with instead. In other words, the two main tasks of an operating system are abstraction and resource management. [28]
Abstraction is a powerful mechanism to handle complex and different (hardware) tasks in a well-defined and common manner. One of the most elementary OS abstractions is a process. A process is a running application that has the perception (provided by the OS) that it is running on its own on the underlying virtual hardware. This can be relaxed by the concept of threads, allowing different tasks to run concurrently on this virtual hardware to exploit task level parallelism. To allow different processes and threads to coordinate their work, communication and synchronization methods have to be provided by the OS. [28]
In addition to abstraction, resource management of the underlying hardware components is necessary because the virtual computers provided to the processes and threads by the operating system need to share available physical resources (processors, memory, and devices) spatially and temporarily. [28]
Processor design is a subfield of computer science and computer engineering (fabrication) that deals with creating a processor, a key component of computer hardware.
A field-programmable gate array (FPGA) is a type of configurable integrated circuit that can be repeatedly programmed after manufacturing. FPGAs are a subset of logic devices referred to as programmable logic devices (PLDs). They consist of an array of programmable logic blocks with a connecting grid, that can be configured "in the field" to interconnect with other logic blocks to perform various digital functions. FPGAs are often used in limited (low) quantity production of custom-made products, and in research and development, where the higher cost of individual FPGAs is not as important, and where creating and manufacturing a custom circuit wouldn't be feasible. Other applications for FPGAs include the telecommunications, automotive, aerospace, and industrial sectors, which benefit from their flexibility, high signal processing speed, and parallel processing abilities.
A system on a chip or system-on-chip is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include on-chip central processing unit (CPU), memory interfaces, input/output devices and interfaces, and secondary storage interfaces, often alongside other components such as radio modems and a graphics processing unit (GPU) – all on a single substrate or microchip. SoCs may contain digital and also analog, mixed-signal and often radio frequency signal processing functions.
Evolvable hardware (EH) is a field focusing on the use of evolutionary algorithms (EA) to create specialized electronics without manual engineering. It brings together reconfigurable hardware, evolutionary computation, fault tolerance and autonomous systems. Evolvable hardware refers to hardware that can change its architecture and behavior dynamically and autonomously by interacting with its environment.
Granularity is the degree to which a material or system is composed of distinguishable pieces, "granules" or "grains" (metaphorically). It can either refer to the extent to which a larger entity is subdivided, or the extent to which groups of smaller indistinguishable entities have joined together to become larger distinguishable entities.
Nios II is a 32-bit embedded processor architecture designed specifically for the Altera family of field-programmable gate array (FPGA) integrated circuits. Nios II incorporates many enhancements over the original Nios architecture, making it more suitable for a wider range of embedded computing applications, from digital signal processing (DSP) to system-control.
Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.
Software/Configware Co-Compilation is used for reconfigurable computing to generate the code for both, an instruction-stream-based microprocessor and a reconfigurable accelerator interfaced to it. Such a co-compiler has a partitioner which accepts input from a high level language source, such as, for instance a programming language, or the output from tools like MATLAB, and automatically partitions it into parallelizable parts suitable for the reconfigurable accelerator and the rest for running on the microprocessor. By loop transformations the partitioner converts the parallelizable parts into a configware source, which is compiled by a Configware Compiler generating configware code for the configuration of the reconfigurable accelerator like, for instance an FPGA, or a coarse-grained reconfigurable array, and flowware code for organizing the data streams going from and to the accelerator.
Impulse C is a subset of the C programming language combined with a C-compatible function library supporting parallel programming, in particular for programming of applications targeting FPGA devices. It is developed by Impulse Accelerated Technologies of Kirkland, Washington.
This is a glossary of terms used in the field of Reconfigurable computing and reconfigurable computing systems, as opposed to the traditional Von Neumann architecture.
Ambric, Inc. was a designer of computer processors that developed the Ambric architecture. Its Am2045 Massively Parallel Processor Array (MPPA) chips were primarily used in high-performance embedded systems such as medical imaging, video, and signal-processing.
Lateral computing is a lateral thinking approach to solving computing problems. Lateral thinking has been made popular by Edward de Bono. This thinking technique is applied to generate creative ideas and solve problems. Similarly, by applying lateral-computing techniques to a problem, it can become much easier to arrive at a computationally inexpensive, easy to implement, efficient, innovative or unconventional solution.
An array is a systematic arrangement of similar objects, usually in rows and columns.
James Hoe is a Taiwanese-American professor of Electrical and Computer Engineering at Carnegie Mellon University (CMU). He is interested in many aspects of computer architecture and digital hardware design, including the specific areas of FPGA architecture for computing; digital signal processing hardware; and high-level hardware design and synthesis. Professor Hoe’s current research focus is on devising a new FPGA architecture for power efficient, high-performance computing. His research group is working on developing an FPGA runtime environment that incorporates partial reconfiguration, virtualization, and protection features to manage an FPGA as a dynamically sharable multitasking compute resource.
Computing with Memory refers to computing platforms where function response is stored in memory array, either one or two-dimensional, in the form of lookup tables (LUTs) and functions are evaluated by retrieving the values from the LUTs. These computing platforms can follow either a purely spatial computing model, as in field-programmable gate array (FPGA), or a temporal computing model, where a function is evaluated across multiple clock cycles. The latter approach aims at reducing the overhead of programmable interconnect in FPGA by folding interconnect resources inside a computing element. It uses dense two-dimensional memory arrays to store large multiple-input multiple-output LUTs. Computing with Memory differs from Computing in Memory or processor-in-memory (PIM) concepts, widely investigated in the context of integrating a processor and memory on the same chip to reduce memory latency and increase bandwidth. These architectures seek to reduce the distance the data travels between the processor and the memory. The Berkeley IRAM project is one notable contribution in the area of PIM architectures.
Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.
The Xputer is a design for a reconfigurable computer, proposed by computer scientist Reiner Hartenstein. Hartenstein uses various terms to describe the various innovations in the design, including config-ware, flow-ware, morph-ware, and "anti-machine".
Verilog-to-Routing (VTR) is an open source CAD flow for FPGA devices. VTR's main purpose is to map a given circuit described in Verilog, a Hardware Description Language, on a given FPGA architecture for research and development purposes; the FPGA architecture targeted could be a novel architecture that a researcher wishes to explore, or it could be an existing commercial FPGA whose architecture has been captured in the VTR input format. The VTR project has many contributors, with lead collaborating universities being the University of Toronto, the University of New Brunswick, and the University of California, Berkeley. Additional contributors include Google, The University of Utah, Princeton University, Altera, Intel, Texas Instruments, and MIT Lincoln Lab.
Lesley Shannon is a Canadian professor who is Chair for the Computer Engineering Option in the School of Engineering Science at Simon Fraser University. She is also the current NSERC Chair for Women in Science and Engineering for BC and Yukon. Shannon's chair operates the Westcoast Women in Engineering, Science and Technology (WWEST) program to promote equity, diversity and inclusion in STEM.
{{cite book}}
: CS1 maint: others (link){{cite book}}
: CS1 maint: others (link)