C-slowing

Last updated

C-slow retiming is a technique used in conjunction with retiming to improve throughput of a digital circuit. Each register in a circuit is replaced by a set of C registers (in series). This creates a circuit with C independent threads, as if the new circuit contained C copies of the original circuit. A single computation of the original circuit takes C times as many clock cycles to compute in the new circuit. C-slowing by itself increases latency, but throughput remains the same.

Increasing the number of registers allows optimization of the circuit through retiming to reduce the clock period of the circuit. In the best case, the clock period can be reduced by a factor of C. Reducing the clock period of the circuit reduces latency and increases throughput. Thus, for computations that can be multi-threaded, combining C-slowing with retiming can increase the throughput of the circuit, with little, or in the best case, no increase in latency.

Since registers are relatively plentiful in FPGAs, this technique is typically applied to circuits implemented with FPGAs.

See also

Resources

Related Research Articles

Processor design is a subfield of computer engineering and electronics engineering (fabrication) that deals with creating a processor, a key component of computer hardware.

<span class="mw-page-title-main">Field-programmable gate array</span> Array of logic gates that are reprogrammable

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence the term field-programmable. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). Circuit diagrams were previously used to specify the configuration, but this is increasingly rare due to the advent of electronic design automation tools.

<span class="mw-page-title-main">System on a chip</span> Integrated circuit that incorporates the components of a computer

A system on a chip or system-on-chip is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memory interfaces, on-chip input/output devices, input/output interfaces, and secondary storage interfaces, often alongside other components such as radio modems and a graphics processing unit (GPU) – all on a single substrate or microchip. It may contain digital, analog, mixed-signal, and often radio frequency signal processing functions.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

<span class="mw-page-title-main">Front-side bus</span> Type of computer communication interface

A front-side bus (FSB) is a computer communication interface (bus) that was often used in Intel-chip-based computers during the 1990s and 2000s. The EV6 bus served the same function for competing AMD CPUs. Both typically carry data between the central processing unit (CPU) and a memory controller hub, known as the northbridge.

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with very flexible high speed computing fabrics like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to make substantial changes to the datapath itself in addition to the control flow. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric.

In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps performed by different processor units with different parts of instructions processed in parallel.

<span class="mw-page-title-main">Xilinx</span> American technology company

Xilinx, Inc. was an American technology and semiconductor company that primarily supplied programmable logic devices. The company was known for inventing the first commercially viable field-programmable gate array (FPGA) and creating the first fabless manufacturing model.

Formal equivalence checking process is a part of electronic design automation (EDA), commonly used during the development of digital integrated circuits, to formally prove that two representations of a circuit design exhibit exactly the same behavior.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

The MicroBlaze is a soft microprocessor core designed for Xilinx field-programmable gate arrays (FPGA). As a soft-core processor, MicroBlaze is implemented entirely in the general-purpose memory and logic fabric of Xilinx FPGAs.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

Tarari is a company that spun out of Intel in 2002. It has created a range of re-programmable silicon based on Xilinx Virtex-4 FPGA and ASICs that offload and accelerate really complex algorithms such as XML Parsing, scanning for Computer viruses, email spam and intruders in Intrusion-prevention systems and Unified threat management appliances. As well as inspecting content its Content Processors can also transform content and they are used for XML transformation XSLT, compression, encryption as well as HD Video encoding for WMV and VC-1 formats.

In computing, computer performance is the amount of useful work accomplished by a computer system. Outside of specific contexts, computer performance is estimated in terms of accuracy, efficiency and speed of executing computer program instructions. When it comes to high computer performance, one or more of the following factors might be involved:

In capital markets, low latency is the use of algorithmic trading to react to market events faster than the competition to increase profitability of trades. For example, when executing arbitrage strategies the opportunity to "arb" the market may only present itself for a few milliseconds before parity is achieved. To demonstrate the value that clients put on latency, in 2007 a large global investment bank has stated that every millisecond lost results in $100m per annum in lost opportunity.

Field-programmable gate array prototyping, also referred to as FPGA-based prototyping, ASIC prototyping or system-on-chip (SoC) prototyping, is the method to prototype system-on-chip and application-specific integrated circuit designs on FPGAs for hardware verification and early software development.

<span class="mw-page-title-main">Xilinx ISE</span> Hardware design tool

Xilinx ISE is a discontinued software tool from Xilinx for synthesis and analysis of HDL designs, which primarily targets development of embedded firmware for Xilinx FPGA and CPLD integrated circuit (IC) product families. It was succeeded by Xilinx Vivado. Use of the last released edition from October 2013 continues for in-system programming of legacy hardware designs containing older FPGAs and CPLDs otherwise orphaned by the replacement design tool, Vivado Design Suite.

Virtex is the flagship family of FPGA products developed by Xilinx. Other current product lines include Kintex (mid-range) and Artix (low-cost), each including configurations and models optimized for different applications. In addition, Xilinx offers the Spartan low-cost series, which continues to be updated and is nearing production utilizing the same underlying architecture and process node as the larger 7-series devices.

Latency oriented processor architecture is the microarchitecture of a microprocessor designed to serve a serial computing thread with a low latency. This is typical of most central processing units (CPU) being developed since the 1970s. These architectures, in general, aim to execute as many instructions as possible belonging to a single serial thread, in a given window of time; however, the time to execute a single instruction completely from fetch to retire stages may vary from a few cycles to even a few hundred cycles in some cases. Latency oriented processor architectures are the opposite of throughput-oriented processors which concern themselves more with the total throughput of the system, rather than the service latencies for all individual threads that they work on.