Asynchronous array of simple processors

The asynchronous array of simple processors (AsAP) architecture comprises a 2-D array of reduced-complexity programmable processors with small scratchpad memories, interconnected by a reconfigurable mesh network. AsAP was developed by researchers in the VLSI Computation Laboratory (VCL) at the University of California, Davis, and achieves high performance and energy efficiency while using a relatively small circuit area. It was made in 2006. [1]

AsAP processors are well suited for implementation in future fabrication technologies, and are clocked in a globally asynchronous locally synchronous (GALS) fashion. Individual oscillators fully halt (leakage only) in 9 cycles when there is no work to do, and restart at full speed in less than one cycle after work is available. The chip requires no crystal oscillators, phase-locked loops, delay-locked loops, global clock signal, or any global frequency or phase-related signals whatsoever.
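
The halt and restart behavior described above can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions: the class `LocalOscillator`, the function `simulate`, and the per-tick `work_available` flag are hypothetical names introduced for illustration, and the model does not describe the actual AsAP oscillator circuit. It assumes the clock keeps toggling for 9 idle cycles before halting and that a restart completes within the same cycle in which work arrives.

```python
from dataclasses import dataclass

# From the text: an oscillator fully halts in 9 cycles when there is no work.
HALT_AFTER_IDLE_CYCLES = 9


@dataclass
class LocalOscillator:
    """Toy model of one processor's local clock (hypothetical, for illustration)."""
    running: bool = True
    idle_cycles: int = 0    # consecutive cycles without work while running
    active_cycles: int = 0  # cycles in which the clock actually toggled

    def tick(self, work_available: bool) -> None:
        if work_available:
            # Assumed: restart takes less than one cycle, so the clock is
            # treated as running again in the same tick that work arrives.
            self.running = True
            self.idle_cycles = 0
        if self.running:
            self.active_cycles += 1
            if not work_available:
                self.idle_cycles += 1
                if self.idle_cycles >= HALT_AFTER_IDLE_CYCLES:
                    self.running = False  # halted: leakage power only


def simulate(work_pattern):
    """Run one oscillator over a sequence of per-cycle work flags."""
    osc = LocalOscillator()
    for work_available in work_pattern:
        osc.tick(work_available)
    return osc


if __name__ == "__main__":
    # 5 busy cycles, 20 idle cycles, then 3 busy cycles.
    pattern = [True] * 5 + [False] * 20 + [True] * 3
    osc = simulate(pattern)
    print(f"clock toggled in {osc.active_cycles} of {len(pattern)} cycles")
```

In this model the clock toggles only during the busy cycles and the first 9 idle cycles of a gap, so long idle periods cost no dynamic clock power regardless of their length.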

The multi-processor architecture efficiently exploits the task-level parallelism found in many complex DSP applications, and also efficiently computes many large tasks using fine-grained parallelism.

Key features

Block diagrams of a single AsAP processor and the 6x6 AsAP 1.0 chip

AsAP uses several novel key features, of which four are:

AsAP 1 chip: 36 processors

Die photograph of the first generation 36-processor AsAP chip

A chip containing 36 (6x6) programmable processors was taped out in May 2005 in 0.18 μm CMOS using a synthesized standard cell technology and is fully functional. Processors on the chip operate at clock rates from 520 MHz to 540 MHz at 1.8 V, and each processor dissipates 32 mW on average while executing applications at 475 MHz.

Most processors run at clock rates over 600 MHz at 2.0 V, which places AsAP among the highest known clock rates for fabricated processors (programmable or non-programmable) designed in a university; it is the second highest known in published research papers.

At 0.9 V, the average application power per processor is 2.4 mW at 116 MHz. Each processor occupies only 0.66 mm².
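
The power and clock figures quoted for AsAP 1 can be combined into an energy-per-cycle estimate. The following is a rough sketch rather than a published result: the helper `energy_per_cycle_pj` is a hypothetical name, and the calculation simply divides average power by clock frequency using the numbers stated above.

```python
def energy_per_cycle_pj(power_mw: float, clock_mhz: float) -> float:
    """Average energy per clock cycle in picojoules.

    mW divided by MHz gives nJ per cycle; multiply by 1000 for pJ.
    """
    return power_mw / clock_mhz * 1000.0


if __name__ == "__main__":
    # Operating points taken from the text (average application power per processor).
    points = {
        "32 mW at 475 MHz": (32.0, 475.0),
        "2.4 mW at 116 MHz (0.9 V)": (2.4, 116.0),
    }
    for label, (power_mw, clock_mhz) in points.items():
        print(f"{label}: {energy_per_cycle_pj(power_mw, clock_mhz):.1f} pJ per cycle")
```

By this estimate, the low-voltage operating point spends roughly a third of the energy per cycle of the faster one, at about a quarter of the clock rate.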

AsAP 2 chip: 167 processors

Die photograph of the second generation 167-processor AsAP 2 chip

A second-generation 65 nm CMOS design contains 167 processors with dedicated fast Fourier transform (FFT), Viterbi decoder, and video motion estimation processors; 16 KB shared memories; and long-distance inter-processor interconnect. The programmable processors can individually and dynamically change their supply voltage and clock frequency. The chip is fully functional. Processors operate at up to 1.2 GHz at 1.3 V, which is believed to be the highest clock rate of any fabricated processor designed in a university. At 1.2 V, they operate at 1.07 GHz and 47 mW when 100% active. At 0.675 V, they operate at 66 MHz and 608 μW when 100% active. This latter operating point enables 1 trillion MAC or arithmetic logic unit (ALU) ops/sec with a power dissipation of only 9.2 watts. Due to its MIMD architecture and fine-grain clock oscillator stalling, this energy efficiency per operation is almost perfectly constant across widely varying workloads, which is not the case for many architectures.
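
The 9.2 watt figure can be checked with simple arithmetic from the 0.675 V operating point. The sketch below assumes each processor completes one MAC or ALU operation per clock cycle when 100% active; that assumption, and the function name `power_for_throughput_w`, are introduced here for illustration only.

```python
import math


def power_for_throughput_w(target_ops_per_s: float,
                           clock_hz: float,
                           power_per_processor_w: float) -> float:
    """Total power for a target throughput, assuming one op per cycle per processor."""
    processors_needed = target_ops_per_s / clock_hz
    return processors_needed * power_per_processor_w


if __name__ == "__main__":
    TARGET_OPS = 1e12  # 1 trillion ops/sec

    # Operating points from the text: (clock in Hz, power per processor in W).
    low_voltage = (66e6, 608e-6)   # 0.675 V
    high_speed = (1.07e9, 47e-3)   # 1.2 V

    for label, (clock_hz, power_w) in (("0.675 V", low_voltage),
                                       ("1.2 V", high_speed)):
        count = math.ceil(TARGET_OPS / clock_hz)
        total = power_for_throughput_w(TARGET_OPS, clock_hz, power_w)
        print(f"{label}: about {count} processors, {total:.1f} W")
```

Under this assumption, the 0.675 V point reaches 1 trillion ops/sec at about 9.2 W, matching the figure above, but only as an aggregate over roughly 15,000 processors, far more than the 167 on a single chip. The same throughput at the 1.2 V point would take roughly 44 W.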

Applications

Many DSP and general-purpose tasks have been coded for AsAP. Mapped tasks include: filters, convolutional coders, interleavers, sorting, square root, CORDIC sin/cos/arcsin/arccos, matrix multiplication, pseudorandom number generators, fast Fourier transforms (FFTs) of lengths 32–1024, a complete k=7 Viterbi decoder, a JPEG encoder, a complete fully compliant baseband processor for an IEEE 802.11a/g wireless LAN transmitter and receiver, and a complete CAVLC compression block for an H.264 encoder. Blocks plug directly together with no required modifications. Power, throughput, and area results are typically many times better than those of existing programmable DSP processors.

The architecture enables a clean separation between programming and inter-processor timing, which is handled entirely by hardware. A C compiler and automatic mapping tool further simplify programming.

References

  1. Yu, Zhiyi; Meeuwsen, Michael J.; Apperson, Ryan W.; Sattari, Omar; Lai, Michael; Webb, Jeremy W.; Work, Eric W.; Truong, Dean; Mohsenin, Tinoosh; Baas, Bevan M. (March 2008). "AsAP: An Asynchronous Array of Simple Processors". IEEE Journal of Solid-State Circuits. 43 (3): 695–705. Bibcode:2008IJSSC..43..695Y. doi:10.1109/JSSC.2007.916616. ISSN 0018-9200. S2CID 14523656.