Ambric

Ambric, Inc. was a designer of computer processors that developed the Ambric architecture. Its Am2045 Massively Parallel Processor Array (MPPA) chips were used primarily in high-performance embedded systems such as medical imaging, video, and signal processing.

History

Ambric was founded in 2003 in Beaverton, Oregon by Jay Eisenlohr and Anthony Mark Jones. Eisenlohr had previously founded Rendition, Inc. and sold it to Micron Technology for $93M; Jones, an expert in analog, digital, and system IC design, is the named inventor on over 120 U.S. patents. Jones had founded several other companies before Ambric, and in 2019 co-founded Vitek IP with technology and patent expert Dan Buri. Ambric developed and introduced the Am2045 and its software tools in 2007, but fell victim to the financial crisis of 2007–2008. Ambric's Am2045 and tools remained available through Nethra Imaging, Inc., [1] which closed in 2012.

Architecture and programming model

The Ambric architecture is a massively parallel distributed-memory multiprocessor based on the Structural Object Programming Model. [2] [3] Each processor is programmed in a strict subset of conventional Java and/or in assembly code. The hundreds of processors on the chip send data and control messages to one another through an interconnect of reconfigurable, self-synchronizing channels, which provide both communication and synchronization. [4] The model of computation closely resembles a Kahn process network with bounded buffers.
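
The channel behavior described above can be sketched in standard Java (an illustration only, not Ambric's actual tool API; the class names, channel capacity, and the doubling kernel are arbitrary choices): a bounded blocking queue connecting two threads reproduces a self-synchronizing channel, since a writer stalls while the channel is full and a reader stalls while it is empty.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Illustration: two "objects", each with its own thread of control, joined
    // by a bounded channel, as in a Kahn process network with bounded buffers.
    public class ChannelSketch {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(4); // bounded buffer

            Thread producer = new Thread(() -> {
                for (int i = 0; i < 8; i++) {
                    try {
                        channel.put(i);          // blocks while the channel is full
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });

            Thread consumer = new Thread(() -> {
                for (int i = 0; i < 8; i++) {
                    try {
                        System.out.println(channel.take() * 2); // blocks while empty
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });

            producer.start();
            consumer.start();
            producer.join();
            consumer.join();
        }
    }

Because put and take both block, a channel supplies communication and synchronization in a single mechanism, which is what lets the hardware objects run without locks or shared memory.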

Devices and tools

The Am2045 device has 336 32-bit RISC-DSP fixed-point processors and 336 2-kibibyte memories, running at up to 300 MHz. [5] Its Eclipse-based integrated development environment includes an editor, compiler, assemblers, simulator, configuration generator, and source-code debugger, along with video/image-processing, signal-processing, and video-codec libraries.

Power and performance

The Am2045 delivers 1 TeraOPS (one trillion operations per second) [6] and 50 Giga-MACs (billion multiply-accumulates per second) of fixed-point processing while consuming 6–12 W, depending on the application.
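
As a rough consistency check (our arithmetic, not a vendor figure): 336 processors at 300 MHz provide about 100.8 billion processor-cycles per second, so the 1 TeraOPS rating counts roughly ten operations per processor-cycle across the datapath, while the 50 Giga-MAC rating matches about half the processors (168 × 300 MHz ≈ 50.4 billion) each completing one multiply-accumulate per cycle.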

Applications

Ambric's MPPA devices were used for high-definition, 2K, and 4K video compression, transcoding, and analysis; image recognition; medical imaging; signal processing; software-defined radio; and other compute-intensive streaming-media applications, [7] which would otherwise use FPGA, DSP, and/or ASIC chips. The company claimed advantages such as higher performance and energy efficiency, scalability, higher productivity from software programming rather than hardware design, and off-the-shelf availability.

Video codec libraries were available for a variety of professional camera and video editing formats such as DVCPRO HD, VC-3 (DNxHD), AVC-Intra and others.

One customer's X-ray system performs 3D reconstruction on over 13,000 cores across 40 Am2045 chips, consuming under 500 W in a single ATCA chassis. [8]

Other MPPAs include picoChip, IntellaSys, and UC Davis's AsAP research chip. Companies that offer or offered products classified as manycore (a related classification) devices include Aspex Semiconductor, Cavium, ClearSpeed, Coherent Logix, SPI, and Tilera. More established processor companies such as Texas Instruments and Freescale offer multicore products, but with fewer processors (typically 3–8) and traditional shared-memory, timing-sensitive programming models.

Recognition

Microprocessor Report gave Ambric a 2006 MPR Analysts' Choice Award for Innovation "for the design concept and architecture of its massively parallel processor, the Am2045". [9]

In 2013, the Ambric architecture received a Top 20 award from the IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), recognizing the Ambric paper as one of the 20 most significant publications in the 20-year history of the conference. [10]

Related Research Articles

Processor design is a subfield of computer science and computer engineering that deals with creating a processor, a key component of computer hardware.

Parallel computing

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
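
Two of the forms named above are easy to contrast in a few lines of standard Java (a minimal sketch; the workload and task bodies are arbitrary): parallel streams express data parallelism, while independent threads express task parallelism.

    import java.util.stream.IntStream;

    // Minimal sketch: data parallelism vs. task parallelism.
    public class ParallelismForms {
        public static void main(String[] args) throws InterruptedException {
            // Data parallelism: the same operation applied to many elements at once.
            long sumOfSquares = IntStream.rangeClosed(1, 1_000_000)
                    .parallel()
                    .mapToLong(i -> (long) i * i)
                    .sum();
            System.out.println("sum of squares = " + sumOfSquares);

            // Task parallelism: two independent tasks running concurrently.
            Thread taskA = new Thread(() -> System.out.println("task A done"));
            Thread taskB = new Thread(() -> System.out.println("task B done"));
            taskA.start();
            taskB.start();
            taskA.join();
            taskB.join();
        }
    }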

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e., application-specific integrated circuits (ASICs), is the possibility of adapting the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.

Steve Furber

Stephen Byram Furber is a British computer scientist, mathematician and hardware engineer, and Emeritus ICL Professor of Computer Engineering in the Department of Computer Science at the University of Manchester, UK. After completing his education at the University of Cambridge, he spent the 1980s at Acorn Computers, where he was a principal designer of the BBC Micro and the ARM 32-bit RISC microprocessor. As of 2023, over 250 billion ARM chips have been manufactured, powering much of the world's mobile computing and embedded systems, everything from sensors to smartphones to servers.

Hardware acceleration

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

Multi-core processor

A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core.

A soft microprocessor is a microprocessor core that can be wholly implemented using logic synthesis. It can be implemented via different semiconductor devices containing programmable logic, including both high-end and commodity variations.

The transistor count is the number of transistors in an electronic device. It is the most common measure of integrated circuit complexity. The rate at which MOS transistor counts have increased generally follows Moore's law, which observes that transistor count doubles approximately every two years. However, being directly proportional to the area of a chip, transistor count does not represent how advanced the corresponding manufacturing technology is: a better indication of this is transistor density.
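
Worked out, a two-year doubling period means growth by a factor of 2^(t/2) over t years, or roughly 32× per decade (2^(10/2) = 32).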

Computational RAM (C-RAM) is random-access memory with processing elements integrated on the same chip. This enables C-RAM to be used as a SIMD computer. It also can be used to more efficiently use memory bandwidth within a memory chip. The general technique of doing computations in memory is called Processing-In-Memory (PIM).

This is a glossary of terms used in the field of reconfigurable computing and reconfigurable computing systems, as opposed to the traditional von Neumann architecture.

A massively parallel processor array, also known as a multi-purpose processor array (MPPA), is a type of integrated circuit with a massively parallel array of hundreds or thousands of CPUs and RAM memories. These processors pass work to one another through a reconfigurable interconnect of channels. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more demanding tasks than conventional chips. MPPAs are based on a software parallel programming model for developing high-performance embedded-system applications.

Pollack's Rule states that microprocessor "performance increase due to microarchitecture advances is roughly proportional to [the] square root of [the] increase in complexity". This contrasts with power consumption increase, which is roughly linearly proportional to the increase in complexity. Complexity in this context means processor logic, i.e. its area.
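
Worked out, doubling a core's logic area yields only about √2 ≈ 1.41× the performance while roughly doubling the power, which is the arithmetic argument for arrays of many small cores (such as MPPAs) over a single large core.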

Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores. Manycore processors are used extensively in embedded computers and high-performance computing.

Massively parallel is the term for using a large number of computer processors to simultaneously perform a set of coordinated computations in parallel. GPUs are massively parallel architectures running tens of thousands of threads.

Tabula, Inc.

Tabula, Inc., was an American fabless semiconductor company based in Santa Clara, California. Founded in 2003 by Steve Teig, it raised $215 million in venture funding. The company designed and built three-dimensional field-programmable gate arrays and ranked third on the Wall Street Journal's annual "Next Big Thing" list in 2012.

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

A vision processing unit (VPU) is an emerging class of microprocessor; it is a specific type of AI accelerator, designed to accelerate machine vision tasks.

An AI accelerator, deep learning processor, or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.

Nader Bagherzadeh

Nader Bagherzadeh is a professor of computer engineering in the Department of Electrical Engineering and Computer Science at the University of California, Irvine, where he served as chair from 1998 to 2003. Bagherzadeh has been involved in research and development in computer architecture, reconfigurable computing, VLSI chip design, network-on-chip, 3D chips, sensor networks, computer graphics, memory, and embedded systems. He was named a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2014 for contributions to the design and analysis of coarse-grained reconfigurable processor architectures, has published more than 400 articles in peer-reviewed journals and conferences, and was with AT&T Bell Labs from 1980 to 1984.

A domain-specific architecture (DSA) is a programmable computer architecture specifically tailored to operate very efficiently within the confines of a given application domain. The term is often used in contrast to general-purpose architectures, such as CPUs, that are designed to operate on any computer program.

References

  1. "Ambric Lives On, in a Parallel Universe". EDN. 2011-06-29. Retrieved 2024-05-08.
  2. Mike Butts, Anthony Mark Jones, Paul Wasson. "A Structural Object Programming Model, Architecture, Chip and Tools for Reconfigurable Computing". Proceedings of FCCM, April 2007. IEEE Computer Society.
  3. Anthony Mark Jones, Mike Butts. "TeraOPS Hardware: A New Massively-Parallel MIMD Computing Fabric IC". IEEE Hot Chips Symposium, August 2006. IEEE Computer Society.
  4. Mike Butts. "Synchronization through Communication in a Massively Parallel Processor Array". IEEE Micro, vol. 27, no. 5, pp. 32–40, September/October 2007. IEEE Computer Society.
  5. "Design and Programming of the KiloCore Processor Arrays" (PDF). ucdavis.edu. 2020. Retrieved 2024-05-12.
  6. "Multimode sensor processing using Massively Parallel Processor Arrays (MPPAs)". design-reuse.com. 2008-03-18. Retrieved 2024-05-12.
  7. "Ambric Now Delivering the Am2045B, a New, Higher Performance, Lower Power Version of its Industry-leading TeraOPS-class MPPA Device". EDN. 2007-11-15. Retrieved 2024-05-12.
  8. FPGA Gurus. "Ambric Lives On in a Parallel Universe". EDN, June 29, 2011.
  9. "Microprocessor Report Announces First Group of Winners for the Eighth Annual MPR Analysts' Choice Awards". February 20, 2007. Archived 2007-10-31 at the Wayback Machine.
  10. FCCM20 endorsement of "A Structural Object Programming Model, Architecture, Chip and Tools for Reconfigurable Computing". April 2013.
