Ambric

Ambric, Inc. was a designer of computer processors that developed the Ambric architecture. Its Am2045 Massively Parallel Processor Array (MPPA) chips were used primarily in high-performance embedded systems for medical imaging, video, and signal processing.

Ambric was founded in 2003 in Beaverton, Oregon, by Jay Eisenlohr and Anthony Mark Jones. Eisenlohr had previously founded Rendition, Inc. and sold it to Micron Technology for $93M, while Jones is a leading expert in analog, digital, and system IC design and is the named inventor on over 120 U.S. patents. Jones founded a number of companies before Ambric and co-founded Vitek IP with technology and patent expert Dan Buri in 2019. Ambric developed and introduced the Am2045 and its software tools in 2007, but fell victim to the financial crisis of 2007–2008. The Am2045 and its tools remained available through Nethra Imaging, Inc., which closed in 2012.

Architecture and programming model

The Ambric architecture is a massively parallel distributed-memory multiprocessor based on the Structural Object Programming Model. [1] [2] Each processor is programmed in a strict subset of conventional Java and/or in assembly code. The hundreds of processors on the chip send data and control messages to one another through an interconnect of reconfigurable, self-synchronizing channels, which provide both communication and synchronization. [3] The model of computation is very similar to a Kahn process network with bounded buffers.
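As a rough illustration of this model of computation, the sketch below connects three sequential processes through bounded, blocking channels. It is written in standard desktop Java and uses `java.util.concurrent.ArrayBlockingQueue` as a stand-in for a channel; it does not use Ambric's Java subset or its channel interface, and the names `ChannelSketch`, `send`, `receive`, `runPipeline`, and the `END` marker are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChannelSketch {
    // Hypothetical end-of-stream marker, used only by this sketch.
    static final int END = Integer.MIN_VALUE;

    // Blocking write: the sender stalls while the bounded channel is full.
    static void send(BlockingQueue<Integer> channel, int value) {
        try {
            channel.put(value);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Blocking read: the receiver stalls while the channel is empty.
    static int receive(BlockingQueue<Integer> channel) {
        try {
            return channel.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Pipeline: source -> squarer -> summing sink. channelDepth must be >= 1.
    static long runPipeline(int count, int channelDepth) {
        BlockingQueue<Integer> a = new ArrayBlockingQueue<>(channelDepth);
        BlockingQueue<Integer> b = new ArrayBlockingQueue<>(channelDepth);

        Thread source = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                send(a, i);
            }
            send(a, END);
        });
        Thread squarer = new Thread(() -> {
            for (int v = receive(a); v != END; v = receive(a)) {
                send(b, v * v);
            }
            send(b, END);
        });
        source.start();
        squarer.start();

        // The calling thread acts as the sink process.
        long sum = 0;
        for (int v = receive(b); v != END; v = receive(b)) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(10, 1)); // prints 385
        System.out.println(runPipeline(10, 4)); // prints 385
    }
}
```

The processes share no state and synchronize only through channel reads and writes, so the result is the same for any buffer depth and any thread scheduling. This mirrors the determinism of a Kahn process network.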

Devices and tools

The Am2045 device has 336 32-bit RISC-DSP fixed-point processors and 336 2-kibibyte memories, which run at up to 300 MHz. Its software tools comprise an Eclipse-based integrated development environment with an editor, compiler, assemblers, simulator, configuration generator, and source-code debugger, together with video/image-processing, signal-processing, and video-codec libraries.

Power and performance

The Am2045 delivers 1 TeraOPS (operations per second) and 50 Giga-MACs (multiply-accumulates per second) of fixed-point processing while consuming 6–12 W, depending on the application.
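These figures imply a range of throughput per watt, which the short calculation below works out. It is only a back-of-the-envelope sketch: it assumes the quoted peak throughput is reached at both ends of the power range, which the source does not state, and the class and method names are hypothetical.

```java
public class EfficiencySketch {
    // Figures quoted in the text.
    static final double PEAK_OPS_PER_SECOND = 1.0e12;   // 1 TeraOPS
    static final double PEAK_MACS_PER_SECOND = 50.0e9;  // 50 Giga-MACs

    // Throughput per watt, expressed in units of 1e9 per second per watt.
    static double gigaPerWatt(double perSecond, double watts) {
        return perSecond / watts / 1.0e9;
    }

    public static void main(String[] args) {
        // Evaluate both ends of the quoted 6-12 W range.
        for (double watts : new double[] {6.0, 12.0}) {
            System.out.printf("%.0f W: %.1f GOPS/W, %.2f GMAC/s per W%n",
                    watts,
                    gigaPerWatt(PEAK_OPS_PER_SECOND, watts),
                    gigaPerWatt(PEAK_MACS_PER_SECOND, watts));
        }
    }
}
```

Under that assumption the chip would deliver roughly 83 to 167 GOPS per watt and about 4 to 8 GMAC/s per watt.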

Applications

Ambric's MPPA devices were used for high-definition, 2K and 4K video compression, transcoding and analysis, image recognition, medical imaging, signal processing, software-defined radio, and other compute-intensive streaming-media applications that would otherwise use FPGA, DSP, and/or ASIC chips. The company claimed advantages such as higher performance and energy efficiency, scalability, higher productivity due to software programming rather than hardware design, and off-the-shelf availability.

Video codec libraries were available for a variety of professional camera and video editing formats such as DVCPRO HD, VC-3 (DNxHD), AVC-Intra and others.

One customer's X-ray system employs over 13,000 cores in 40 Am2045 chips to perform 3D reconstruction, consuming under 500 W in a single ATCA chassis. [4]
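The core count follows directly from the per-chip processor count given under Devices and tools, as the small check below shows. The per-chip power figure it prints is only an upper bound obtained by dividing the whole 500 W system budget by the number of chips; the source gives no breakdown of system power, and the names used here are hypothetical.

```java
public class XraySystemSketch {
    // Processors per Am2045 chip, as stated in the text.
    static final int PROCESSORS_PER_CHIP = 336;

    // Total processor cores across a number of chips.
    static int totalCores(int chips) {
        return chips * PROCESSORS_PER_CHIP;
    }

    // Upper bound on average power per chip if the entire system budget
    // were spent on the chips alone (an assumption, not a sourced figure).
    static double wattsPerChipBudget(double systemWatts, int chips) {
        return systemWatts / chips;
    }

    public static void main(String[] args) {
        System.out.println(totalCores(40));                // prints 13440
        System.out.println(wattsPerChipBudget(500.0, 40)); // prints 12.5
    }
}
```

Forty chips give 13,440 cores, consistent with "over 13,000". A 500 W ceiling leaves at most 12.5 W per chip, in line with the 6–12 W per-chip range quoted above.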

Other MPPAs include picoChip, IntellaSys, and UC Davis's AsAP research chip. Companies that offer or have offered devices classified as manycore (a related classification) include Aspex Semiconductor, Cavium, ClearSpeed, Coherent Logix, SPI, and Tilera. The more established processor companies, Texas Instruments and Freescale, offer multicore products, but these have fewer processors (typically 3–8) and use traditional shared-memory, timing-sensitive programming models.

Recognition

Microprocessor Report gave Ambric a 2006 MPR Analysts' Choice Award for Innovation "for the design concept and architecture of its massively parallel processor, the Am2045". [5]

In 2013, the Ambric architecture paper received the Top 20 award from the IEEE International Symposium on Field-Programmable Custom Computing Machines, recognizing it as one of the 20 most significant publications in the 20-year history of the conference. [6]

References

  1. Mike Butts, Anthony Mark Jones, Paul Wasson, "A Structural Object Programming Model, Architecture, Chip and Tools for Reconfigurable Computing", Proceedings of FCCM, April 2007, IEEE Computer Society.
  2. Anthony Mark Jones, Mike Butts, "TeraOPS Hardware: A New Massively-Parallel MIMD Computing Fabric IC", IEEE Hot Chips Symposium, August 2006, IEEE Computer Society.
  3. Mike Butts, "Synchronization through Communication in a Massively Parallel Processor Array", IEEE Micro, vol. 27, no. 5, pp. 32–40, September/October 2007, IEEE Computer Society.
  4. FPGA Gurus, EDN, "Ambric Lives On in a Parallel Universe", June 29, 2011.
  5. "Microprocessor Report Announces First Group of Winners for the Eighth Annual MPR Analysts' Choice Awards", February 20, 2007. Archived 2007-10-31 at the Wayback Machine.
  6. FCCM20 Endorsement of "A Structural Object Programming Model, Architecture, Chip and Tools for Reconfigurable Computing", April 2013.

Further reading