Network on a chip

A network on a chip or network-on-chip (NoC /ˌɛnˌoʊˈsiː/ en-oh-SEE or /nɒk/ knock) [nb 1] is a network-based communications subsystem on an integrated circuit ("microchip"), most typically between modules in a system on a chip (SoC). The modules on the IC are typically semiconductor IP cores schematizing various functions of the computer system, and are designed to be modular in the sense of network science. The network on chip is a router-based packet switching network between SoC modules.

Network theory: study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects

Network theory is the study of graphs as a representation of either symmetric relations or asymmetric relations between discrete objects. In computer science and network science, network theory is a part of graph theory: a network can be defined as a graph in which nodes and/or edges have attributes.

In telecommunication, a communications system or communication system is a collection of individual communications networks, transmission systems, relay stations, tributary stations, and data terminal equipment (DTE) usually capable of interconnection and interoperation to form an integrated whole. The components of a communications system serve a common purpose, are technically compatible, use common procedures, respond to controls, and operate in union.

NoC technology applies the theory and methods of computer networking to on-chip communication and brings notable improvements over conventional bus and crossbar communication architectures. Networks-on-chip come in many network topologies, many of which are still experimental as of 2018.

Computer network: collection of autonomous computers interconnected by a single technology

A computer network is a digital telecommunications network which allows nodes to share resources. In computer networks, computing devices exchange data with each other using connections between nodes. These data links are established over cable media such as wires or optic cables, or wireless media such as Wi-Fi.

Bus (computing): communication system that transfers data between components inside a computer

In computer architecture, a bus is a communication system that transfers data between components inside a computer, or between computers. This expression covers all related hardware components and software, including communication protocols.

In electronics, a crossbar switch is a collection of switches arranged in a matrix configuration. A crossbar switch has multiple input and output lines that form a crossed pattern of interconnecting lines between which a connection may be established by closing a switch located at each intersection, the elements of the matrix. Originally, a crossbar switch consisted literally of crossing metal bars that provided the input and output paths. Later implementations achieved the same switching topology in solid state semiconductor chips. The cross-point switch is one of the principal switch architectures, together with a rotary switch, memory switch, and a crossover switch.

Networks-on-chip improve the scalability of systems-on-chip and the power efficiency of complex SoCs compared to other communication subsystem designs. A very common NoC used in contemporary personal computers is a graphics processing unit (GPU), which is commonly used in computer graphics, video gaming and accelerating artificial intelligence. NoCs are an emerging technology, with projections for large growth in the near future as manycore computer architectures become more common.

Scalability

Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. For example, a system is considered scalable if it is capable of increasing its total output under an increased load when resources are added. An analogous meaning is implied when the word is used in an economic context, where a company's scalability implies that the underlying business model offers the potential for economic growth within the company.

Personal computer: computer intended for use by an individual person

A personal computer (PC) is a multi-purpose computer whose size, capabilities, and price make it feasible for individual use. Personal computers are intended to be operated directly by an end user, rather than by a computer expert or technician. Unlike large, costly minicomputers and mainframes, personal computers are not time-shared by many people at the same time.

Graphics processing unit: specialized electronic circuit; graphics accelerator

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard. In certain CPUs, they are embedded on the CPU die.

Structure

Networks-on-chip can span synchronous and asynchronous clock domains, known as clock domain crossing, or use unclocked asynchronous logic. NoCs support globally asynchronous, locally synchronous electronics architectures, allowing each processor core or functional unit on the System-on-Chip to have its own clock domain. [1]

In digital electronic design a clock domain crossing (CDC), or simply clock crossing, is the traversal of a signal in a synchronous digital circuit from one clock domain into another. If a signal does not assert long enough and is not registered, it may appear asynchronous on the incoming clock boundary.

An asynchronous circuit, or self-timed circuit, is a sequential digital logic circuit which is not governed by a clock circuit or global clock signal. Instead it often uses signals that indicate completion of instructions and operations, specified by simple data transfer protocols. This type of circuit is contrasted with synchronous circuits, in which changes to the signal values in the circuit are triggered by repetitive pulses called a clock signal. Most digital devices today use synchronous circuits. However asynchronous circuits have the potential to be faster, and may also have advantages in lower power consumption, lower electromagnetic interference, and better modularity in large systems. Asynchronous circuits are an active area of research in digital logic design.

Globally asynchronous locally synchronous (GALS) is an architecture for designing electronic circuits which addresses the problem of safe and reliable data transfer between independent clock domains. GALS is a Model of Computation (MoC) that emerged in the 1980s. It allows designers to build computer systems consisting of several synchronous islands that interact with other islands using asynchronous communication, e.g. with FIFOs.
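The text above mentions FIFOs as one way for synchronous islands to exchange data. One widely used technique in such clock-crossing FIFOs, not named in the source and shown here only as an illustrative assumption, is to pass read and write pointers between domains in Gray code, so that only one bit changes per increment. The function names and the depth `DEPTH` below are hypothetical.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary counter value to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Invert bin_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Hypothetical FIFO depth; the wrap-around property below relies on it
# being a power of two.
DEPTH = 8

for i in range(DEPTH):
    cur = bin_to_gray(i)
    nxt = bin_to_gray((i + 1) % DEPTH)
    # Exactly one bit flips per pointer increment, including the wrap from
    # DEPTH - 1 back to 0, so a sampler in the other clock domain sees
    # either the old or the new pointer value, never a mixture of both.
    assert bits_changed(cur, nxt) == 1
    assert gray_to_bin(cur) == i
```

This only models the pointer encoding; synchronizer flip-flops and full/empty detection are left out of the sketch.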

Architectures

NoC architectures typically model sparse small-world networks (SWNs) and scale-free networks (SFNs) to limit the number, length, area and power consumption of interconnection wires and point-to-point connections.
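As a rough illustration of why small-world structure is attractive, the sketch below compares the average hop count of a plain ring with that of the same ring plus two long-range links. The 16-node ring, the chosen shortcuts and all function names are assumptions made for this example, not a topology taken from the source.

```python
from collections import deque


def ring(n):
    """Adjacency sets for an n-node ring: each node links to two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_links(graph, extra):
    """Return a copy of graph with extra bidirectional links added."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    for a, b in extra:
        g[a].add(b)
        g[b].add(a)
    return g


def average_hops(graph):
    """Mean shortest-path length over all ordered node pairs, via BFS."""
    total = 0
    pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


plain = ring(16)
shortcut = add_links(plain, [(0, 8), (4, 12)])  # two hypothetical long links

print(f"ring only:      {average_hops(plain):.3f} hops on average")
print(f"with shortcuts: {average_hops(shortcut):.3f} hops on average")
```

Adding only two wires to the sixteen already present lowers the average distance, which is the trade-off a sparse small-world topology aims for.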

Small-world network

A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but the neighbors of any given node are likely to be neighbors of each other and most nodes can be reached from every other node by a small number of hops or steps. Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network, that is:

$$L \propto \log N$$

Scale-free network: network whose degree distribution follows a power law

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as

$$P(k) \sim k^{-\gamma}$$

where γ is the exponent of the power law.

In telecommunications, a point-to-point connection refers to a communications connection between two communication endpoints or nodes. An example is a telephone call, in which one telephone is connected with one other, and what is said by one caller can only be heard by the other. This is contrasted with a point-to-multipoint or broadcast connection, in which many nodes can receive information transmitted by one node. Other examples of point-to-point communications links are leased lines, microwave radio relay and two-way radio.

Benefits

Traditionally, ICs have been designed with dedicated point-to-point connections, with one wire dedicated to each signal. This results in a dense network topology. For large designs in particular, this has several limitations from a physical design viewpoint. It requires power that grows quadratically with the number of interconnections. The wires occupy much of the area of the chip, and in nanometer CMOS technology, interconnects dominate both performance and dynamic power dissipation, as signal propagation in wires across the chip requires multiple clock cycles. This also allows more parasitic capacitance, resistance and inductance to accrue on the circuit. (See Rent's rule for a discussion of wiring requirements for point-to-point connections.)
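To give a feel for how quickly a dense topology grows, the sketch below counts links under two simplifying assumptions that are not from the source: in the worst case every pair of modules gets its own dedicated link, and the sparse alternative is a square 2D mesh in which each module connects only to its grid neighbours. The function names are hypothetical.

```python
def point_to_point_links(n: int) -> int:
    """Links needed if every pair of n modules has a dedicated connection."""
    return n * (n - 1) // 2


def mesh_links(rows: int, cols: int) -> int:
    """Links in a rows x cols 2D mesh with nearest-neighbour wiring only."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return horizontal + vertical


for side in (4, 8, 16):
    n = side * side
    print(f"{n:4d} modules: "
          f"{point_to_point_links(n):6d} dedicated links vs "
          f"{mesh_links(side, side):4d} mesh links")
```

The dedicated-link count grows quadratically with the number of modules, while the mesh count grows roughly linearly.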

Integrated circuit design: engineering process for electronic hardware

Integrated circuit design, or IC design, is a subset of electronics engineering, encompassing the particular logic and circuit design techniques required to design integrated circuits, or ICs. ICs consist of miniaturized electronic components built into an electrical network on a monolithic semiconductor substrate by photolithography.

Quadratic function: polynomial function in which the highest-degree term is of the second degree

In algebra, a quadratic function, a quadratic polynomial, a polynomial of degree 2, or simply a quadratic, is a polynomial function with one or more variables in which the highest-degree term is of the second degree. For example, a quadratic function in three variables x, y, and z contains exclusively terms x², y², z², xy, xz, yz, x, y, z, and a constant:

$$f(x, y, z) = ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j$$

with at least one of the second-degree coefficients a, b, c, d, e, f non-zero.

Die (integrated circuit): an unpackaged integrated circuit

A die, in the context of integrated circuits, is a small block of semiconducting material on which a given functional circuit is fabricated. Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon (EGS) or other semiconductor through processes such as photolithography. The wafer is cut (diced) into many pieces, each containing one copy of the circuit. Each of these pieces is called a die.

Sparsity and locality of interconnections in the communications subsystem yield several improvements over traditional bus-based and crossbar-based systems.

Parallelism and scalability

The wires in the links of the network-on-chip are shared by many signals. A high level of parallelism is achieved because all data links in the NoC can operate simultaneously on different data packets. Therefore, as the complexity of integrated systems keeps growing, a NoC provides better performance (such as throughput) and scalability than previous communication architectures (e.g., dedicated point-to-point signal wires, shared buses, or segmented buses with bridges). The algorithms running on the chip must, however, be designed to offer enough parallelism to exploit the potential of the NoC.
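The claim that links operate simultaneously on different packets can be made concrete with a small sketch. The source does not name a routing algorithm; the example assumes a 2D mesh with dimension-order (XY) routing, a common textbook choice, and uses hypothetical function names. Two packets whose routes share no link can be forwarded in the same cycle, while routes that share a link must take turns.

```python
def xy_route(src, dst):
    """Dimension-order route on a 2D mesh: move along X first, then along Y.

    src and dst are (x, y) router coordinates; the result is the list of
    routers visited, including both endpoints.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def links_used(path):
    """Set of directed links (router, next_router) that a route occupies."""
    return set(zip(path, path[1:]))


a = xy_route((0, 0), (2, 1))
b = xy_route((0, 2), (2, 2))
c = xy_route((1, 0), (2, 0))

# a and b use disjoint links, so both packets can move in the same cycle.
print("a and b conflict:", not links_used(a).isdisjoint(links_used(b)))
# a and c both need the link from (1, 0) to (2, 0) and must be arbitrated.
print("a and c conflict:", not links_used(a).isdisjoint(links_used(c)))
```

The sketch ignores buffering, flow control and arbitration policy; it only shows which routes compete for the same wires.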

Current research

Some researchers think that NoCs need to support quality of service (QoS), that is, to meet requirements on throughput, end-to-end delay, fairness, [2] and deadlines.[citation needed] Real-time computation, including audio and video playback, is one reason for providing QoS support. However, existing systems such as VxWorks, RTLinux or QNX can achieve sub-millisecond real-time computing without special hardware.[citation needed]

This may indicate that for many real-time applications the service quality of existing on-chip interconnect infrastructure is sufficient, and dedicated hardware logic would be necessary to achieve microsecond precision, a degree that is rarely needed in practice for end users (sound or video jitter need only tenth of milliseconds latency guarantee). Another motivation for NoC-level quality of service (QoS) is to support multiple concurrent users sharing resources of a single chip multiprocessor in a public cloud computing infrastructure. In such instances, hardware QoS logic enables the service provider to make contractual guarantees on the level of service that a user receives, a feature that may be deemed desirable by some corporate or government clients.[ citation needed ]

Many challenging research problems remain to be solved at all levels, from the physical link level through the network level, and all the way up to the system architecture and application software. The first dedicated research symposium on networks on chip was held at Princeton University, in May 2007. [3] The second IEEE International Symposium on Networks-on-Chip was held in April 2008 at Newcastle University.

Research has been conducted on integrated optical waveguides and devices comprising an optical network on a chip (ONoC). [4] [5]

Side benefits of NoC: Cache miss pattern prediction and data forwarding leveraging augmented switches

In a multi-core system connected by a NoC, coherency messages and cache miss requests have to pass through switches. Accordingly, switches can be augmented with simple tracking and forwarding elements to detect which cache blocks will be requested in the future by which cores. The forwarding elements then multicast any requested block to all the cores that may request the block in the future. This mechanism reduces the cache miss rate. [6]

Benchmarks

NoC development and studies require comparing different proposals and options. NoC traffic patterns are under development to help such evaluations. Existing NoC benchmarks include NoCBench and MCSL NoC Traffic Patterns. [7]

Interconnect processing unit

An interconnect processing unit (IPU) [8] is an on-chip communication network with hardware and software components that jointly implement key functions of different system-on-chip programming models through a set of communication and synchronization primitives, and that provide low-level platform services to enable advanced features in modern heterogeneous applications on a single die.

Commercial providers

See also

Notes

  1. This article uses the convention that "NoC" is pronounced /nɒk/ nock. Therefore, it uses the convention "a" for the indefinite article corresponding to NoC ("a NoC"). Other sources may pronounce it as /ˌɛnˌoʊˈsiː/ en-oh-SEE and therefore use "an NoC".

Related Research Articles

Static random-access memory: semiconductor memory

Static random-access memory is a type of semiconductor memory that uses bistable latching circuitry (flip-flop) to store each bit. SRAM exhibits data remanence, but it is still volatile in the conventional sense that data is eventually lost when the memory is not powered.

System on a chip: type of integrated circuit

A system on a chip or system on chip is an integrated circuit that integrates all components of a computer or other electronic system. These components typically include a central processing unit (CPU), memory, input/output ports and secondary storage – all on a single substrate. It may contain digital, analog, mixed-signal, and often radio frequency signal processing functions, depending on the application. As they are integrated on a single electronic substrate, SoCs consume much less power and take up much less area than multi-chip designs with equivalent functionality. Because of this, SoCs are very common in the mobile computing and edge computing markets. Systems on chip are commonly used in embedded systems and the Internet of Things.

MIMD: class of parallel computer architecture in Flynn's taxonomy, in which multiple operations are performed on multiple data points simultaneously

In computing, MIMD is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

In electronics and especially synchronous digital circuits, a clock signal is a particular type of signal that oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits.

Front-side bus: computer communication interface (bus) often used in Intel-chip-based computers during the 1990s and 2000s; replaced by HyperTransport, Intel QuickPath Interconnect or Direct Media Interface in modern CPUs

A front-side bus (FSB) is a computer communication interface (bus) that was often used in Intel-chip-based computers during the 1990s and 2000s. The competing EV6 bus served the same function for AMD CPUs. Both typically carry data between the central processing unit (CPU) and a memory controller hub, known as the northbridge.

Robert Drost: American computer scientist

Robert Drost is an American computer scientist. He was born in 1970 in New York City.

The ARM Advanced Microcontroller Bus Architecture (AMBA) is an open-standard, on-chip interconnect specification for the connection and management of functional blocks in system-on-a-chip (SoC) designs. It facilitates development of multi-processor designs with large numbers of controllers and peripherals with a bus architecture. Since its inception, the scope of AMBA has, despite its name, gone far beyond microcontroller devices. Today, AMBA is widely used on a range of ASIC and SoC parts including applications processors used in modern portable mobile devices like smartphones. AMBA is a registered trademark of ARM Ltd.

The primary focus of this article is asynchronous control in digital electronic systems. In a synchronous system, operations are coordinated by one, or more, centralized clock signals. An asynchronous digital system, in contrast, has no global clock. Asynchronous systems do not depend on strict arrival times of signals or messages for reliable operation. Coordination is achieved via events such as: packet arrival, changes (transitions) of signals, handshake protocols, and other methods.

The interconnect bottleneck comprises limits on integrated circuit (IC) performance due to connections between components instead of their internal speed. In 2006 it was predicted to be a "looming crisis" by 2010.

A multiprocessor system-on-chip is a system-on-a-chip (SoC) which includes multiple microprocessors. As such, it is a multi-core System-on-Chip.

In microelectronics, a three-dimensional integrated circuit is an integrated circuit manufactured by stacking silicon wafers or dies and interconnecting them vertically using, for instance, through-silicon vias (TSVs) or Cu-Cu connections, so that they behave as a single device to achieve performance improvements at reduced power and smaller footprint than conventional two dimensional processes. 3D IC is just one of a host of 3D integration schemes that exploit the z-direction to achieve electrical performance benefits.

SGI Origin 2000: series of server computers

The SGI Origin 2000 is a family of mid-range and high-end server computers developed and manufactured by Silicon Graphics (SGI). They were introduced in 1996 to succeed the SGI Challenge and POWER Challenge. At the time of introduction, these ran the IRIX operating system, originally version 6.4 and later, 6.5. A variant of the Origin 2000 with graphics capability is known as the Onyx2. An entry-level variant based on the same architecture but with a different hardware implementation is known as the Origin 200. The Origin 2000 was succeeded by the Origin 3000 in July 2000, and was discontinued on June 30, 2002.

The XSwitch is an interconnect used by the XCore processor. The interconnect protocol is defined by XMOS, and is based around routing messages comprising 9-bit tokens between cores on a network. The protocol is specifically designed for on-chip and board-level communication, but using LVDS drivers it can also run over longer cables.

Manycore processors are specialist multi-core processors designed for a high degree of parallel processing, containing a large number of simpler, independent processor cores. Manycore processors are used extensively in embedded computers and high-performance computing. As of November 2018, the world's third fastest supercomputer, the Chinese Sunway TaihuLight, obtains its performance from 40,960 SW26010 manycore processors, each containing 256 cores.

This is a glossary of terms relating to computer hardware – physical computer hardware, architectural issues, and peripherals.

SpiNNaker is a massively parallel, manycore supercomputer architecture designed by the Advanced Processor Technologies Research Group (APT) at the School of Computer Science, University of Manchester. It is composed of 57,600 ARM9 processors, each with 18 cores and 128 MB of mobile DDR SDRAM, totalling 1,036,800 cores and over 7 TB of RAM. The computing platform is based on spiking neural networks, useful in simulating the human brain.

Arteris, Inc. is a multinational technology firm that develops the on-chip interconnect fabric technology used in System-on-Chip (SoC) semiconductor designs for a variety of devices, particularly in mobile and consumer markets. The company specializes in the development and distribution of Network-on-Chip (NoC) interconnect Intellectual Property (IP) solutions. It is best known for its flagship product, Arteris FlexNoC, which is used in more than 60 percent of mobile and wireless SoC designs.

Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

References

  1. Kundu, Santanu; Chattopadhyay, Santanu (2014). Network-on-chip: the Next Generation of System-on-Chip Integration (1st ed.). Boca Raton, FL: CRC Press. p. 3. ISBN 9781466565272. OCLC 895661009.
  2. "Balancing On-Chip Network Latency in Multi-Application Mapping for Chip-Multiprocessors". IPDPS. May 2014.
  3. NoCS 2007 website.
  4. On-Chip Networks Bibliography.
  5. Inter/Intra-Chip Optical Network Bibliography.
  6. Marzieh Lenjani, Mahmoud Reza Hashemi. "Tree-based scheme for reducing shared cache miss rate leveraging regional, statistical and temporal similarities".
  7. "NoC traffic". www.ece.ust.hk. Retrieved 2018-10-08.
  8. Marcello Coppola, Miltos D. Grammatikakis, Riccardo Locatelli, Giuseppe Maruccia, Lorenzo Pieralisi. "Design of Cost-Efficient Interconnect Processing Units: Spidergon STNoC". CRC Press, 2008. ISBN 978-1-4200-4471-3.

Adapted from Avinoam Kolodny's column in the ACM SIGDA e-newsletter by Igor Markov
The original text can be found at http://www.sigda.org/newsletter/2006/060415.txt

Further reading