Network on a chip

Last updated

A network on a chip or network-on-chip (NoC /ˌɛnˌˈs/ en-oh-SEE or /nɒk/ knock) [nb 1] is a network-based communications subsystem on an integrated circuit ("microchip"), most typically between modules in a system on a chip (SoC). The modules on the IC are typically semiconductor IP cores schematizing various functions of the computer system, and are designed to be modular in the sense of network science. The network on chip is a router-based packet switching network between SoC modules.

Network theory study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects

Network theory is the study of graphs as a representation of either symmetric relations or asymmetric relations between discrete objects. In computer science and network science, network theory is a part of graph theory: a network can be defined as a graph in which nodes and/or edges have attributes.

In telecommunication, a communications system or communication system is a collection of individual communications networks, transmission systems, relay stations, tributary stations, and data terminal equipment (DTE) usually capable of interconnection and interoperation to form an integrated whole. The components of a communications system serve a common purpose, are technically compatible, use common procedures, respond to controls, and operate in union.

Contents

NoC technology applies the theory and methods of computer networking to on-chip communication and brings notable improvements over conventional bus and crossbar communication architectures. Networks-on-chip come in many network topologies, many of which are still experimental as of 2018.

Computer network collection of autonomous computers interconnected by a single technology

A computer network is a digital telecommunications network which allows nodes to share resources. In computer networks, computing devices exchange data with each other using connections between nodes. These data links are established over cable media such as wires or optic cables, or wireless media such as Wi-Fi.

Bus (computing) communication system that transfers data between components inside a computer

In computer architecture, a bus is a communication system that transfers data between components inside a computer, or between computers. This expression covers all related hardware components and software, including communication protocols.

In electronics, a crossbar switch is a collection of switches arranged in a matrix configuration. A crossbar switch has multiple input and output lines that form a crossed pattern of interconnecting lines between which a connection may be established by closing a switch located at each intersection, the elements of the matrix. Originally, a crossbar switch consisted literally of crossing metal bars that provided the input and output paths. Later implementations achieved the same switching topology in solid state semiconductor chips. The cross-point switch is one of the principal switch architectures, together with a rotary switch, memory switch, and a crossover switch.

Networks-on-chip improve the scalability of systems-on-chip and the power efficiency of complex SoCs compared to other communication subsystem designs. A very common NoC used in contemporary personal computers is a graphics processing unit (GPU), which is commonly used in computer graphics, video gaming and accelerating artificial intelligence. They are an emerging technology, with projections for large growth in the near future as manycore computer architectures become more common.

Scalability

Scalability is the property of a system to handle a growing amount of work by adding resources to the system.

Personal computer Computer intended for use by an individual person

A personal computer (PC) is a multi-purpose computer whose size, capabilities, and price make it feasible for individual use. Personal computers are intended to be operated directly by an end user, rather than by a computer expert or technician. Unlike large costly minicomputer and mainframes, time-sharing by many people at the same time is not used with personal computers.

Graphics processing unit specialized electronic circuit; graphics accelerator

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard. In certain CPUs, they are embedded on the CPU die.

Structure

Networks-on-chip can span synchronous and asynchronous clock domains, known as clock domain crossing, or use unclocked asynchronous logic. NoCs support globally asynchronous, locally synchronous electronics architectures, allowing each processor core or functional unit on the System-on-Chip to have its own clock domain. [1]

In digital electronic design a clock domain crossing (CDC), or simply clock crossing, is the traversal of a signal in a synchronous digital circuit from one clock domain into another. If a signal does not assert long enough and is not registered, it may appear asynchronous on the incoming clock boundary.

An asynchronous circuit, or self-timed circuit, is a sequential digital logic circuit which is not governed by a clock circuit or global clock signal. Instead it often uses signals that indicate completion of instructions and operations, specified by simple data transfer protocols. This type of circuit is contrasted with synchronous circuits, in which changes to the signal values in the circuit are triggered by repetitive pulses called a clock signal. Most digital devices today use synchronous circuits. However asynchronous circuits have the potential to be faster, and may also have advantages in lower power consumption, lower electromagnetic interference, and better modularity in large systems. Asynchronous circuits are an active area of research in digital logic design.

Globally asynchronous locally synchronous (GALS) is an architecture for designing electronic circuits which addresses the problem of safe and reliable data transfer between independent clock domains. GALS is a Model of Computation (MoC) that emerged in the 1980s. It allows to design computer systems consisting of several synchronous islands interacting with other islands using asynchronous communication, e.g. with FIFOs.

Architectures

NoC architectures typically model sparse small-world networks (SWNs) and scale-free networks (SFNs) to limit the number, length, area and power consumption of interconnection wires and point-to-point connections.

Small-world network

A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but the neighbors of any given node are likely to be neighbors of each other and most nodes can be reached from every other node by a small number of hops or steps. Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network, that is:

Scale-free network network whose degree distribution follows a power law

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as

In telecommunications, a point-to-point connection refers to a communications connection between two communication endpoints or nodes. An example is a telephone call, in which one telephone is connected with one other, and what is said by one caller can only be heard by the other. This is contrasted with a point-to-multipoint or broadcast connection, in which many nodes can receive information transmitted by one node. Other examples of point-to-point communications links are leased lines, microwave radio relay and two-way radio.

Benefits

Traditionally, ICs have been designed with dedicated point-to-point connections, with one wire dedicated to each signal. This results in a dense network topology. For large designs, in particular, this has several limitations from a physical design viewpoint. It requires power quadratic in the number of interconnections. The wires occupy much of the area of the chip, and in nanometer CMOS technology, interconnects dominate both performance and dynamic power dissipation, as signal propagation in wires across the chip requires multiple clock cycles. This also allows more parasitic capacitance, resistance and inductance to accrue on the circuit. (See Rent's rule for a discussion of wiring requirements for point-to-point connections).

Integrated circuit design Engineering process for electronic hardware

Integrated circuit design, or IC design, is a subset of electronics engineering, encompassing the particular logic and circuit design techniques required to design integrated circuits, or ICs. ICs consist of miniaturized electronic components built into an electrical network on a monolithic semiconductor substrate by photolithography.

Quadratic function polynomial function of one variable in which the highest-degree term is of the second degree

In algebra, a quadratic function, a quadratic polynomial, a polynomial of degree 2, or simply a quadratic, is a polynomial function with one or more variables in which the highest-degree term is of the second degree. For example, a quadratic function in three variables x, y, and z contains exclusively terms x2, y2, z2, xy, xz, yz, x, y, z, and a constant:

Die (integrated circuit) an unpackaged integrated circuit

A die, in the context of integrated circuits, is a small block of semiconducting material on which a given functional circuit is fabricated. Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon (EGS) or other semiconductor through processes such as photolithography. The wafer is cut (diced) into many pieces, each containing one copy of the circuit. Each of these pieces is called a die.

Sparsity and locality of interconnections in the communications subsystem yield several improvements over traditional bus-based and crossbar-based systems.

Parallelism and scalability

The wires in the links of the network-on-chip are shared by many signals. A high level of parallelism is achieved, because all data links in the NoC can operate simultaneously on different data packets.[ why? ] Therefore, as the complexity of integrated systems keeps growing, a NoC provides enhanced performance (such as throughput) and scalability in comparison with previous communication architectures (e.g., dedicated point-to-point signal wires, shared buses, or segmented buses with bridges). Of course, the algorithms [ which? ] must be designed in such a way that they offer large parallelism and can hence utilize the potential of NoC.

Current research

Some researchers[ who? ] think that NoCs need to support quality of service (QoS), namely achieve the various requirements in terms of throughput, end-to-end delays, fairness, [2] and deadlines.[ citation needed ] Real-time computation, including audio and video playback, is one reason for providing QoS support. However, current system implementations like VxWorks, RTLinux or QNX are able to achieve sub-millisecond real-time computing without special hardware.[ citation needed ]

This may indicate that for many real-time applications the service quality of existing on-chip interconnect infrastructure is sufficient, and dedicated hardware logic would be necessary to achieve microsecond precision, a degree that is rarely needed in practice for end users (sound or video jitter need only tenth of milliseconds latency guarantee). Another motivation for NoC-level quality of service (QoS) is to support multiple concurrent users sharing resources of a single chip multiprocessor in a public cloud computing infrastructure. In such instances, hardware QoS logic enables the service provider to make contractual guarantees on the level of service that a user receives, a feature that may be deemed desirable by some corporate or government clients.[ citation needed ]

Many challenging research problems remain to be solved at all levels, from the physical link level through the network level, and all the way up to the system architecture and application software. The first dedicated research symposium on networks on chip was held at Princeton University, in May 2007. [3] The second IEEE International Symposium on Networks-on-Chip was held in April 2008 at Newcastle University.

Research has been conducted on integrated optical waveguides and devices comprising an optical network on a chip (ONoC). [4] [5]

Side benefits of NoC: Cache miss pattern prediction and data forwarding leveraging augmented switches

In a multi-core system, connected by NoC, coherency messages and cache miss requests have to pass switches. Accordingly, switches can be augmented with simple tracking and forwarding elements to detect which cache blocks will be requested in the future by which cores. Then, the forwarding elements multicast any requested block to all the cores that may request the block in the future. This mechanism reduces cache miss rate . [6]

Benchmarks

NoC development and studies require comparing different proposals and options. NoC traffic patterns are under development to help such evaluations. Existing NoC benchmarks include NoCBench and MCSL NoC Traffic Patterns. [7]

Interconnect processing unit

An interconnect processing unit (IPU) [8] is a on-chip communication network with hardware and software components which jointly implement key functions of different system-on-chip programming models through a set of communication and synchronization primitives and provide low-level platform services to enable advanced features[ which? ] in modern heterogeneous applications[ definition needed ] on a single die.

Commercial providers

See also

Notes

  1. This article uses the convention that "NoC" is pronounced /nɒk/ nock. Therefore, it uses the convention "a" for the indefinite article corresponding to NoC ("a NoC"). Other sources may pronounce it as /ˌɛnˌˈs/ en-oh-SEE and therefore use "an NoC".

Related Research Articles

Central processing unit Electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions

A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions. The first CPU was created in 1971 by Intel(4004) datawidth 4-bit wide.The computer industry has used the term "central processing unit" at least since the early 1960s. Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry.

Transputer

The transputer is a series of pioneering microprocessors from the 1980s, featuring integrated memory and serial communication links, intended for parallel computing. They were designed and produced by Inmos, a semiconductor company based in Bristol, United Kingdom.

System on a chip type of integrated circuit

A system on a chip or system on chip is an integrated circuit that integrates all components of a computer or other electronic system. These components typically include a central processing unit (CPU), memory, input/output ports and secondary storage – all on a single substrate or microchip, the size of a coin. It may contain digital, analog, mixed-signal, and often radio frequency signal processing functions, depending on the application. As they are integrated on a single substrate, SoCs consume much less power and take up much less area than multi-chip designs with equivalent functionality. Because of this, SoCs are very common in the mobile computing and edge computing markets. Systems on chip are commonly used in embedded systems and the Internet of Things.

MIMD class of parallel computer architecture in Flynns taxonomy, in which multiple operations are performed on multiple data points simultaneously

In computing, MIMD is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

In electronics and especially synchronous digital circuits, a clock signal is a particular type of signal that oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits.

Front-side bus computer communication interface (bus) often used in Intel-chip-based computers during the 1990s and 2000s; replaced by replaced by HyperTransport, Intel QuickPath Interconnect or Direct Media Interface in modern CPUs

A front-side bus (FSB) is a computer communication interface (bus) that was often used in Intel-chip-based computers during the 1990s and 2000s. The competing EV6 bus served the same function for AMD CPUs. Both typically carry data between the central processing unit (CPU) and a memory controller hub, known as the northbridge.

Robert Drost American computer scientist

Robert Drost is an American computer scientist. He was born in 1970 in New York City.

Rent's rule pertains to the organization of computing logic, specifically the relationship between the number of external signal connections to a logic block with the number of logic gates in the logic block, and has been applied to circuits ranging from small digital circuits to mainframe computers.

Multi-core processor microprocessor with more than one core

A multi-core processor is a computer processor integrated circuit with two or more separate processing units, called cores, which each read and execute program instructions, as if the computer had several processors. The instructions are ordinary CPU instructions but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core. A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared-memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores; heterogeneous multi-core systems have cores that are not identical. Just as with single-processor systems, cores in multi-core systems may implement architectures such as VLIW, superscalar, vector, or multithreading.

The primary focus of this article is asynchronous control in digital electronic systems. In a synchronous system, operations are coordinated by one, or more, centralized clock signals. An asynchronous digital system, in contrast, has no global clock. Asynchronous systems do not depend on strict arrival times of signals or messages for reliable operation. Coordination is achieved via events such as: packet arrival, changes (transitions) of signals, handshake protocols, and other methods.

The interconnect bottleneck comprises limits on integrated circuit (IC) performance due to connections between components instead of their internal speed. In 2006 it was predicted to be a "looming crisis" by 2010.

A multiprocessor system-on-chip is a system-on-a-chip (SoC) which includes multiple microprocessors. As such, it is a multi-core System-on-Chip.

In microelectronics, a three-dimensional integrated circuit is an integrated circuit manufactured by stacking silicon wafers or dies and interconnecting them vertically using, for instance, through-silicon vias (TSVs) or Cu-Cu connections, so that they behave as a single device to achieve performance improvements at reduced power and smaller footprint than conventional two dimensional processes. 3D IC is just one of a host of 3D integration schemes that exploit the z-direction to achieve electrical performance benefits.

A massively parallel processor array, also known as a multi purpose processor array (MPPA) is a type of integrated circuit which has a massively parallel array of hundreds or thousands of CPUs and RAM memories. These processors pass work to one another through a reconfigurable interconnect of channels. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more demanding tasks than conventional chips. MPPAs are based on a software parallel programming model for developing high-performance embedded system applications.

In communication networks, cognitive network (CN) is a new type of data network that makes use of cutting edge technology from several research areas to solve some problems current networks are faced with. Cognitive network is different from cognitive radio (CR) as it covers all the layers of the OSI model.

This is a glossary of terms relating to computer hardware – physical computer hardware, architectural issues, and peripherals.

Torus interconnect

A torus interconnect is a switch-less network topology for connecting processing nodes in a parallel computer system.

Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

References

  1. Kundu, Santanu; Chattopadhyay, Santanu (2014). Network-on-chip: the Next Generation of System-on-Chip Integration (1st ed.). Boca Raton, FL: CRC Press. p. 3. ISBN   9781466565272. OCLC   895661009.
  2. "Balancing On-Chip Network Latency in Multi-Application Mapping for Chip-Multiprocessors". IPDPS. May 2014.
  3. NoCS 2007 website.
  4. On-Chip Networks Bibliography
  5. Inter/Intra-Chip Optical Network Bibliography-
  6. Marzieh Lenjani, Mahmoud Reza Hashemi (2014). "Tree-based scheme for reducing shared cache miss rate leveraging regional, statistical and temporal similarities". Iet Computers & Digital Techniques. 8: 30–48. doi:10.1049/iet-cdt.2011.0066.CS1 maint: Uses authors parameter (link)
  7. "NoC traffic". www.ece.ust.hk. Retrieved 2018-10-08.
  8. Marcello Coppola, Miltos D. Grammatikakis, Riccardo Locatelli, Giuseppe Maruccia, Lorenzo Pieralisi, "Design of Cost-Efficient Interconnect Processing Units: Spidergon STNoC", CRC Press, 2008, ISBN   978-1-4200-4471-3

Adapted from Avinoam Kolodny's's column in the ACM SIGDA e-newsletter by Igor Markov
The original text can be found at http://www.sigda.org/newsletter/2006/060415.txt

Further reading