Wavefront arbiter

Last updated January 31, 2023

A Wavefront arbiter is a circuit used to make decisions which control the crossbar of a high capacity switch fabric in parallel.^[1] It was commercialized in the TT1 and TTx chip sets designed by Abrizio and sold by PMC-Sierra.

Context

A crossbar is the central portion of a crossbar switch fabric which connects the inputs to the outputs. A set of decisions of which inputs are connected to which outputs must be made each arbitration period. In high speed cell switching or packet switching applications, the arbitration period is very short. There are often millions or billions of arbitration periods per second.

An arbiter is the circuit that makes the decision as to which of the crossbar's many switches should be closed. Speed is a key design criterion of an arbiter in some applications.

Algorithm description

A wavefront arbiter is a particular type of arbiter that is optimized for high-speed operation. For a unicast switch, the algorithm is as follows:

The decision starts at a single point in the x-y matrix which represents the physical switches, for example the upper left hand corner.
Based on the requests, a decision is made whether to close that switch, connecting the corresponding input and output.
The result of this decision is then fed to the right along the matrix axis representing the input, and down along the matrix axis representing the output.
The results of the first computation then enable the next computation at the point to the right and at the point below and switch closing decision is made at each of those two points.
The results of these subsequent two calculations then are then fed to the points below and to the right of them. These results then enable the decisions at the next three points which are to the right and below.
These results are again fed to the right and below.
In the case where the calculation did not start in the upper left hand corner, the results wrap around the right back to the first left column and around the bottom to the top row.
The calculation continues until all of the decisions have been made.

Benefit of use

The benefits of this type of calculation include:

Speed – the algorithm can be implemented in a combinatorial manner (without hardware registers), allowing the wavefront to propagate across much or all of the matrix in one or a few clock periods.
Regularity – the nodes of the physical structure used to compute this are all identical. This is often called a systolic computation. Regular structures can sometimes lead to compact semiconductor implementations.

Variants

There are numerous variants of this method including:

Randomizing or shuffling the order in which the rows and columns are considered. Some sort of shuffling is generally necessary to achieve fairness.
Multicast variants of this method where one input can be connected to multiple outputs in either one or multiple passes.

Related Research Articles

Control theory is a field of mathematics that deals with the control of dynamical systems in engineered processes and machines. The objective is to develop a model or algorithm governing the application of system inputs to drive the system to a desired state, while minimizing any delay, overshoot, or steady-state error and ensuring a level of control stability; often with the aim to achieve a degree of optimality.

In electronics and telecommunications, a crossbar switch is a collection of switches arranged in a matrix configuration. A crossbar switch has multiple input and output lines that form a crossed pattern of interconnecting lines between which a connection may be established by closing a switch located at each intersection, the elements of the matrix. Originally, a crossbar switch consisted literally of crossing metal bars that provided the input and output paths. Later implementations achieved the same switching topology in solid-state electronics. The crossbar switch is one of the principal telephone exchange architectures, together with a rotary switch, memory switch, and a crossover switch.

A nonblocking minimal spanning switch is a device that can connect N inputs to N outputs in any combination. The most familiar use of switches of this type is in a telephone exchange. The term "non-blocking" means that if it is not defective, it can always make the connection. The term "minimal" means that it has the fewest possible components, and therefore the minimal expense.

In parallel computer architectures, a systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in Colossus, which was an early computer used to break German Lorenz ciphers during World War II. Due to the classified nature of Colossus, they were independently invented or rediscovered by H. T. Kung and Charles Leiserson who described arrays for many dense linear algebra computations for banded matrices. Early applications include computing greatest common divisors of integers and polynomials. They are sometimes classified as multiple-instruction single-data (MISD) architectures under Flynn's taxonomy, but this classification is questionable because a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories: SISD, SIMD, MISD, MIMD, as discussed later in this article.

The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size $in terms of N 1 smaller DFTs of sizes N 2, recursively, to reduce the computation time to O(N log N) for highly composite N (smooth numbers). Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below.$

In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a subject protein or nucleotide sequence with a library or database of sequences, and identify database sequences that resemble alphabet above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the pig genome that resemble the mouse gene based on similarity of sequence.

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In numerical linear algebra, the QR algorithm or QR iteration is an eigenvalue algorithm: that is, a procedure to calculate the eigenvalues and eigenvectors of a matrix. The QR algorithm was developed in the late 1950s by John G. F. Francis and by Vera N. Kublanovskaya, working independently. The basic idea is to perform a QR decomposition, writing the matrix as a product of an orthogonal matrix and an upper triangular matrix, multiply the factors in the reverse order, and iterate.

In machine learning, backpropagation is a widely used algorithm for training feedforward artificial neural networks. Generalizations of backpropagation exist for other artificial neural networks (ANNs), and for functions generally. These classes of algorithms are all referred to generically as "backpropagation". In fitting a neural network, backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used. The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight by the chain rule, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this is an example of dynamic programming.

A load-balanced switch is a switch architecture which guarantees 100% throughput with no central arbitration at all, at the cost of sending each packet across the crossbar twice. Load-balanced switches are a subject of research for large routers scaled past the point of practical central arbitration.

Abrizio was a fabless semiconductor company which made switching fabric chip sets. Their chip set, the TT1, was used by several large system development companies as the core switch fabric in their high value communication systems.

An Omega network is a network configuration often used in parallel computing architectures. It is an indirect topology that relies on the perfect shuffle interconnection algorithm.

In the field of telecommunications, a Clos network is a kind of multistage circuit-switching network which represents a theoretical idealization of practical, multistage switching systems. It was invented by Edson Erwin in 1938 and first formalized by Charles Clos in 1952.

Ship stability is an area of naval architecture and ship design that deals with how a ship behaves at sea, both in still water and in waves, whether intact or damaged. Stability calculations focus on centers of gravity, centers of buoyancy, the metacenters of vessels, and on how these interact.

Virtual output queueing (VOQ) is a technique used in certain network switch architectures where, rather than keeping all traffic in a single queue, separate queues are maintained for each possible output location. It addresses a common problem known as head-of-line blocking.

Multistage interconnection networks (MINs) are a class of high-speed computer networks usually composed of processing elements (PEs) on one end of the network and memory elements (MEs) on the other end, connected by switching elements (SEs). The switching elements themselves are usually connected to each other in stages, hence the name.

In physics, a wavefront is the locus of points having the same phase.

An integrating ADC is a type of analog-to-digital converter that converts an unknown input voltage into a digital representation through the use of an integrator. In its basic implementation, the dual-slope converter, the unknown input voltage is applied to the input of the integrator and allowed to ramp for a fixed time period. Then a known reference voltage of opposite polarity is applied to the integrator and is allowed to ramp until the integrator output returns to zero. The input voltage is computed as a function of the reference voltage, the constant run-up time period, and the measured run-down time period. The run-down time measurement is usually made in units of the converter's clock, so longer integration times allow for higher resolutions. Likewise, the speed of the converter can be improved by sacrificing resolution.

In computing, a logic block or configurable logic block (CLB) is a fundamental building block of field-programmable gate array (FPGA) technology. Logic blocks can be configured by the engineer to provide reconfigurable logic gates.

The Mivar-based approach is a mathematical tool for designing artificial intelligence (AI) systems. Mivar was developed by combining production and Petri nets. The Mivar-based approach was developed for semantic analysis and adequate representation of humanitarian epistemological and axiological principles in the process of developing artificial intelligence. The Mivar-based approach incorporates computer science, informatics and discrete mathematics, databases, expert systems, graph theory, matrices and inference systems. The Mivar-based approach involves two technologies:

References

↑ Gelenbe, E.; Bagchi, K.; Zobrist, G. (1999). Network Systems Design. Taylor & Francis. p. 6. ISBN 978-90-5699-635-2 . Retrieved 13 September 2018.

Stanford class notes description of the algorithm

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Gelenbe_Bagchi_Zobrist_1999_p._6-1] Gelenbe, E.; Bagchi, K.; Zobrist, G. (1999). Network Systems Design. Taylor & Francis. p. 6. ISBN 978-90-5699-635-2 . Retrieved 13 September 2018.

[1]