Retiming

Last updated

Retiming is the technique of moving the structural location of latches or registers in a digital circuit to improve its performance, area, and/or power characteristics in such a way that preserves its functional behavior at its outputs. Retiming was first described by Charles E. Leiserson and James B. Saxe in 1983. [1]

Contents

The technique uses a directed graph where the vertices represent asynchronous combinational blocks and the directed edges represent a series of registers or latches (the number of registers or latches can be zero). Each vertex has a value corresponding to the delay through the combinational circuit it represents. After doing this, one can attempt to optimize the circuit by pushing registers from output to input and vice versa - much like bubble pushing. Two operations can be used - deleting a register from each input of a vertex while adding a register to all outputs, and conversely adding a register to each input of vertex and deleting a register from all outputs. In all cases, if the rules are followed, the circuit will have the same functional behavior as it did before retiming.

Formal description

The initial formulation of the retiming problem as described by Leiserson and Saxe is as follows. Given a directed graph whose vertices represent logic gates or combinational delay elements in a circuit, assume there is a directed edge between two elements that are connected directly or through one or more registers. Let the weight of each edge be the number of registers present along edge in the initial circuit. Let be the propagation delay through vertex . The goal in retiming is to compute an integer lag value for each vertex such that the retimed weight of every edge is non-negative. There is a proof that this preserves the output functionality. [2]

Minimizing the clock period with network flow

The most common use of retiming is to minimize the clock period. A simple technique to optimize the clock period is to search for the minimum feasible period (e.g. using binary search).

The feasibility of a clock period can be checked in one of several ways. The linear program below is feasible if and only if is a feasible clock period. Let be the minimum number of registers along any path from to (if such a path exists), and is the maximum delay along any path from to with W(u,v) registers. The dual of this program is a minimum cost circulation problem, which can be solved efficiently as a network problem. The limitations of this approach arise from the enumeration and size of the and matrices.

Given and a target clock period
Find
Such that
if

Minimizing the clock period with MILP

Alternatively, feasibility of a clock period can be expressed as a mixed-integer linear program (MILP). A solution will exist and a valid lag function will be returned if and only if the period is feasible.

Given and a target clock period
Find and
Such that

Other formulations and extensions

Alternate formulations allow the minimization of the register count and the minimization of the register count under a delay constraint. The initial paper includes extensions that allow the consideration of fan-out sharing and a more general delay model. Subsequent work has addressed the inclusion of register delays, [3] load-dependent delay models, [3] and hold constraints. [4]

Problems

Retiming has found industrial use, albeit sporadic. Its primary drawback is that the state encoding of the circuit is destroyed, making debugging, testing, and verification substantially more difficult. Some retimings may also require complicated initialization logic to have the circuit start in an identical initial state. Finally, the changes in the circuit's topology have consequences in other logical and physical synthesis steps that make design closure difficult.

Alternatives

Clock skew scheduling is a related technique for optimizing sequential circuits. Whereas retiming relocates the structural position of the registers, clock skew scheduling moves their temporal position by scheduling the arrival time of the clock signals. The lower bound of the achievable minimum clock period of both techniques is the maximum mean cycle time (i.e. the total combinational delay along any path divided by the number of registers along it).

See also

Notes

  1. Charles E. Leiserson, Flavio M. Rose, JamesB. Saxe (1983). "Optimizing Synchronous Circuitry by Retiming". Third Caltech Conference on Very Large Scale Integration. Springer: 87–116. doi:10.1007/978-3-642-95432-0_7.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  2. Charles E. LeisersonJames B. Saxe (June 1991). "Retiming synchronous circuitry". Algorithmica. Springer. 6 (1): 5–35. doi:10.1007/BF01759032. S2CID   18674287.
  3. 1 2 K. N. Lalgudi, M. C. Papaefthymiou, Retiming edge-triggered circuits under general delay models, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.16, no.12, pp.1393-1408, Dec. 1997.
  4. M. C. Papaefthymiou, Asymptotically efficient retiming under setup and hold constraints, IEEE/ACM International Conference on Computer-Aided Design, 1998.

Related Research Articles

Shortest path problem Computational problem of graph theory

In graph theory, the shortest path problem is the problem of finding a path between two vertices in a graph such that the sum of the weights of its constituent edges is minimized.

Dijkstras algorithm Graph search algorithm

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

In automata theory, sequential logic is a type of logic circuit whose output depends on the present value of its input signals and on the sequence of past inputs, the input history. This is in contrast to combinational logic, whose output is a function of only the present input. That is, sequential logic has state (memory) while combinational logic does not.

Depth-first search Search algorithm

Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node and explores as far as possible along each branch before backtracking.

In electronics and especially synchronous digital circuits, a clock signal oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits.

Bellman–Ford algorithm Algorithm for finding the shortest paths in graphs

The Bellman–Ford algorithm is an algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra's algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers. The algorithm was first proposed by Alfonso Shimbel (1955), but is instead named after Richard Bellman and Lester Ford Jr., who published it in 1958 and 1956, respectively. Edward F. Moore also published a variation of the algorithm in 1959, and for this reason it is also sometimes called the Bellman–Ford–Moore algorithm.

In computer science, the Floyd–Warshall algorithm is an algorithm for finding shortest paths in a directed weighted graph with positive or negative edge weights. A single execution of the algorithm will find the lengths of shortest paths between all pairs of vertices. Although it does not return details of the paths themselves, it is possible to reconstruct the paths with simple modifications to the algorithm. Versions of the algorithm can also be used for finding the transitive closure of a relation , or widest paths between all pairs of vertices in a weighted graph.

In computer science, the Edmonds–Karp algorithm is an implementation of the Ford–Fulkerson method for computing the maximum flow in a flow network in time. The algorithm was first published by Yefim Dinitz in 1970 and independently published by Jack Edmonds and Richard Karp in 1972. Dinic's algorithm includes additional techniques that reduce the running time to .

Vertex cover Subset of a graphs vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

In electronic instrumentation and signal processing, a time-to-digital converter (TDC) is a device for recognizing events and providing a digital representation of the time they occurred. For example, a TDC might output the time of arrival for each incoming pulse. Some applications wish to measure the time interval between two events rather than some notion of an absolute time.

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.

In digital circuit design, register-transfer level (RTL) is a design abstraction which models a synchronous digital circuit in terms of the flow of digital signals (data) between hardware registers, and the logical operations performed on those signals.

Formal equivalence checking process is a part of electronic design automation (EDA), commonly used during the development of digital integrated circuits, to formally prove that two representations of a circuit design exhibit exactly the same behavior.

In digital electronics, a synchronous circuit is a digital circuit in which the changes in the state of memory elements are synchronized by a clock signal. In a sequential digital logic circuit, data are stored in memory devices called flip-flops or latches. The output of a flip-flop is constant until a pulse is applied to its "clock" input, upon which the input of the flip-flop is latched into its output. In a synchronous logic circuit, an electronic oscillator called the clock generates a string (sequence) of pulses, the "clock signal". This clock signal is applied to every storage element, so in an ideal synchronous circuit, every change in the logical levels of its storage components is simultaneous. Ideally, the input to each storage element has reached its final value before the next clock occurs, so the behaviour of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed limitations at which each synchronous system can run.

Asynchronous circuit is a sequential digital logic circuit that doesn't use a global clock circuit or signal generator to synchronize its components. instead, the components are driven by a handshaking circuit which indicates a completion of a set of instructions. Handshaking works by simple data transfer protocols. Many synchronous circuits were developed in early 1950s as part of bigger asynchronous systems. Asynchronous circuits and theory surrounding is a part of several steps in integrated circuit design, a field of digital electronics engineering.

Clock skew is a phenomenon in synchronous digital circuit systems in which the same sourced clock signal arrives at different components at different times due to gate or, in more advanced semiconductor technology, wire signal propagation delay. The instantaneous difference between the readings of any two clocks is called their skew.

Static timing analysis (STA) is a simulation method of computing the expected timing of a synchronous digital circuit without requiring a simulation of the full circuit.

In integrated circuit design, dynamic logic is a design methodology in combinatory logic circuits, particularly those implemented in MOS technology. It is distinguished from the so-called static logic by exploiting temporary storage of information in stray and gate capacitances. It was popular in the 1970s and has seen a recent resurgence in the design of high speed digital electronics, particularly computer CPUs. Dynamic logic circuits are usually faster than static counterparts, and require less surface area, but are more difficult to design. Dynamic logic has a higher toggle rate than static logic but the capacitive loads being toggled are smaller so the overall power consumption of dynamic logic may be higher or lower depending on various tradeoffs. When referring to a particular logic family, the dynamic adjective usually suffices to distinguish the design methodology, e.g. dynamic CMOS or dynamic SOI design.

Flip-flop (electronics) Electronic circuit with two stable states

In electronics, a flip-flop or latch is a circuit that has two stable states and can be used to store state information – a bistable multivibrator. The circuit can be made to change state by signals applied to one or more control inputs and will have one or two outputs. It is the basic storage element in sequential logic. Flip-flops and latches are fundamental building blocks of digital electronics systems used in computers, communications, and many other types of systems.

Unfolding is a transformation technique of duplicating the functional blocks to increase the throughput of the DSP program in such a way that preserves its functional behavior at its outputs. Unfolding was first proposed by Keshab K. Parhi and David G. Messerschmitt in 1989. Unfolding in general program is as known as Loop unrolling

References