Register-transfer level

Last updated

In digital circuit design, register-transfer level (RTL) is a design abstraction which models a synchronous digital circuit in terms of the flow of digital signals (data) between hardware registers, and the logical operations performed on those signals.

A synchronous circuit is a digital circuit in which the changes in the state of memory elements are synchronized by a clock signal. In a sequential digital logic circuit, data is stored in memory devices called flip-flops or latches. The output of a flip-flop is constant until a pulse is applied to its "clock" input, upon which the input of the flip-flop is latched into its output. In a synchronous logic circuit, an electronic oscillator called the clock generates a string of pulses, the "clock signal". This clock signal is applied to every storage element, so in an ideal synchronous circuit, every change in the logical levels of its storage components is simultaneous. Ideally, the input to each storage element has reached its final value before the next clock occurs, so the behaviour of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run.

Data is a set of values of subjects with respect to qualitative or quantitative variables.

In digital electronics, especially computing, hardware registers are circuits typically composed of flip flops, often with many characteristics similar to memory, such as:

Contents

Register-transfer-level abstraction is used in hardware description languages (HDLs) like Verilog and VHDL to create high-level representations of a circuit, from which lower-level representations and ultimately actual wiring can be derived. Design at the RTL level is typical practice in modern digital design. [1]

In computer engineering, a hardware description language (HDL) is a specialized computer language used to describe the structure and behavior of electronic circuits, and most commonly, digital logic circuits.

Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction. It is also used in the verification of analog circuits and mixed-signal circuits, as well as in the design of genetic circuits. In 2009, the Verilog standard was merged into the SystemVerilog standard, creating IEEE Standard 1800-2009. Since then, Verilog is officially part of the SystemVerilog language. The current version is IEEE standard 1800-2017.

VHDL is a hardware description language used in electronic design automation to describe digital and mixed-signal systems such as field-programmable gate arrays and integrated circuits. VHDL can also be used as a general purpose parallel programming language.

RTL description

A synchronous circuit consists of two kinds of elements: registers (Sequential logic) and combinational logic. Registers (usually implemented as D flip-flops) synchronize the circuit's operation to the edges of the clock signal, and are the only elements in the circuit that have memory properties. Combinational logic performs all the logical functions in the circuit and it typically consists of logic gates.

In digital circuit theory, combinational logic is a type of digital logic which is implemented by Boolean circuits, where the output is a pure function of the present input only. This is in contrast to sequential logic, in which the output depends not only on the present input but also on the history of the input. In other words, sequential logic has memory while combinational logic does not.

In electronics, a logic gate is an idealized or physical device implementing a Boolean function; that is, it performs a logical operation on one or more binary inputs and produces a single binary output. Depending on the context, the term may refer to an ideal logic gate, one that has for instance zero rise time and unlimited fan-out, or it may refer to a non-ideal physical device.

For example, a very simple synchronous circuit is shown in the figure. The inverter is connected from the output, Q, of a register to the register's input, D, to create a circuit that changes its state on each rising edge of the clock, clk. In this circuit, the combinational logic consists of the inverter.

In digital logic, an inverter or NOT gate is a logic gate which implements logical negation. The truth table is shown on the right.

When designing digital integrated circuits with a hardware description language, the designs are usually engineered at a higher level of abstraction than transistor level (logic families) or logic gate level. In HDLs the designer declares the registers (which roughly correspond to variables in computer programming languages), and describes the combinational logic by using constructs that are familiar from programming languages such as if-then-else and arithmetic operations. This level is called register-transfer level. The term refers to the fact that RTL focuses on describing the flow of signals between registers.

In computer engineering, a logic family may refer to one of two related concepts. A logic family of monolithic digital integrated circuit devices is a group of electronic logic gates constructed using one of several different designs, usually with compatible logic levels and power supply characteristics within a family. Many logic families were produced as individual components, each containing one or a few related basic logical functions, which could be used as "building-blocks" to create systems or as so-called "glue" to interconnect more complex integrated circuits. A "logic family" may also refer to a set of techniques used to implement logic within VLSI integrated circuits such as central processors, memories, or other complex functions. Some such logic families use static techniques to minimize design complexity. Other such logic families, such as domino logic, use clocked dynamic techniques to minimize size, power consumption and delay.

As an example, the circuit mentioned above can be described in VHDL as follows:

D<=notQ;process(clk)beginifrising_edge(clk)thenQ<=D;endif;endprocess;

Using an EDA tool for synthesis, this description can usually be directly translated to an equivalent hardware implementation file for an ASIC or an FPGA. The synthesis tool also performs logic optimization.

Electronic design automation (EDA), also referred to as electronic computer-aided design (ECAD), is a category of software tools for designing electronic systems such as integrated circuits and printed circuit boards. The tools work together in a design flow that chip designers use to design and analyze entire semiconductor chips. Since a modern semiconductor chip can have billions of components, EDA tools are essential for their design.

An application-specific integrated circuit is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed to run in a digital voice recorder or a high-efficiency bitcoin miner is an ASIC. Application-specific standard products (ASSPs) are intermediate between ASICs and industry standard integrated circuits like the 7400 series or the 4000 series.

In electronics, logic synthesis is a process by which an abstract specification of desired circuit behavior, typically at register transfer level (RTL), is turned into a design implementation in terms of logic gates, typically by a computer program called a synthesis tool. Common examples of this process include synthesis of designs specified in hardware description languages, including VHDL and Verilog. Some synthesis tools generate bitstreams for programmable logic devices such as PALs or FPGAs, while others target the creation of ASICs. Logic synthesis is one aspect of electronic design automation.

At the register-transfer level, some types of circuits can be recognized. If there is a cyclic path of logic from a register's output to its input (or from a set of registers outputs to its inputs), the circuit is called a state machine or can be said to be sequential logic. If there are logic paths from a register to another without a cycle, it is called a pipeline.

RTL in the circuit design cycle

RTL is used in the logic design phase of the integrated circuit design cycle.

An RTL description is usually converted to a gate-level description of the circuit by a logic synthesis tool. The synthesis results are then used by placement and routing tools to create a physical layout.

Logic simulation tools may use a design's RTL description to verify its correctness.

Power estimation techniques for RTL

The most accurate power analysis tools are available for the circuit level but unfortunately, even with switch- rather than device-level modelling, tools at the circuit level have disadvantages like they are either too slow or require too much memory thus inhibiting large chip handling. The majority of these are simulators like SPICE and have been used by the designers for many years as performance analysis tools. Due to these disadvantages, gate-level power estimation tools have begun to gain some acceptance where faster, probabilistic techniques have begun to gain a foothold. But it also has its trade off as speedup is achieved on the cost of accuracy, especially in the presence of correlated signals. Over the years it has been realized that biggest wins in low power design cannot come from circuit- and gate-level optimizations whereas architecture, system, and algorithm optimizations tend to have the largest impact on power consumption. Therefore, there has been a shift in the incline of the tool developers towards high-level analysis and optimization tools for power.

Motivation

It is well known that more significant power reductions are possible if optimizations are made on levels of abstraction, like the architectural and algorithmic level, which are higher than the circuit or gate level [2] This provides the required motivation for the developers to focus on the development of new architectural level power analysis tools. This in no way implies that lower level tools are unimportant. Instead, each layer of tools provides a foundation upon which the next level can be built. The abstractions of the estimation techniques at a lower level can be used on a higher level with slight modifications.

Advantages of doing power estimation at RTL or architectural level

• Designers use a Register-Transfer Level(RTL) description of the design to make optimizations and trade-offs very early in the design flow.
• The presence of functional blocks in an RTL description makes the complexity of architectural design much more manageable even for large chips because RTL has granularity sufficiently larger than gate- or circuit-level descriptions.

Gate Equivalents [3]

It is a technique based on the concept of gate equivalents. The complexity of a chip architecture can be described approximately in terms of gate equivalents where gate equivalent count specifies the average number of reference gates that are required to implement the particular function. The total power required for the particular function is estimated by multiplying the approximated number of gate equivalents with the average power consumed per gate. The reference gate can be any gate e.g. 2-input NAND gate.

Examples of Gate Equivalent technique

• Class-Independent Power Modeling: It is a technique which tries to estimate chip area, speed, and power dissipation based on information about the complexity of the design in terms of gate equivalents. The functionality is divided among different blocks but no distinction is made about the functionality of the blocks i.e. it is basically class independent. This is the technique used by the Chip Estimation System (CES).
Steps:
1. Identify the functional blocks such as counters, decoders, multipliers, memories, etc.
2. Assign a complexity in terms of Gate Equivalents. The number of GE’s for each unit type are either taken directly as an input from the user or are fed from a library.
${\displaystyle \displaystyle P=\sum _{i\in {\text{fns}}}{\textit {GE}}_{i}(E_{\text{typ}}+C_{L}^{i}V_{\text{dd}}^{2})fA_{\text{int}}^{i}}$
Where Etyp is the assumed average dissipated energy by a gate equivalent,when active. The activity factor, Aint, denotes the average percentage of gates switching per clock cycle and is allowed to vary from function to function. The capacitive load, CL, is a combination of fan-out loading as well as wiring. An estimate of the average wire length can be used to calculate the wiring capacitance. This is provided by the user and cross-checked by using a derivative of Rent’s Rule.
Assumptions:
1. A single reference gate is taken as the basis for all the power estimates not taking into consideration different circuit styles, clocking strategies, or layout techniques.
2. The percentage of gates switching per clock cycle denoted by Activity factors are assumed to be fixed regardless of the input patterns.
3. Typical gate switching energy is characterized by completely random uniform white noise (UWN) distribution of the input data. This implies that the power estimation is same regardless of the circuit being idle or at maximum load as this UWN model ignores how different input distributions affect the power consumption of gates and modules. [4]
• Class-Dependent Power Modeling: This approach is slightly better than the previous approach as it takes into account customized estimation techniques to the different types of functional blocks thus trying to increase the modelling accuracy which wasn’t the case in the previous technique such as logic, memory, interconnect, and clock hence the name. The power estimation is done in a very similar manner to the independent case. The basic switching energy is based on a three-input AND gate and is calculated from technology parameters e.g. gate width, tox, and metal width provided by the user.
${\displaystyle P_{\text{bitlines}}={\dfrac {N_{\text{col}}}{2}}\cdot (L_{\text{col}}C_{\text{wire}}+N_{\text{row}}C_{\text{cell}})V_{\text{dd}}V_{\text{swing}}}$
Where Cwire denotes the bit line wiring capacitance per unit length and Ccell denotes the loading due to a single cell hanging off the bit line. The clock capacitance is based on the assumption of an H-tree distribution network. Activity is modelled using a UWN model. As can be seen by the equation the power consumption of each components is related to the number of columns (Ncol) and rows (Nrow) in the memory array.
1. The circuit activities are not modeled accurately as an overall activity factor is assumed for the entire chip which is also not trustable as provided by the user. As a matter of fact activity factors will vary throughout the chip hence this is not very accurate and prone to error. This leads to the problem that even if the model gives a correct estimate for the total power consumption by the chip, the module wise power distribution is fairly inaccurate.
2. The chosen activity factor gives the correct total power, but the breakdown of power into logic, clock, memory, etc. is less accurate. Therefore this tool is not much different or improved in comparison with CES.

Precharacterized Cell Libraries

This technique further customizes the power estimation of various functional blocks by having separate power model for logic, memory, and interconnect suggesting a Power Factor Approximation (PFA) method for individually characterizing an entire library of functional blocks such as multipliers, adders, etc. instead of a single gate-equivalent model for “logic” blocks.
The power over the entire chip is approximated by the expression:

${\displaystyle \displaystyle P=\sum _{i\in {\text{all blocks}}}K_{i}G_{i}f_{i}}$

Where Ki is PFA proportionality constant that characterizes the ith functional element,Gi is the measure of hardware complexity, and fi denotes the activation frequency.

Example

Gi denoting the hardware complexity of the multiplier is related to the square of the input word length i.e. N2 where N is the word length. The activation frequency is the rate at which multiplies are performed by the algorithm denoted by fmult and the PFA constant, Kmult, is extracted empirically from past multiplier designs and shown to be about 15 fW/bit2-Hz for a 1.2 µm technology at 5V. The resulting power model for the multiplier on the basis of the above assumptions is:

${\displaystyle \displaystyle P_{\text{mult}}=K_{\text{mult}}N^{2}f_{\text{mult}}}$

• Customization is possible in terms of whatever complexity parameters which are appropriate for that block. E.g. for a multiplier the square of the word length was appropriate. For memory, the storage capacity in bits is used and for the I/O drivers the word length alone is adequate.

Weakness:

• There is the implicit assumption that the inputs do not affect the multiplier activity which is contradictory to the fact that the PFA constant Kmult is intended to capture the intrinsic internal activity associated with the multiply operation as it is taken to be a constant.

The estimation error (relative to switch-level simulation) for a 16x16 multiplier is experimented and it is observed that when the dynamic range of the inputs doesn’t fully occupy the word length of the multiplier, the UWN model becomes extremely inaccurate. [5] Granted, good designers attempt to maximize word length utilization. Still, errors in the range of 50-100% are not uncommon. The figure clearly suggests a flaw in the UWN model.

Related Research Articles

Processor design is the design engineering task of creating a processor, a key component of computer hardware. It is a subfield of computer engineering and electronics engineering (fabrication). The design process involves choosing an instruction set and a certain execution paradigm and results in a microarchitecture, which might be described in e.g. VHDL or Verilog. For microprocessor design, this description is then manufactured employing some of the various semiconductor device fabrication processes, resulting in a die which is bonded onto a chip carrier. This chip carrier is then soldered onto, or inserted into a socket on, a printed circuit board (PCB).

Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining millions of transistors or devices into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. Before the introduction of VLSI technology, most ICs had a limited set of functions they could perform. An electronic circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets IC designers add all of these into one chip.

Digital electronics, digital technology or digital (electronic) circuits are electronics that operate on digital signals. In contrast, analog circuits manipulate analog signals whose performance is more subject to manufacturing tolerance, signal attenuation and noise. Digital techniques are helpful because it is a lot easier to get an electronic device to switch into one of a number of known states than to accurately reproduce a continuous range of values.

A system on a chip or system on chip is an integrated circuit that integrates all components of a computer or other electronic system. These components typically include a central processing unit (CPU), memory, input/output ports and secondary storage – all on a single substrate or microchip, the size of a coin. It may contain digital, analog, mixed-signal, and often radio frequency signal processing functions, depending on the application. As they are integrated on a single substrate, SoCs consume much less power and take up much less area than multi-chip designs with equivalent functionality. Because of this, SoCs are very common in the mobile computing and edge computing markets. Systems on chip are commonly used in embedded systems and the Internet of Things.

In electronics and especially synchronous digital circuits, a clock signal is a particular type of signal that oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits.

Formal equivalence checking process is a part of electronic design automation (EDA), commonly used during the development of digital integrated circuits, to formally prove that two representations of a circuit design exhibit exactly the same behavior.

In semiconductor design, standard cell methodology is a method of designing application-specific integrated circuits (ASICs) with mostly digital-logic features. Standard cell methodology is an example of design abstraction, whereby a low-level very-large-scale integration (VLSI) layout is encapsulated into an abstract logic representation. Cell-based methodology — the general class to which standard cells belong — makes it possible for one designer to focus on the high-level aspect of digital design, while another designer focuses on the implementation (physical) aspect. Along with semiconductor manufacturing advances, standard cell methodology has helped designers scale ASICs from comparatively simple single-function ICs, to complex multi-million gate system-on-a-chip (SoC) devices.

Clock gating is a popular technique used in many synchronous circuits for reducing dynamic power dissipation. Clock gating saves power by adding more logic to a circuit to prune the clock tree. Pruning the clock disables portions of the circuitry so that the flip-flops in them do not have to switch states. Switching states consumes power. When not being switched, the switching power consumption goes to zero, and only leakage currents are incurred.

Integrated circuit design, or IC design, is a subset of electronics engineering, encompassing the particular logic and circuit design techniques required to design integrated circuits, or ICs. ICs consist of miniaturized electronic components built into an electrical network on a monolithic semiconductor substrate by photolithography.

Logic simulation is the use of simulation software to predict the behavior of digital circuits and hardware description languages. Simulation can be performed at varying degrees of physical abstraction, such as at the transistor level, gate level, register-transfer level (RTL), electronic system-level (ESL), or behavioral level.

Power optimization is the use of electronic design automation tools to optimize (reduce) the power consumption of a digital design, such as that of an integrated circuit, while preserving the functionality.

In VLSI semiconductor manufacturing, the process of Design Closure is a part of the development workflow by which an integrated circuit design is modified from its initial description to meet a growing list of design constraints and objectives.

Retiming is the technique of moving the structural location of latches or registers in a digital circuit to improve its performance, area, and/or power characteristics in such a way that preserves its functional behavior at its outputs. Retiming was first described by Charles E. Leiserson and James B. Saxe in 1983.

In integrated circuit design, physical design is a step in the standard design cycle which follows after the circuit design. At this step, circuit representations of the components of the design are converted into geometric representations of shapes which, when manufactured in the corresponding layers of materials, will ensure the required functioning of the components. This geometric representation is called integrated circuit layout. This step is usually split into several sub-steps, which include both design and verification and validation of the layout.

High-level synthesis (HLS), sometimes referred to as C synthesis, electronic system-level (ESL) synthesis, algorithmic synthesis, or behavioral synthesis, is an automated design process that interprets an algorithmic description of a desired behavior and creates digital hardware that implements that behavior. Synthesis begins with a high-level specification of the problem, where behavior is generally decoupled from e.g. clock-level timing. Early HLS explored a variety of input specification languages., although recent research and commercial applications generally accept synthesizable subsets of ANSI C/C++/SystemC/MATLAB. The code is analyzed, architecturally constrained, and scheduled to transcompile into a register-transfer level (RTL) design in a hardware description language (HDL), which is in turn commonly synthesized to the gate level by the use of a logic synthesis tool. The goal of HLS is to let hardware designers efficiently build and verify hardware, by giving them better control over optimization of their design architecture, and through the nature of allowing the designer to describe the design at a higher level of abstraction while the tool does the RTL implementation. Verification of the RTL is an important part of the process.

References

1. Frank Vahid (2010). Digital Design with RTL Design, Verilog and VHDL (2nd ed.). John Wiley and Sons. p. 247. ISBN   978-0-470-53108-2.