Bus encoding

Last updated April 17, 2024

Bus encoding refers to converting a piece of data to another form before launching it on the bus. Bus encoding can serve various purposes, such as reducing the number of pins, compressing the data to be transmitted, or reducing crosstalk between bit lines, and it is one of the popular techniques used in system design to reduce the dynamic power consumed by the system bus.^[1]^[2] For low power, bus encoding aims to reduce the Hamming distance between two consecutive values on the bus. Since the switching activity is directly proportional to the Hamming distance, bus encoding is effective in reducing the overall activity factor, thereby reducing the dynamic power consumption of the system.
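As a minimal sketch of the quantity being minimized, the Hamming distance between two bus words can be computed by XOR-ing them and counting the set bits. The function names below are illustrative and do not come from the cited sources.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def total_transitions(words):
    """Sum of Hamming distances between consecutive words on the bus."""
    return sum(hamming_distance(x, y) for x, y in zip(words, words[1:]))


if __name__ == "__main__":
    # 0111 -> 1000 toggles all four lines.
    print(hamming_distance(0b0111, 0b1000))    # 4
    # A plain binary count 0,1,2,3,4 toggles 1+2+1+3 lines in total.
    print(total_transitions([0, 1, 2, 3, 4]))  # 7
```

The total returned by `total_transitions` is the number of line toggles an encoding scheme tries to reduce for a given sequence of values.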

Motivation

Power consumption in electronic systems is a matter of concern today for the following reasons:

Battery-operated devices: Battery-operated devices are ubiquitous, and the time between two successive charges of the battery should be as long as possible, so the system must consume as little power (and energy) as possible.
Environmental constraints: To protect the environment, usable energy needs to be conserved. Since the energy consumed by electronic systems is increasing drastically, minimizing their energy consumption is critical.
Power dissipation: In line with Moore's law, semiconductor devices have been packing more and more transistors into a smaller area. This leads to higher power dissipation per unit area and makes packaging and cooling system design complex and costly. Hence, low-power electronic systems are needed to tackle this issue.

The dynamic power dissipated by an electronic circuit is directly proportional to the activity factor and to the load capacitance seen by the output of the logic gate. In the case of a bus, the load capacitance is usually high, since the bus must connect to multiple modules and is routed over longer distances, and the activity factor is also high. Because of the high load capacitance and activity factor, bus power consumption in a typical system can contribute up to 50% of the total power consumption. Bus encoding aims to reduce this power by reducing the amount of activity (number of toggles) on the bus lines. The kind of bus encoding best suited to a particular system can be determined when the target application and the environmental constraints of the system are known a priori; the techniques described below can help reduce bus power for most systems.
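The proportionality described above is commonly written using the first-order CMOS switching-power model. In the expression below, $\alpha$ is the activity factor and $C_L$ the load capacitance; the supply voltage $V_{DD}$ and the clock frequency $f$ are symbols introduced here for illustration and are not named in the text above. Some texts include an additional factor of 1/2, depending on how $\alpha$ is defined.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

With $V_{DD}$ and $f$ fixed, halving the number of toggles on a bus line halves $\alpha$ and therefore halves the dynamic power of that line under this model.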

Hence bus encoding is important for any electronic system design.^[citation needed]

Examples of bus encoding to achieve low power

The following are some ways of using bus encoding to reduce dynamic power consumption in different scenarios:

Gray code addressing:^[3] In most computing systems, the values on the address lines of a bus increase consecutively because of spatial locality. With regular binary coding, the Hamming distance between two consecutive addresses is not guaranteed to be minimal. Encoding the address lines with a Gray code gives a Hamming distance of 1 between any two consecutive address bus values (as long as spatial locality holds). A variation of this scheme, named Shifted Gray encoding, reduces the delay overhead.^[4]
Sequential addressing or T0 codes:^[5] On an address bus, because of the spatial locality that exists in programs, most transitions change the address to the next consecutive value. A possible encoding scheme adds one line, INC, to the bus to indicate whether the current address is the increment of the previous one. If the address is not consecutive, the transmitter drives it on the bus with INC at 0 and the receiver uses the value on the bus. If it is consecutive, the transmitter leaves the value on the bus unchanged and asserts INC, and the receiver computes the address by incrementing the previous one. For a continuous run of sequential addresses, there is then no transition at all on the bus, leading to a bus activity factor of 0.
Number representation: Consider a system that receives some of its data from a sensor. Much of the time the sensor may be measuring only noise; for this example, assume the measured values alternate between 0 and -1. On a 32-bit data bus in 2's complement representation, 0 is 0x00000000 (0000 0000 0000 0000 0000 0000 0000 0000) while -1 is 0xFFFFFFFF (1111 1111 1111 1111 1111 1111 1111 1111). The Hamming distance in this case is 32, since all 32 bits change state. If the bus instead uses a sign-magnitude representation (the MSB is the sign bit), 0 is 0x00000000 (0000 0000 0000 0000 0000 0000 0000 0000) and -1 is 0x80000001 (1000 0000 0000 0000 0000 0000 0000 0001). The Hamming distance between the two values is then just 2. Hence, by converting from 2's complement to sign-magnitude encoding, the number of toggles per transfer is reduced from 32 to 2.
Inversion encoding:^[6]^[7] In this form of bus encoding, an additional line named INV is added to the bus lines. Depending on the value of the INV line, the other lines are used with or without inversion. For example, if INV is 0, the data on the bus is sampled as it is, but if INV is 1, the data on the bus is inverted before any processing. Referring to the number representation example above, instead of using a sign-magnitude representation, the system could keep using 2's complement and achieve a similar activity reduction with inversion encoding. Here, 0 is represented as 0x00000000 with INV=0 and -1 is represented as 0x00000000 with INV=1. Since INV=1, the receiver inverts the data before consuming it, thereby converting it to 0xFFFFFFFF internally. In this case only 1 bit (the INV bit) changes on the bus, so only one line toggles per transfer. In general, the encoder computes the Hamming distance between the value currently on the bus and the next value; if it exceeds half the bus width, the next value is sent inverted with INV=1, otherwise it is sent unchanged with INV=0.
Value cache encoding:^[8] This is another form of bus encoding, primarily used for external (off-chip) buses. A dictionary (value cache) of commonly occurring data patterns is maintained at both the sender and the receiver. Instead of passing a cached data pattern each time, the sender just toggles one bit indicating which entry of the value cache the receiver should use. The complete data is sent over the bus only for values that are not present in the value cache. Various modified implementations of this technique have been proposed with the intent of maximizing the hits in the value cache, but the underlying idea is the same.^[9]^[10]
Other techniques, such as sector-based encoding^[11] and variations of inversion coding, have also been proposed. There has also been work on bus encodings that lower leakage power consumption and reduce crosstalk with minimal impact on path delays.^[12]^[13]
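The Gray code, T0, number representation and inversion schemes above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the implementation from the cited papers: the bus is assumed to be 32 bits wide, the INC and INV lines are modeled as a separate bit in a `(bus, flag)` pair, the T0 stride is assumed to be 1, and all function names are hypothetical.

```python
WIDTH = 32                      # assumed bus width
MASK = (1 << WIDTH) - 1


def hamming(a, b):
    """Number of bus lines that differ between words a and b."""
    return bin((a ^ b) & MASK).count("1")


def transitions(words):
    """Total toggles between consecutive words in a sequence."""
    return sum(hamming(x, y) for x, y in zip(words, words[1:]))


def gray_encode(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Inverse of gray_encode: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def t0_encode(addresses, stride=1):
    """Return (bus, INC) pairs; the bus is frozen while addresses are sequential."""
    out, prev, bus = [], None, 0
    for a in addresses:
        if prev is not None and a == (prev + stride) & MASK:
            out.append((bus, 1))        # bus lines unchanged, INC asserted
        else:
            bus = a
            out.append((bus, 0))        # new address driven on the bus
        prev = a
    return out


def t0_decode(pairs, stride=1):
    """Rebuild addresses from (bus, INC) pairs produced by t0_encode."""
    out, prev = [], None
    for bus, inc in pairs:
        prev = (prev + stride) & MASK if inc else bus
        out.append(prev)
    return out


def to_sign_magnitude(x):
    """Sign-magnitude word for x, with the MSB as the sign bit."""
    return (1 << (WIDTH - 1)) | -x if x < 0 else x


def bus_invert_encode(words):
    """Return (bus, INV) pairs; invert when more than half the lines would toggle."""
    out, prev_bus = [], 0               # bus assumed to start at all zeros
    for w in words:
        if hamming(prev_bus, w) > WIDTH // 2:
            bus, inv = ~w & MASK, 1
        else:
            bus, inv = w, 0
        out.append((bus, inv))
        prev_bus = bus
    return out


def bus_invert_decode(pairs):
    """Undo bus_invert_encode."""
    return [~bus & MASK if inv else bus for bus, inv in pairs]


if __name__ == "__main__":
    addrs = list(range(0x1000, 0x1008))
    print(transitions(addrs))                                  # 11
    print(transitions([gray_encode(a) for a in addrs]))        # 7
    print(transitions([bus for bus, _ in t0_encode(addrs)]))   # 0
    print(hamming(to_sign_magnitude(0), to_sign_magnitude(-1)))  # 2
    print(bus_invert_encode([0, MASK, 0, MASK]))
```

For the eight sequential addresses in the example, plain binary toggles 11 lines in total, Gray coding toggles one line per step, and T0 leaves the address lines untouched after the first transfer (only INC changes once). For the alternating 0 and -1 pattern, the inversion encoder keeps the data lines at zero and toggles only INV.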

Other examples of bus encoding

Many other types of bus encoding have been developed for a variety of reasons:

Improved EMC: Differential signaling, used in many buses, and the more general constant-weight code used in the MIPI C-PHY Camera Serial Interface^[14] are both more immune to outside interference and emit less interference to other devices.
Bus multiplexing: Many early microprocessors and many early DRAM chips reduced costs by using bus multiplexing rather than dedicating a pin to every address bit and data bit of the system bus. One approach reuses the address bus pins at different times as data bus pins,^[15] an approach used by conventional PCI. Another approach reuses the same pins at different times for the upper half and the lower half of the address bus, an approach used by many dynamic random-access memory chips, which adds 2 pins to the control bus: a row-address strobe (RAS) and a column-address strobe (CAS).

Implementation method

In SoC designs, bus encoding schemes are best implemented in RTL by instantiating dedicated encoders and decoders on the bus. Alternatively, a hint can be passed to the synthesis tool, either as a trace of the simulation^[16] or through a synthesis pragma that defines the type of encoding needed.

At board level, a small low-power IC can be placed between the master and slave modules on the bus to implement the encoding and decoding functions.

Properties of the encoding function

The bus encoding/decoding function must be a bijection. This requires the encoding function to have the following properties:^[3]

Every data value to be launched on the bus must have a unique encoded value, and every encoded value must decode uniquely to the same original value.
It must be possible to encode and decode all the values that the source can generate.
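For a stateless code on a narrow bus, the two properties above can be checked exhaustively. The sketch below is an illustration only: `is_bijection` is a hypothetical helper, the 8-bit width is an assumption, and schemes that carry state or extra lines (such as INC or INV) would need the check applied to the full transmitted word and encoder state instead.

```python
def is_bijection(encode, decode, width):
    """Exhaustively check that encode/decode form a bijection on width-bit words."""
    n = 1 << width
    codes = [encode(v) for v in range(n)]
    unique = len(set(codes)) == n                 # no two values share a code
    in_range = all(0 <= c < n for c in codes)     # every code fits on the bus
    round_trip = all(decode(c) == v for v, c in enumerate(codes))
    return unique and in_range and round_trip


def gray_encode(n):
    return n ^ (n >> 1)


def gray_decode(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def drop_lsb(v):
    """A deliberately broken 'encoder' that loses the least significant bit."""
    return v & ~1


if __name__ == "__main__":
    print(is_bijection(gray_encode, gray_decode, 8))     # True
    print(is_bijection(drop_lsb, lambda c: c, 8))        # False
```

The second call fails because two different source values map to the same bus word, so the receiver cannot recover the original value.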

Trade-off / analysis

While bus encoding reduces the activity factor on the bus and so reduces dynamic power, the encoders and decoders around the bus are additional circuitry that also consumes a certain amount of dynamic power. This must be factored in when computing the power savings.
The additional circuitry also increases the leakage power of the design. If the base activity factor of the system bus is not very high, bus encoding may not be a viable option, since the higher leakage power can outweigh the dynamic power saved and degrade overall energy consumption.
If the bus lies on a critical timing path, the additional circuitry in the path will degrade timing and may prove detrimental. This analysis needs to be done carefully to determine what kind of bus encoding to use.

References

1. Pedram, Massoud; Abdollahi, A., Low Power RT-Level Synthesis Techniques: A Tutorial (PDF)
2. Devadas; Malik (1995), "A Survey of Optimization Techniques targeting Low Power VLSI Circuits", DAC 32: 242–247
3. Cheng, Wei-Chung; Pedram, Massoud, Memory Bus Encoding for Low Power: A Tutorial (PDF)
4. Guo, Hui; Parameswaran, Sri (April–June 2010). "Shifted Gray encoding to reduce instruction memory address bus switching for low-power embedded systems". Journal of Systems Architecture. 56 (4–6): 180–190. doi:10.1016/j.sysarc.2010.03.003.
5. Benini, Luca; De Micheli, Giovanni; Macii, Enrico; Sciuto, D.; Silvano, C. (March 1997). "Asymptotic Zero-Transition Activity Encoding for Address Buses in Low-Power Microprocessor-Based Systems". Proceedings Seventh Great Lakes Symposium on VLSI: 77–82.
6. Stan, Mircea R.; Burleson, Wayne P. (March 1995). "Bus-Invert Coding for Low-Power I/O". IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 3 (1): 49–58. CiteSeerX 10.1.1.89.2154. doi:10.1109/92.365453.
7. http://www.eng.auburn.edu/~agrawvd/COURSE/E6270_Fall07/PROJECT/JIANG/Low%20power%2032-bit%20bus%20with%20inversion%20encoding.ppt
8. Yang, J.; et al. (August 2001). "FV encoding for low power data I/O". ISLPED 2001: 84–87.
9. Basu; et al. (2002). "Power protocol: reducing power dissipation on off-chip data buses". Micro.
10. Lin, C.-H.; et al. (2006). "Hierarchical Value Cache Encoding for Off-Chip Data Bus". ISLPED.
11. Aghaghiri, Yazdan; Fallah, Farzan; Pedram, Massoud. "Transition Reduction in Memory Buses Using Sector-based Encoding Techniques" (PDF).
12. Deogun, H.; Rao, R.; Sylvester, D.; Blaauw, D. (June 2004). "Leakage- and crosstalk-aware bus encoding for total power reduction". Proceedings of the 41st Design Automation Conference: 779–782.
13. Khan, Z.; Arslan, T.; Erdogan, A. (January 2005). "A novel bus encoding scheme from energy and crosstalk efficiency perspective for AMBA based generic SoC systems". Proceedings of the 18th International Conference on VLSI Design: 751–756.
14. "Demystifying MIPI C-PHY / DPHY Subsystem - Tradeoffs, Challenges, and Adoption" (mirror)
15. Don Lancaster. "TV Typewriter Cookbook". (TV Typewriter). Section "Bus Organization". p. 82.
16. Benini, Luca; De Micheli, Giovanni; Macii, Enrico; Poncino, Massimo; Quer, Stefano (December 1998). "Power Optimization of Core-Based Systems by Address Bus Encoding" (PDF). IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 6 (4).