8b/10b encoding

In telecommunications, 8b/10b is a line code that maps 8-bit words to 10-bit symbols to achieve DC balance and bounded disparity, and at the same time provide enough state changes to allow reasonable clock recovery. This means that the difference between the counts of ones and zeros in a string of at least 20 bits is no more than two, and that there are not more than five ones or zeros in a row. This helps to reduce the demand for the lower bandwidth limit of the channel necessary to transfer the signal. [1]

An 8b/10b code can be implemented in various ways with focus on different performance parameters. One implementation was designed by K. Odaka for the DAT digital audio recorder. [2] Kees Schouhamer Immink designed an 8b/10b code for the DCC audio recorder. [3] The IBM implementation was described in 1983 by Al Widmer and Peter Franaszek. [4] [5]

IBM implementation

As the scheme name suggests, eight bits of data are transmitted as a 10-bit entity called a symbol, or character. The low five bits of data are encoded into a 6-bit group (the 5b/6b portion) and the top three bits are encoded into a 4-bit group (the 3b/4b portion). These code groups are concatenated together to form the 10-bit symbol that is transmitted on the wire. The data symbols are often referred to as D.x.y where x ranges over 0–31 and y over 0–7. Standards using the 8b/10b encoding also define up to 12 special symbols (or control characters) that can be sent in place of a data symbol. They are often used to indicate start-of-frame, end-of-frame, link idle, skip and similar link-level conditions. At least one of them (i.e. a "comma" symbol) needs to be used to define the alignment of the 10-bit symbols. They are referred to as K.x.y and have different encodings from any of the D.x.y symbols.

Because 8b/10b encoding uses 10-bit symbols to encode 8-bit words, some of the 1024 (2^10) possible symbols can be excluded to guarantee a run-length limit of 5 consecutive equal bits and to ensure that the difference between the counts of zeros and ones is no more than two. Some of the 256 possible 8-bit words can be encoded in two different ways. Using these alternative encodings, the scheme is able to achieve long-term DC balance in the serial data stream. This permits the data stream to be transmitted through a channel with a high-pass characteristic, for example Ethernet's transformer-coupled unshielded twisted pair or optical receivers using automatic gain control.
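The two properties named above, bounded run length and bounded disparity, can be checked mechanically on any encoded bit string. The following is a minimal sketch; the helper names `max_run_length` and `disparity` are illustrative and not part of any standard. The sample stream is two consecutive D31.1 symbols starting from RD = −1, as listed in the D31.1 example later in this article.

```python
def max_run_length(bits: str) -> int:
    """Length of the longest run of identical bits in a string of '0'/'1'."""
    longest = run = 0
    previous = None
    for bit in bits:
        run = run + 1 if bit == previous else 1
        previous = bit
        longest = max(longest, run)
    return longest


def disparity(bits: str) -> int:
    """Count of ones minus count of zeros."""
    return bits.count("1") - bits.count("0")


# Two D31.1 symbols (abcdei fghj, spaces removed): first sent at RD = -1,
# second sent at RD = +1.
stream = "1010111001" + "0101001001"
print(max_run_length(stream), disparity(stream))
```

For this 20-bit stream the longest run is well below the limit of five and the ones and zeros balance exactly.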

Encoding tables and byte encoding

Note that in the following tables, for each input byte (represented as HGF EDCBA), A denotes the least significant bit (LSB), and H the most significant (MSB). The output gains two extra bits, i and j. The bits are sent from LSB to MSB: a, b, c, d, e, i, f, g, h, and j; i.e., the 5b/6b code followed by the 3b/4b code. This ensures the uniqueness of the special bit sequence in the comma symbols.

The cumulative imbalance between the zero and one bits transmitted so far is tracked as the running disparity (RD), and any imbalance is corrected by the choice of encoding for the following symbols.

The 5b/6b code is a paired disparity code, and so is the 3b/4b code. Each 6- or 4-bit code word has either equal numbers of zeros and ones (a disparity of zero), or comes in a pair of forms, one with two more zeros than ones (four zeros and two ones, or three zeros and one one, respectively) and one with two fewer. When a 6- or 4-bit code is used that has a non-zero disparity (count of ones minus count of zeros; i.e., −2 or +2), the choice of positive or negative disparity encodings must be the one that toggles the running disparity. In other words, the non-zero disparity codes alternate.

Running disparity

8b/10b coding is DC-free, meaning that the long-term ratio of ones and zeros transmitted is exactly 50%. To achieve this, the difference between the number of ones transmitted and the number of zeros transmitted is always limited to ±2, and at the end of each symbol, it is either +1 or −1. This difference is known as the running disparity (RD).

This scheme needs only two running-disparity states, +1 and −1. It starts at −1. [6]

For each 5b/6b and 3b/4b code with an unequal number of ones and zeros, there are two bit patterns that can be used to transmit it: one with two more "1" bits, and one with all bits inverted and thus two more zeros. Depending on the current running disparity of the signal, the encoding engine selects which of the two possible six- or four-bit sequences to send for the given data. Obviously, if the six-bit or four-bit code has equal numbers of ones and zeros, there is no choice to make, as the disparity would be unchanged, with the exceptions of sub-blocks D.07 (00111) and D.x.3 (011). In either case the disparity is still unchanged, but if RD is positive when D.07 is encountered 000111 is used, and if it is negative 111000 is used. Likewise, if RD is positive when D.x.3 is encountered 0011 is used, and if it is negative 1100 is used. This is accurately reflected in the charts below, but is worth making additional mention of as these are the only two sub-blocks with equal numbers of 1s and 0s that each have two possible encodings.

Rules for running disparity

| Previous RD | Disparity of code word | Disparity chosen | Next RD |
|---|---|---|---|
| −1 | 0 | 0 | −1 |
| −1 | ±2 | +2 | +1 |
| +1 | 0 | 0 | +1 |
| +1 | ±2 | −2 | −1 |
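The running-disparity rules above amount to a two-state machine. The sketch below is one possible way to express them; the function name `next_running_disparity` is an illustrative choice, and treating a wrongly chosen code word as an error is an assumption about how a checker might behave, not something the table prescribes.

```python
def next_running_disparity(rd: int, sub_block: str) -> int:
    """Return the RD after sending a 6-bit or 4-bit sub-block.

    rd        -- current running disparity, either -1 or +1
    sub_block -- the transmitted bits as a string of '0'/'1'
    """
    if rd not in (-1, +1):
        raise ValueError("running disparity must be -1 or +1")
    disparity = sub_block.count("1") - sub_block.count("0")
    if disparity == 0:
        return rd                      # balanced code word: RD unchanged
    if disparity == -2 * rd:
        return -rd                     # correct choice: RD toggles
    raise ValueError("code word disparity does not match running disparity")


print(next_running_disparity(-1, "100111"))  # D.00 at RD = -1
print(next_running_disparity(+1, "110001"))  # D.03, balanced
```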

5b/6b code (abcdei)

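Output bits are listed in the order a b c d e i. Codes with a single encoding are shown identically in both RD columns.

| Code | EDCBA | RD = −1 | RD = +1 | Code | EDCBA | RD = −1 | RD = +1 | Note |
|---|---|---|---|---|---|---|---|---|
| D.00 | 00000 | 100111 | 011000 | D.16 | 10000 | 011011 | 100100 | |
| D.01 | 00001 | 011101 | 100010 | D.17 | 10001 | 100011 | 100011 | |
| D.02 | 00010 | 101101 | 010010 | D.18 | 10010 | 010011 | 010011 | |
| D.03 | 00011 | 110001 | 110001 | D.19 | 10011 | 110010 | 110010 | |
| D.04 | 00100 | 110101 | 001010 | D.20 | 10100 | 001011 | 001011 | |
| D.05 | 00101 | 101001 | 101001 | D.21 | 10101 | 101010 | 101010 | |
| D.06 | 00110 | 011001 | 011001 | D.22 | 10110 | 011010 | 011010 | |
| D.07 | 00111 | 111000 | 000111 | D.23 † | 10111 | 111010 | 000101 | also used for the K.23.7 symbol |
| D.08 | 01000 | 111001 | 000110 | D.24 | 11000 | 110011 | 001100 | |
| D.09 | 01001 | 100101 | 100101 | D.25 | 11001 | 100110 | 100110 | |
| D.10 | 01010 | 010101 | 010101 | D.26 | 11010 | 010110 | 010110 | |
| D.11 | 01011 | 110100 | 110100 | D.27 † | 11011 | 110110 | 001001 | also used for the K.27.7 symbol |
| D.12 | 01100 | 001101 | 001101 | D.28 | 11100 | 001110 | 001110 | |
| D.13 | 01101 | 101100 | 101100 | D.29 † | 11101 | 101110 | 010001 | also used for the K.29.7 symbol |
| D.14 | 01110 | 011100 | 011100 | D.30 † | 11110 | 011110 | 100001 | also used for the K.30.7 symbol |
| D.15 | 01111 | 010111 | 101000 | D.31 | 11111 | 101011 | 010100 | |
| not used | | 111100 | 000011 | K.28 ‡ | 11100 | 001111 | 110000 | exclusively used for K.28.x symbols |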

† also used for the 5b/6b code of K.x.7

‡ exclusively used for the 5b/6b code of K.28.y
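The data rows of the 5b/6b table can be turned into a small lookup. The sketch below stores only the RD = −1 column and derives the RD = +1 column by bit inversion, which holds for every paired entry in the table (including the balanced special case D.07). The names `CODES_5B6B_RD_NEG` and `encode_5b6b` are illustrative, and the K.28 row is deliberately left out.

```python
# RD = -1 column of the 5b/6b table, indexed by x (EDCBA read as an integer).
# Bits are in transmission order a b c d e i.
CODES_5B6B_RD_NEG = [
    "100111", "011101", "101101", "110001", "110101", "101001", "011001", "111000",
    "111001", "100101", "010101", "110100", "001101", "101100", "011100", "010111",
    "011011", "100011", "010011", "110010", "001011", "101010", "011010", "111010",
    "110011", "100110", "010110", "110110", "001110", "101110", "011110", "101011",
]


def encode_5b6b(x: int, rd: int) -> tuple[str, int]:
    """Return (six-bit code, next RD) for data value x (0..31) at RD -1 or +1."""
    code = CODES_5B6B_RD_NEG[x]
    balanced = code.count("1") == code.count("0")
    # Unbalanced codes come in pairs; D.07 is balanced but also has two forms.
    has_two_forms = (not balanced) or x == 7
    if rd == +1 and has_two_forms:
        code = code.translate(str.maketrans("01", "10"))
    next_rd = rd if balanced else -rd
    return code, next_rd


print(encode_5b6b(0, -1))   # D.00 at RD = -1
print(encode_5b6b(7, +1))   # D.07 at RD = +1
```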

3b/4b code (fghj)

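Output bits are listed in the order f g h j. Codes with a single encoding are shown identically in both RD columns.

| Code | HGF | RD = −1 | RD = +1 | Code | HGF | RD = −1 | RD = +1 |
|---|---|---|---|---|---|---|---|
| D.x.0 | 000 | 1011 | 0100 | K.x.0 | 000 | 1011 | 0100 |
| D.x.1 | 001 | 1001 | 1001 | K.x.1 ‡ | 001 | 0110 | 1001 |
| D.x.2 | 010 | 0101 | 0101 | K.x.2 | 010 | 1010 | 0101 |
| D.x.3 | 011 | 1100 | 0011 | K.x.3 | 011 | 1100 | 0011 |
| D.x.4 | 100 | 1101 | 0010 | K.x.4 | 100 | 1101 | 0010 |
| D.x.5 | 101 | 1010 | 1010 | K.x.5 ‡ | 101 | 0101 | 1010 |
| D.x.6 | 110 | 0110 | 0110 | K.x.6 | 110 | 1001 | 0110 |
| D.x.P7 † | 111 | 1110 | 0001 | K.x.7 ‡ | 111 | 0111 | 1000 |
| D.x.A7 † | 111 | 0111 | 1000 | | | | |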

† For D.x.7, either the Primary (D.x.P7), or the Alternate (D.x.A7) encoding must be selected in order to avoid a run of five consecutive 0s or 1s when combined with the preceding 5b/6b code.
Sequences of exactly five identical bits are reserved for the comma symbols, which are used for synchronization.
D.x.A7 is used only

  • when RD = −1: for x = 17, 18 and 20 and
  • when RD = +1: for x = 11, 13 and 14.

With x = 23, x = 27, x = 29, and x = 30, the 3b/4b code portion used for control symbols K.x.7 is the same as that for D.x.A7.
No other D.x.A7 code can be used, as it could produce misaligned comma sequences.

‡ Only K.28.1, K.28.5, and K.28.7 generate comma symbols, which contain a bit sequence of five 0s or 1s.
The symbol has the format 110000 01xx or 001111 10xx.
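The choice between the primary and alternate D.x.7 encodings described in the † footnote can be sketched as a short function. The name `encode_dx7` and the set names are illustrative assumptions. Here `rd` is the running disparity in effect when the 3b/4b sub-block is sent; for the six x values concerned, the 5b/6b code is balanced, so this equals the RD before the symbol.

```python
# x values for which the alternate encoding D.x.A7 replaces D.x.P7,
# keyed by the running disparity at the start of the 3b/4b sub-block.
ALTERNATE_X = {-1: {17, 18, 20}, +1: {11, 13, 14}}

PRIMARY_7 = {-1: "1110", +1: "0001"}    # D.x.P7
ALTERNATE_7 = {-1: "0111", +1: "1000"}  # D.x.A7


def encode_dx7(x: int, rd: int) -> str:
    """Return the fghj bits for D.x.7 given x (0..31) and rd (-1 or +1)."""
    if x in ALTERNATE_X[rd]:
        return ALTERNATE_7[rd]
    return PRIMARY_7[rd]


# D.17 at RD = -1 is 100011; appending P7 (1110) would give five 1s in a row,
# so A7 (0111) is selected instead.
print("100011" + encode_dx7(17, -1))
```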

Control symbols

The control symbols within 8b/10b are 10b symbols that are valid sequences of bits (no more than six 1s or 0s) but do not have a corresponding 8b data byte. They are used for low-level control functions. For instance, in Fibre Channel, K.28.5 is used at the beginning of four-byte sequences (called "Ordered Sets") that perform functions such as Loop Arbitration, Fill Words, Link Resets, etc.

The 5b/6b and 3b/4b tables allow the following 12 control symbols to be sent:

Control symbols

| Symbol | DEC | HEX | HGF EDCBA | RD = −1 (abcdei fghj) | RD = +1 (abcdei fghj) |
|---|---|---|---|---|---|
| K.28.0 | 28 | 1C | 000 11100 | 001111 0100 | 110000 1011 |
| K.28.1 † | 60 | 3C | 001 11100 | 001111 1001 | 110000 0110 |
| K.28.2 | 92 | 5C | 010 11100 | 001111 0101 | 110000 1010 |
| K.28.3 | 124 | 7C | 011 11100 | 001111 0011 | 110000 1100 |
| K.28.4 | 156 | 9C | 100 11100 | 001111 0010 | 110000 1101 |
| K.28.5 † | 188 | BC | 101 11100 | 001111 1010 | 110000 0101 |
| K.28.6 | 220 | DC | 110 11100 | 001111 0110 | 110000 1001 |
| K.28.7 ‡ | 252 | FC | 111 11100 | 001111 1000 | 110000 0111 |
| K.23.7 | 247 | F7 | 111 10111 | 111010 1000 | 000101 0111 |
| K.27.7 | 251 | FB | 111 11011 | 110110 1000 | 001001 0111 |
| K.29.7 | 253 | FD | 111 11101 | 101110 1000 | 010001 0111 |
| K.30.7 | 254 | FE | 111 11110 | 011110 1000 | 100001 0111 |

† Within the control symbols, K.28.1, K.28.5, and K.28.7 are "comma symbols". Comma symbols are used for synchronization (finding the alignment of the 8b/10b codes within a bit-stream). If K.28.7 is not used, the unique comma sequences 00111110 or 11000001 cannot inadvertently appear at any bit position within any combination of normal codes.

‡ If K.28.7 is allowed in the actual coding, a more complex definition of the synchronization pattern than suggested by † needs to be used, as a combination of K.28.7 with several other codes forms a false misaligned comma symbol overlapping the two codes. A sequence of multiple K.28.7 codes is not allowable in any case, as this would result in undetectable misaligned comma symbols.

K.28.7 is the only comma symbol that cannot be the result of a single bit error in the data stream.
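Symbol alignment using the comma sequences named in the † footnote can be illustrated with a simple search. This is a sketch under the assumption that K.28.7 is not in use, so the two 8-bit patterns cannot appear misaligned; the function name `find_symbol_alignment` is illustrative, and real receivers do this in hardware on a shifting bit window rather than on strings.

```python
COMMA_SEQUENCES = ("00111110", "11000001")


def find_symbol_alignment(bits: str):
    """Return the symbol-boundary offset (0..9) implied by the first comma
    sequence found in a received bit string, or None if there is none.

    In K.28.1 and K.28.5 the comma sequence starts at bit a, so the index
    of the match is also the index of a symbol boundary.
    """
    for index in range(len(bits) - 7):
        if bits[index:index + 8] in COMMA_SEQUENCES:
            return index % 10
    return None


# Three stray bits, then K.28.5 sent at RD = -1, then D03.6.
received = "101" + "0011111010" + "1100010110"
print(find_symbol_alignment(received))
```

Once the offset is known, the receiver can slice the stream into 10-bit symbols starting at that position.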

Example encoding of D31.1

D31.1 for both running disparity cases

| Code | DEC | HEX | HGF EDCBA | RD = −1 (abcdei fghj) | RD = +1 (abcdei fghj) |
|---|---|---|---|---|---|
| D31.1 | 63 | 3F | 001 11111 | 101011 1001 | 010100 1001 |
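The D31.1 row can be reproduced with a compact data-byte encoder built from the 5b/6b and 3b/4b tables and the D.x.A7 rule given earlier. The following is a minimal sketch rather than a reference implementation: all names (`SIX`, `FOUR`, `A7_WHEN`, `encode_data_byte`) are illustrative, control symbols are not handled, and the RD = +1 columns are derived by inverting the RD = −1 entries, which matches every paired entry in the tables.

```python
# RD = -1 columns of the data tables; RD = +1 forms are obtained by inversion.
SIX = [
    "100111", "011101", "101101", "110001", "110101", "101001", "011001", "111000",
    "111001", "100101", "010101", "110100", "001101", "101100", "011100", "010111",
    "011011", "100011", "010011", "110010", "001011", "101010", "011010", "111010",
    "110011", "100110", "010110", "110110", "001110", "101110", "011110", "101011",
]
FOUR = ["1011", "1001", "0101", "1100", "1101", "1010", "0110", "1110"]  # D.x.0 .. D.x.P7
A7 = "0111"                                                             # D.x.A7 at RD = -1
A7_WHEN = {-1: {17, 18, 20}, +1: {11, 13, 14}}


def _pick(code: str, rd: int, forced_pair: bool) -> tuple[str, int]:
    """Select the form of a sub-block for the current RD; return (bits, next RD)."""
    balanced = code.count("1") == code.count("0")
    if balanced and not forced_pair:
        return code, rd
    if rd == +1:
        code = code.translate(str.maketrans("01", "10"))
    return code, (rd if balanced else -rd)


def encode_data_byte(byte: int, rd: int = -1) -> tuple[str, int]:
    """Encode one data byte; return (ten bits in order abcdeifghj, next RD)."""
    x, y = byte & 0x1F, byte >> 5
    six, rd = _pick(SIX[x], rd, forced_pair=(x == 7))      # D.07 has two forms
    four_code = A7 if (y == 7 and x in A7_WHEN[rd]) else FOUR[y]
    four, rd = _pick(four_code, rd, forced_pair=(y == 3))  # D.x.3 has two forms
    return six + four, rd


print(encode_data_byte(0x3F, -1))  # D31.1 at RD = -1
print(encode_data_byte(0x3F, +1))  # D31.1 at RD = +1
```

Feeding the returned RD into the next call chains the encoder across a sequence of bytes.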

Technologies that use 8b/10b

After the above-mentioned IBM patent expired, the scheme became even more popular and was chosen as a DC-free line code for several communication technologies.

Among the areas in which 8b/10b encoding finds application are the following:

Fibre Channel (4GFC and 8GFC variants only)

The FC-0 standard defines which encoding scheme (8b/10b or 64b/66b) is to be used in a Fibre Channel system. [8] Higher-speed variants typically use 64b/66b to improve bandwidth efficiency, since the bandwidth overhead is 20% in 8b/10b versus approximately 3% (2/66) in 64b/66b systems. Thus, 8b/10b encoding is used for the 4GFC and 8GFC variants, and 64b/66b for the 10GFC and 16GFC variants. [9] The Fibre Channel FC-1 data link layer is then responsible for implementing the 8b/10b encoding and decoding of signals.
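The overhead figures quoted above follow directly from the code rates. As a small worked example (the function name `coding_overhead` is illustrative):

```python
def coding_overhead(payload_bits: int, coded_bits: int) -> float:
    """Fraction of transmitted bits that do not carry payload."""
    return (coded_bits - payload_bits) / coded_bits


print(f"8b/10b:  {coding_overhead(8, 10):.1%}")
print(f"64b/66b: {coding_overhead(64, 66):.1%}")
```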

The Fibre Channel 8b/10b coding scheme is also used in other telecommunications systems. Data is expanded using an algorithm that creates one of two possible 10-bit output values for each input 8-bit value. Each 8-bit input value can map either to a 10-bit output value with odd disparity, or to one with even disparity. This mapping is usually done at the time when parallel input data is converted into a serial output stream for transmission over a Fibre Channel link. The odd/even selection is done in such a way that a long-term zero disparity between ones and zeros is maintained. This is often called "DC balancing".

The 8-bit to 10-bit conversion scheme uses only 512 of the possible 1024 output values. Of the remaining 512 unused output values, most contain either too many ones (or too many zeroes) and therefore are not allowed. This still leaves enough spare 10-bit odd+even coding pairs to allow for at least 12 special non-data characters.

The codes that represent the 256 data values are called the data (D) codes. The codes that represent the 12 special non-data characters are called the control (K) codes.

All of the codes can be described by stating 3 octal values. This is done with a naming convention of "Dxx.x" or "Kxx.x".

Example:

Input data bits: ABCDEFGH (in this example A is the most significant bit, unlike the HGF EDCBA labelling used in the tables above)
Data is split: ABC DEFGH
Data is shuffled: DEFGH ABC

Now these bits are converted to decimal in the way they are paired.

Input data: C3 (HEX) = 11000011 = 110 00011 = 00011 110 = 3 6

8B/10B = D03.6
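The naming step in this example can be written as a short helper. This is an illustrative sketch; the function name `symbol_name` and its `control` flag are assumptions made for the example.

```python
def symbol_name(byte: int, control: bool = False) -> str:
    """Return the Dxx.y (or Kxx.y) name for an 8-bit value."""
    x = byte & 0x1F   # low five bits: input to the 5b/6b sub-block
    y = byte >> 5     # top three bits: input to the 3b/4b sub-block
    prefix = "K" if control else "D"
    return f"{prefix}{x:02d}.{y}"


print(symbol_name(0xC3))                # D03.6
print(symbol_name(0xBC, control=True))  # K28.5
```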

Digital audio

8b/10b encoding schemes have found heavy use in digital audio storage applications, namely the DAT and DCC recorders mentioned above.

A differing but related scheme, eight-to-fourteen modulation (EFM), is used for audio CDs and CD-ROMs.

Alternatives

Note that 8b/10b is the encoding scheme, not a specific code. While many applications do use the same code, there exist some incompatible implementations; for example, Transition Minimized Differential Signaling also expands 8 bits to 10 bits, but uses a completely different method to do so.

64b/66b encoding, introduced for 10 Gigabit Ethernet's 10GBASE-R Physical Medium Dependent (PMD) interfaces, is a lower-overhead alternative to 8b/10b encoding, having a two-bit overhead per 64 bits of encoded data (instead of the sixteen bits that eight 8b/10b symbols would add). This scheme is considerably different in design from 8b/10b encoding, and does not explicitly guarantee DC balance, short run length, and transition density (these features are achieved statistically via scrambling). 64b/66b encoding has been extended to the 128b/130b and 128b/132b encoding variants for PCI Express 3.0 and USB 3.1, respectively, replacing the 8b/10b encoding in earlier revisions of each standard. [10]

References

  1. Kees Schouhamer Immink (March 1997). "Performance Assessment of DC-Free Multimode Codes". IEEE Transactions on Communications. 45 (3): 293–299. doi:10.1109/26.558690. The dc-balanced or dc-free codes, as they are often called, have a long history and their application is certainly not confined to recording practice.
  2. US 4,456,905, "Method and apparatus for encoding binary data", published 1984-06-26.
  3. US 4,620,311, "Method of transmitting information, encoding device for use in the method, and decoding device for use in the method", published 1986-10-28.
  4. Al X. Widmer, Peter A. Franaszek (1983). "A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code". IBM Journal of Research and Development. 27 (5): 440–451. doi:10.1147/rd.275.0440.
  5. US 4,486,739, "Byte oriented DC balanced (0,4) 8B/10B partitioned block transmission code", published 1984-12-04.
  6. Thatcher, Jonathan (April 1, 1996). "Thoughts on Gigabit Ethernet Physical". IBM. Retrieved August 17, 2008.
  7. "Physical Layer Specifications". Mipi.org. MIPI Alliance. Retrieved April 20, 2014.
  8. Fibre Channel Framing and Signaling - 3 (FC-FS-3) Rev 1.1 Sections 5.2.1 and 5.3.1
  9. FIBRE CHANNEL Physical Interface-5 (FC-PI-5) REV 6.10 Section 5.7
  10. Mahesh Wagh (August 6, 2011). "PCIe 3.0 Encoding & PHY Logical" (PDF). pcisig.com. Retrieved June 5, 2015.