Reliability (computer networking)

In computer networking, a reliable protocol is a communication protocol that notifies the sender whether or not the delivery of data to intended recipients was successful. Reliability is a synonym for assurance, which is the term used by the ITU and ATM Forum.

Reliable protocols typically incur more overhead than unreliable protocols and, as a result, are slower and scale less well. This is often not an issue for unicast protocols, but it may become a problem for reliable multicast protocols.

Transmission Control Protocol (TCP), the main protocol used on the Internet, is a reliable unicast protocol; it provides the abstraction of a reliable byte stream to applications. The User Datagram Protocol (UDP) is an unreliable protocol and is often used in computer games, streaming media, or other situations where speed matters and some data loss can be tolerated because of the transitory nature of the data.

Often, a reliable unicast protocol is also connection oriented. For example, TCP is connection oriented, with the virtual-circuit ID consisting of source and destination IP addresses and port numbers. However, some unreliable protocols are connection oriented, such as Asynchronous Transfer Mode and Frame Relay. In addition, some connectionless protocols, such as IEEE 802.11, are reliable.

History

Building on the packet switching concepts proposed by Donald Davies, the first communication protocol on the ARPANET was a reliable packet delivery procedure to connect its hosts via the 1822 interface. [1] [2] A host computer simply arranged the data in the correct packet format, inserted the address of the destination host computer, and sent the message across the interface to its connected Interface Message Processor (IMP). Once the message was delivered to the destination host, an acknowledgment was delivered to the sending host. If the network could not deliver the message, the IMP would send an error message back to the sending host.

Meanwhile, the developers of CYCLADES and of ALOHAnet demonstrated that it was possible to build an effective computer network without providing reliable packet transmission. This lesson was later embraced by the designers of Ethernet.

If a network does not guarantee packet delivery, then it becomes the host's responsibility to provide reliability by detecting and retransmitting lost packets. Subsequent experience on the ARPANET indicated that the network itself could not reliably detect all packet delivery failures, and this pushed responsibility for error detection onto the sending host in any case. This led to the development of the end-to-end principle, which is one of the Internet's fundamental design principles.

Reliability properties

A reliable service is one that notifies the user if delivery fails, while an unreliable one does not notify the user if delivery fails. For example, Internet Protocol (IP) provides an unreliable service. Together, Transmission Control Protocol (TCP) and IP provide a reliable service, whereas User Datagram Protocol (UDP) and IP provide an unreliable one.

In the context of distributed protocols, reliability properties specify the guarantees that the protocol provides with respect to the delivery of messages to the intended recipient(s).

An example of a reliability property for a unicast protocol is "at least once", i.e. at least one copy of the message is guaranteed to be delivered to the recipient.
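The "at least once" property can be illustrated with a minimal sketch in which a sender retransmits until it sees an acknowledgment. The function name `send_at_least_once` and the scripted loss functions are hypothetical and exist only for illustration; they do not correspond to any particular protocol. The sketch shows why "at least once" can mean "more than once": when an acknowledgment is lost, the sender cannot tell that the data arrived and sends it again.

```python
def send_at_least_once(message, data_arrives, ack_arrives, max_attempts=10):
    """Retransmit until an acknowledgment comes back.

    data_arrives(n) and ack_arrives(n) are hypothetical stand-ins for the
    network: they say whether the data packet and its acknowledgment
    survive on attempt n. Returns every copy the recipient received.
    """
    delivered = []
    for attempt in range(max_attempts):
        if data_arrives(attempt):
            delivered.append(message)      # recipient got a copy
            if ack_arrives(attempt):
                return delivered           # sender now knows it arrived
    raise TimeoutError("no acknowledgment received")


# Attempt 0: data lost. Attempt 1: data arrives, ack lost. Attempt 2: both arrive.
copies = send_at_least_once("m1", lambda n: n != 0, lambda n: n == 2)

# A recipient that needs each message only once has to discard duplicates.
unique = list(dict.fromkeys(copies))
```

Here the recipient ends up with two copies of the same message, so a recipient that cannot tolerate duplicates has to detect and discard them, for instance by message identifier.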

Reliability properties for multicast protocols can be expressed on a per-recipient basis (simple reliability properties), or they may relate the fact of delivery or the order of delivery among the different recipients (strong reliability properties). In the context of multicast protocols, strong reliability properties express the guarantees that the protocol provides with respect to the delivery of messages to different recipients.

An example of a strong reliability property is last copy recall, meaning that as long as at least a single copy of a message remains available at any of the recipients, every other recipient that does not fail eventually also receives a copy. Strong reliability properties such as this one typically require that messages are retransmitted or forwarded among the recipients.

An example of a reliability property stronger than last copy recall is atomicity. The property states that if at least a single copy of a message has been delivered to a recipient, all other recipients will eventually receive a copy of the message. In other words, each message is always delivered to either all or none of the recipients.
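As a rough illustration of the all-or-none reading of atomicity, the following sketch checks a finished delivery record. The helper `is_atomic` and the shape of the `deliveries` mapping are assumptions made for this example; because the property only says that copies arrive "eventually", such a check is meaningful only once a run is complete.

```python
def is_atomic(deliveries, recipients):
    """Return True if every message reached all recipients or none of them.

    deliveries maps a message id to the set of recipients that received it
    (a hypothetical record of a completed run).
    """
    everyone = set(recipients)
    return all(not got or set(got) == everyone for got in deliveries.values())


group = {"a", "b", "c"}
all_or_none = is_atomic({"m1": {"a", "b", "c"}, "m2": set()}, group)
partial = is_atomic({"m3": {"a"}}, group)
```

In this example `all_or_none` is true, while `partial` is false because message `m3` reached only one of the three recipients.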

One of the most complex strong reliability properties is virtual synchrony.

Reliable messaging is the concept of message passing across an unreliable infrastructure while still being able to make certain guarantees about the successful transmission of the messages. [3] Examples of such guarantees are that a message, if delivered, is delivered at most once, or that all successfully delivered messages arrive in a particular order.

Reliable delivery can be contrasted with best-effort delivery, where there is no guarantee that messages will be delivered quickly, in order, or at all.

Implementations

A reliable delivery protocol can be built on an unreliable protocol. An extremely common example is the layering of Transmission Control Protocol on the Internet Protocol, a combination known as TCP/IP.

Strong reliability properties are offered by group communication systems (GCSs) such as the Isis Toolkit, the Appia framework, JGroups or QuickSilver Scalable Multicast. The QuickSilver Properties Framework is a flexible platform that allows strong reliability properties to be expressed in a purely declarative manner, using a simple rule-based language, and automatically translated into a hierarchical protocol.

One protocol that implements reliable messaging is WS-ReliableMessaging, which handles reliable delivery of SOAP messages. [4]

The ATM Service-Specific Coordination Function provides for transparent assured delivery with AAL5. [5] [6] [7]

IEEE 802.11 attempts to provide reliable service for all traffic. The sending station will resend a frame if the sending station does not receive an ACK frame within a predetermined period of time.
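The resend-on-missing-ACK behaviour described above can be sketched as a simple retry loop. This is not the 802.11 state machine: the function `send_with_retries`, the `transmit` callback, and the retry limit of 4 are all hypothetical values chosen for illustration. What the sketch does show is the two outcomes that matter to the sender, which learns either that the frame was acknowledged or that delivery failed.

```python
def send_with_retries(frame, transmit, retry_limit=4):
    """Send a frame, resending while no ACK arrives in time.

    transmit(frame) is a hypothetical callback that returns True when an
    ACK came back within the timeout. Returns (acknowledged, attempts).
    """
    for attempt in range(1, retry_limit + 1):
        if transmit(frame):
            return True, attempt
    return False, retry_limit              # give up and report the failure


# Scripted channel: the first two transmissions go unacknowledged.
outcomes = iter([False, False, True])
delivered = send_with_retries("frame-1", lambda f: next(outcomes))

# A channel that never acknowledges: the sender is told delivery failed.
failed = send_with_retries("frame-2", lambda f: False)
```

In the first call the frame is acknowledged on the third attempt; in the second the sender gives up after the limit and reports failure.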

Real-time systems

There is, however, a problem with the definition of reliability as "delivery or notification of failure" in real-time computing. In such systems, failure to deliver the real-time data will adversely affect the performance of the systems, and some systems, e.g. safety-critical, safety-involved, and some secure mission-critical systems, must be proved to perform at some specified minimum level. This, in turn, requires that a specified minimum reliability for the delivery of the critical data be met. Therefore, in these cases, it is only the delivery that matters; notification of the failure to deliver does not ameliorate the failure. In hard real-time systems, all data must be delivered by the deadline or it is considered a system failure. In firm real-time systems, late data is still valueless but the system can tolerate some amount of late or missing data. [8] [9]

There are a number of protocols that are capable of addressing real-time requirements for reliable delivery and timeliness:

MIL-STD-1553B and STANAG 3910 are well-known examples of such timely and reliable protocols for avionic data buses. MIL-1553 uses a 1 Mbit/s shared media for the transmission of data and the control of these transmissions, and is widely used in federated military avionics systems. [10] It uses a bus controller (BC) to command the connected remote terminals (RTs) to receive or transmit this data. The BC can, therefore, ensure that there will be no congestion, and transfers are always timely. The MIL-1553 protocol also allows for automatic retries that can still ensure timely delivery and increase the reliability above that of the physical layer. STANAG 3910, also known as EFABus in its use on the Eurofighter Typhoon, is, in effect, a version of MIL-1553 augmented with a 20 Mbit/s shared media bus for data transfers, retaining the 1 Mbit/s shared media bus for control purposes.

Asynchronous Transfer Mode (ATM), Avionics Full-Duplex Switched Ethernet (AFDX), and Time Triggered Ethernet (TTEthernet) are examples of packet-switched network protocols in which the timeliness and reliability of data transfers can be assured by the network. AFDX and TTEthernet are also based on IEEE 802.3 Ethernet, though not entirely compatible with it.

ATM uses connection-oriented virtual channels (VCs) which have fully deterministic paths through the network, and usage and network parameter control (UPC/NPC), which are implemented within the network, to limit the traffic on each VC separately. This allows the usage of the shared resources (switch buffers) in the network to be calculated from the parameters of the traffic to be carried in advance, i.e. at system design time. That they are implemented by the network means that these calculations remain valid even when other users of the network behave in unexpected ways, i.e. transmit more data than they are expected to. The calculated usages can then be compared with the capacities of these resources to show that, given the constraints on the routes and the bandwidths of these connections, the resource used for these transfers will never be over-subscribed. These transfers will therefore never be affected by congestion and there will be no losses due to this effect. Then, from the predicted maximum usages of the switch buffers, the maximum delay through the network can also be predicted. However, for the reliability and timeliness to be proved, and for the proofs to be tolerant of faults in and malicious actions by the equipment connected to the network, the calculations of these resource usages cannot be based on any parameters that are not actively enforced by the network, i.e. they cannot be based on what the sources of the traffic are expected to do or on statistical analyses of the traffic characteristics (see network calculus). [11]
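The kind of design-time calculation described above can be sketched with the simplest bound from network calculus. The sketch assumes, as a simplification that is not taken from the ATM specifications, that the network polices each VC to a burst of at most `burst_bits` plus a sustained rate of at most `rate_bps`, and that all VCs share one first-in, first-out output buffer drained at `link_rate_bps`. Under those assumptions the buffer never holds more than the sum of the bursts, and no bit waits longer than that backlog takes to drain. The function name and the numbers are hypothetical.

```python
def buffer_and_delay_bounds(flows, link_rate_bps):
    """Worst-case backlog (bits) and delay (seconds) at one shared FIFO buffer.

    flows is a list of (burst_bits, rate_bps) limits that the network is
    assumed to enforce on each virtual channel.
    """
    total_burst = sum(burst for burst, _ in flows)
    total_rate = sum(rate for _, rate in flows)
    if total_rate > link_rate_bps:
        # Over-subscribed: the queue can grow without limit, so no bound exists.
        raise ValueError("sum of enforced rates exceeds the link rate")
    return total_burst, total_burst / link_rate_bps


# Two policed VCs sharing a 10 Mbit/s output link (illustrative numbers).
backlog_bits, delay_s = buffer_and_delay_bounds(
    [(8_000, 2_000_000), (16_000, 3_000_000)], 10_000_000
)
```

With these figures the buffer needs to hold at most 24,000 bits and the queueing delay at this switch is at most 2.4 ms. The bound depends only on the enforced limits, not on what the sources intend to send, which is the point made in the paragraph above.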

AFDX uses frequency-domain bandwidth allocation and traffic policing, which allow the traffic on each virtual link to be limited so that the requirements for shared resources can be predicted and congestion prevented, and so that congestion can be proved not to affect the critical data. [12] However, the techniques for predicting the resource requirements and proving that congestion is prevented are not part of the AFDX standard.
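How limiting each virtual link bounds its traffic can be sketched as follows. AFDX assigns each virtual link a Bandwidth Allocation Gap (BAG), the minimum interval between successive frames, together with a maximum frame size; the maximum frame size parameter (`l_max_bytes` below) is not mentioned in this article and is named here as an assumption. The policing function is a deliberately simplified stand-in that ignores jitter tolerance and is not the algorithm used by real AFDX switches.

```python
def max_bandwidth_bps(l_max_bytes, bag_ms):
    """Upper bound on a virtual link's rate: one maximum-size frame per BAG."""
    return l_max_bytes * 8 * 1000 / bag_ms


def police(arrival_times_ms, bag_ms):
    """Keep only frames at least one BAG after the previously accepted frame.

    Simplified sketch: real policing also allows for jitter.
    """
    accepted = []
    for t in arrival_times_ms:
        if not accepted or t - accepted[-1] >= bag_ms:
            accepted.append(t)
    return accepted


rate = max_bandwidth_bps(l_max_bytes=1518, bag_ms=2)
kept = police([0, 1, 2, 3.5, 4], bag_ms=2)
```

In this example the virtual link can never exceed 6.072 Mbit/s, and the frames arriving at 1 ms and 3.5 ms are dropped because they follow an accepted frame too closely.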

TTEthernet provides the lowest possible latency in transferring data across the network by using time-domain control methods – each time triggered transfer is scheduled at a specific time so that contention for shared resources is controlled and thus the possibility of congestion is eliminated. The switches in the network enforce this timing to provide tolerance of faults in, and malicious actions on the part of, the other connected equipment. However, "synchronized local clocks are the fundamental prerequisite for time-triggered communication". [13] This is because the sources of critical data will have to have the same view of time as the switch, in order that they can transmit at the correct time and the switch will see this as correct. This also requires that the sequence with which a critical transfer is scheduled has to be predictable to both source and switch. This, in turn, will limit the transmission schedule to a highly deterministic one, e.g. the cyclic executive.
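The idea of scheduling each time-triggered transfer at a specific time can be illustrated by checking a cyclic schedule for one shared link. The slot format `(name, offset_us, duration_us)`, the function `schedule_conflicts`, and the traffic names are hypothetical; they are not part of the TTEthernet standard. The sketch relies on the fact that if any two slots overlap, then some pair of neighbours in start-time order overlaps as well, so an empty result means the schedule is free of contention on that link.

```python
def schedule_conflicts(slots, cycle_us):
    """Return overlapping neighbour pairs in a cyclic schedule for one link.

    slots is a list of (name, offset_us, duration_us) within one cycle.
    """
    ordered = sorted(slots, key=lambda slot: slot[1])
    for name, offset, duration in ordered:
        if offset < 0 or offset + duration > cycle_us:
            raise ValueError(f"slot {name!r} does not fit inside the cycle")
    conflicts = []
    for (n1, o1, d1), (n2, o2, _) in zip(ordered, ordered[1:]):
        if o1 + d1 > o2:                   # previous slot still transmitting
            conflicts.append((n1, n2))
    return conflicts


good = [("nav", 0, 100), ("fuel", 100, 50), ("flaps", 400, 100)]
clean = schedule_conflicts(good, cycle_us=1000)
clash = schedule_conflicts(good + [("extra", 120, 40)], cycle_us=1000)
```

The first schedule has no conflicts; adding a slot that starts at 120 µs collides with the transfer that occupies 100 µs to 150 µs.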

However, low latency in transferring data over the bus or network does not necessarily translate into low transport delays between the application processes that source and sink this data. This is especially true where the transfers over the bus or network are cyclically scheduled (as is commonly the case with MIL-STD-1553B and STANAG 3910, and necessarily so with AFDX and TTEthernet) but the application processes are not synchronized with this schedule.

With both AFDX and TTEthernet, there are additional functions required of the interfaces, e.g. AFDX's Bandwidth Allocation Gap control, and TTEthernet's requirement for very close synchronization of the sources of time-triggered data, that make it difficult to use standard Ethernet interfaces. Other methods for controlling the traffic in the network that would allow the use of such standard IEEE 802.3 network interfaces are a subject of current research. [14]

References

  1. Gillies, J.; Cailliau, R. (2000). How the Web was Born: The Story of the World Wide Web. Oxford University Press. pp. 23–25. ISBN 0192862073.
  2. Roberts, Dr. Lawrence G. (November 1978). "The Evolution of Packet Switching" (PDF). IEEE Invited Paper. Retrieved September 10, 2017. In nearly all respects, Davies' original proposal, developed in late 1965, was similar to the actual networks being built today.
  3. W3C paper on reliable messaging
  4. WS-ReliableMessaging specification (PDF)
  5. Young-ki Hwang, et al., Service Specific Coordination Function for Transparent Assured Delivery with AAL5 (SSCF-TADAS), Military Communications Conference Proceedings, 1999. MILCOM 1999, vol. 2, pages 878–882. doi:10.1109/MILCOM.1999.821329
  6. ATM Forum, The User Network Interface (UNI), v. 3.1, ISBN 0-13-393828-X, Prentice Hall PTR, 1995.
  7. ITU-T, B-ISDN ATM Adaptation Layer specification: Type 5 AAL, Recommendation I.363.5, International Telecommunication Union, 1998.
  8. S. Schneider, G. Pardo-Castellote, M. Hamilton. "Can Ethernet Be Real Time?", Real-Time Innovations, Inc., 2001
  9. Dan Rubenstein, Jim Kurose, Don Towsley, "Real-Time Reliable Multicast Using Proactive Forward Error Correction", NOSSDAV '98
  10. Mats Ekman, Avionic Architectures Trends and challenges (PDF), archived from the original (PDF) on 2015-02-03, Each system has its own computers performing its own functions
  11. Kim, Y. J.; Chang, S. C.; Un, C. K.; Shin, B. C. (March 1996). "UPC/NPC algorithm for guaranteed QoS in ATM networks". Computer Communications. 19 (3). Amsterdam, the Netherlands: Elsevier Science Publishers: 216–225. doi:10.1016/0140-3664(96)01063-8.
  12. AFDX Tutorial, "Archived copy" (PDF). Archived from the original (PDF) on 2015-06-18. Retrieved 2015-02-03.
  13. Wilfried Steiner and Bruno Dutertre, SMT-Based Formal Verification of a TTEthernet Synchronization Function, S. Kowalewski and M. Roveri (Eds.), FMICS 2010, LNCS 6371, pp. 148–163, 2010.
  14. D. W. Charlton; et al. (2013), "An Avionic Gigabit Ethernet Network", Avionics, Fiber-Optics and Photonics Conference (AVFOP), IEEE, pp. 17–18, doi:10.1109/AVFOP.2013.6661601, ISBN 978-1-4244-7348-9, S2CID 3162009