Sorcerer's Apprentice syndrome

Statue of Mickey Mouse as the Sorcerer's Apprentice in Fantasia at Hong Kong Disneyland. In the story, the apprentice's enchanted broom multiplies uncontrollably.

Sorcerer's Apprentice syndrome (SAS) is a network protocol flaw in the original versions of TFTP. It was named after Goethe's 1797 poem "Der Zauberlehrling" (popularized by the "Sorcerer's Apprentice" segment of the 1940 animated film Fantasia), because the details of its operation closely resemble the disaster that befalls the sorcerer's apprentice: the problem resulted in an ever-growing replication of every packet in the transfer.


The problem occurred because of a known failure mode of the internetwork (a packet delayed long enough to appear lost, yet eventually delivered) which the TFTP protocol designers, by oversight, did not take into account when the protocol was designed; this failure mode interacted with several details of TFTP's mechanisms to produce SAS.

Technical background

TFTP operates in simple lock-step: there is only ever one packet outstanding at any time, and every packet received by either party causes one packet to be sent in reply (until the termination of the transfer). The original TFTP specification said that any time any packet was received, the receiver was required to send the appropriate reply packet. Thus, the receipt of a block of data triggered the sending of an acknowledgement, and the receipt of an acknowledgement triggered the sending of the next data block.
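
This reply rule can be pictured as a single function applied to every incoming packet. The sketch below is illustrative only (the packet layout, field names and the reply_to helper are assumptions rather than real TFTP code); it shows the original rule that every received packet unconditionally produces a reply:

```python
def reply_to(packet, blocks):
    """Return the reply the original specification demanded for any received packet."""
    if packet["type"] == "DATA":
        # Receiving a data block always triggers an acknowledgement for that block.
        return {"type": "ACK", "block": packet["block"]}
    if packet["type"] == "ACK":
        # Receiving an acknowledgement always triggers the next data block,
        # even when the acknowledgement is a duplicate (the root cause of SAS).
        next_block = packet["block"] + 1
        if next_block <= len(blocks):
            return {"type": "DATA", "block": next_block, "data": blocks[next_block - 1]}
    return None  # nothing left to send, e.g. after the final ACK of the transfer
```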

TFTP, like all protocols designed to operate across an unreliable network, also includes timeouts. After sending a packet, each party expects a reply, so it starts a timer; if the timer expires before a reply is received, it takes some recovery action, typically re-sending the original packet.
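
A rough sketch of that timer behaviour, assuming a UDP socket and arbitrary timeout and retry values (none of them taken from the TFTP specification), might look like this:

```python
import socket

TIMEOUT_SECONDS = 5.0  # assumed value; the interval is left to the implementation
MAX_RETRIES = 5        # assumed retry limit before giving up on the transfer

def send_and_await_reply(sock, packet_bytes, peer):
    """Send one packet, then wait for any reply, re-sending the packet on each timeout."""
    sock.settimeout(TIMEOUT_SECONDS)
    for _ in range(MAX_RETRIES):
        sock.sendto(packet_bytes, peer)
        try:
            reply, _addr = sock.recvfrom(2048)
            return reply
        except socket.timeout:
            continue  # no reply arrived in time: fall through and re-send the same packet
    raise TimeoutError("peer did not reply; transfer abandoned")
```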

Details

SAS occurred when a packet was not actually lost in the internetwork but merely delayed, and was successfully delivered only after a timeout had occurred (on either side).

The timeout caused a second copy of the previous packet to be sent to replace the "lost" packet. However, the first copy had not been lost, and since, according to the TFTP specification, receipt of any packet always forced the generation of a reply packet, two replies were generated (one to each copy). Those in turn forced the generation of two replies to them, and so on. A typical scenario was as follows:

1. The sender transmitted data block n, but the acknowledgement for it was delayed in the internetwork past the sender's timeout.
2. The sender timed out and retransmitted block n; the receiver therefore got two copies of block n and, as the specification required, sent an acknowledgement for each.
3. The sender received the first acknowledgement and sent block n+1, then received the second (duplicate) acknowledgement and sent a second copy of block n+1.
4. The receiver got two copies of block n+1 and sent two acknowledgements, and so on for every subsequent block.

It will be seen that at this point the situation is stable and self-perpetuating; every packet from then on is duplicated (that is, two identical copies are sent across the internetwork).

Even worse, the increased number of packets being sent around the internetwork was likely to cause congestion, which in turn was likely to delay yet another packet past the timeout; another timeout-triggered duplicate would then be generated, and from then on a third copy of each packet would be sent. At that point the situation would usually snowball, and further copies would be generated; hence the name given to this pattern of behaviour.

For a small file, the transfer would complete, and the duplicate packets would eventually drain from the internetwork. If the file were large, however, congestive collapse would result, and only when the transfer failed would the mass of packets drain from the internetwork.
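
The runaway duplication can be reproduced with a toy model. The following sketch is not TFTP code; it simply replays the "every packet gets a reply" rule over an in-memory queue seeded with the two copies of a data block left over from one spurious retransmission, and counts how often each packet crosses the simulated network:

```python
from collections import deque

def simulate(num_blocks=4):
    """Count how many times each packet is sent under the original reply rule."""
    # After one spurious retransmission, two copies of data block 1 are heading for the receiver.
    in_flight = deque([("DATA", 1), ("DATA", 1)])
    counts = {}
    while in_flight:
        kind, block = in_flight.popleft()
        counts[(kind, block)] = counts.get((kind, block), 0) + 1
        if kind == "DATA":
            in_flight.append(("ACK", block))        # every data block is acknowledged
        elif block < num_blocks:
            in_flight.append(("DATA", block + 1))   # every ACK triggers the next data block
    return counts

print(simulate())  # every DATA and ACK in the remainder of the transfer is sent exactly twice
```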

Solution

The fix for SAS involved modifying the TFTP specification to break the loop.[1] Only the first instance of a received acknowledgement should cause the next data block to be sent; further copies of the acknowledgement for a particular data block would be ignored, thus breaking the retransmission loop. In the revised protocol, a data block would only be retransmitted on timeout.
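
A sketch of the amended sender-side rule, using invented state and field names, could look like the following; only the first acknowledgement of the most recently sent block advances the transfer, and anything else is silently dropped:

```python
def on_ack(ack_block, state, blocks):
    """Handle one received ACK under the revised specification (sender side)."""
    # state starts as {"last_block_sent": 1, "acknowledged": False} once block 1 has been sent.
    if ack_block != state["last_block_sent"] or state["acknowledged"]:
        return None  # stale or duplicate ACK: ignore it, which breaks the SAS loop
    state["acknowledged"] = True
    next_block = ack_block + 1
    if next_block > len(blocks):
        return None  # that was the final block; the transfer is complete
    state["last_block_sent"] = next_block
    state["acknowledged"] = False
    return {"type": "DATA", "block": next_block, "data": blocks[next_block - 1]}
```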

This change also makes it possible to simplify the implementation of the receiving end (often a bootstrap program written in a low-level language) by omitting the retransmission timer: if any packet is lost, the sender's timer will eventually expire and it will re-send its last packet, prompting the receiver to reply again. However, keeping the timer has its benefits, such as dealing with lost acknowledgements more efficiently.
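
As a sketch of that simplification (with an assumed wire format of a two-byte block number followed by up to 512 bytes of data, which is not the real TFTP header), a receiver that relies solely on the sender's timer reduces to a plain blocking loop:

```python
import socket

def receive_file(sock, peer):
    """Receive a file without any retransmission timer of its own (sketch, assumed wire format)."""
    chunks, expected = [], 1
    while True:
        packet, _addr = sock.recvfrom(2048)          # block forever: the sender's timer drives recovery
        block = int.from_bytes(packet[:2], "big")    # assumed 2-byte block-number header
        data = packet[2:]
        sock.sendto(block.to_bytes(2, "big"), peer)  # acknowledge new and duplicate blocks alike
        if block == expected:                        # a new, in-order block: keep it
            chunks.append(data)
            expected += 1
            if len(data) < 512:                      # a short block marks the end of the transfer
                return b"".join(chunks)
```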

Related Research Articles

The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.

In computer networking, the User Datagram Protocol (UDP) is one of the core communication protocols of the Internet protocol suite used to send messages to other hosts on an Internet Protocol (IP) network. Within an IP network, UDP does not require prior communication to set up communication channels or data paths.

Automatic repeat request (ARQ), also known as automatic repeat query, is an error-control method for data transmission that uses acknowledgements and timeouts to achieve reliable data transmission over an unreliable communication channel. ARQ is appropriate if the communication channel has varying or unknown capacity. If the sender does not receive an acknowledgment before the timeout, it re-transmits the message until it receives an acknowledgment or exceeds a predefined number of retransmissions.

Trivial File Transfer Protocol (TFTP) is a simple lockstep File Transfer Protocol which allows a client to get a file from or put a file onto a remote host. One of its primary uses is in the early stages of nodes booting from a local area network. TFTP has been used for this application because it is very simple to implement.

ALOHAnet, also known as the ALOHA System, or simply ALOHA, was a pioneering computer networking system developed at the University of Hawaii.

Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput.


Stop-and-wait ARQ, also referred to as alternating bit protocol, is a method in telecommunications to send information between two connected devices. It ensures that information is not lost due to dropped packets and that packets are received in the correct order. It is the simplest automatic repeat-request (ARQ) mechanism. A stop-and-wait ARQ sender sends one frame at a time; it is a special case of the general sliding window protocol with transmit and receive window sizes equal to one in both cases. After sending each frame, the sender doesn't send any further frames until it receives an acknowledgement (ACK) signal. After receiving a valid frame, the receiver sends an ACK. If the ACK does not reach the sender before a certain time, known as the timeout, the sender sends the same frame again. The timeout countdown is reset after each frame transmission. The above behavior is a basic example of Stop-and-Wait. However, real-life implementations vary to address certain design issues.

A keepalive (KA) is a message sent by one device to another to check that the link between the two is operating, or to prevent the link from being broken.

In data communications, flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from overwhelming a slow receiver. Flow control should be distinguished from congestion control, which is used for controlling the flow of data when congestion has actually occurred. Flow control mechanisms can be classified by whether or not the receiving node sends feedback to the sending node.

Transmission Control Protocol (TCP) uses a congestion control algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, along with other schemes including slow start and congestion window (CWND), to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet. Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in protocol stacks of operating systems of computers that connect to the Internet.

Selective Repeat ARQ or Selective Reject ARQ is a specific instance of the automatic repeat request (ARQ) protocol used to manage sequence numbers and retransmissions in reliable communications.

TCP tuning techniques adjust the network congestion avoidance parameters of Transmission Control Protocol (TCP) connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases. However, blindly following instructions without understanding their real consequences can hurt performance as well.

EFTP was a very simple file transfer protocol developed as part of the PARC Universal Packet protocol suite at Xerox PARC in the late 1970s. It was the inspiration for the Trivial File Transfer Protocol (TFTP) in the TCP/IP suite.

Retransmission, essentially identical with automatic repeat request (ARQ), is the resending of packets which have been either damaged or lost. Retransmission is one of the basic mechanisms used by protocols operating over a packet switched computer network to provide reliable communication.

Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is either caused by errors in data transmission, typically across wireless networks, or network congestion. Packet loss is measured as a percentage of packets lost with respect to packets sent.

A sliding window protocol is a feature of packet-based data transmission protocols. Sliding window protocols are used where reliable in-order delivery of packets is required, such as in the data link layer as well as in the Transmission Control Protocol (TCP). They are also used to improve efficiency when the channel may include high latency.

Extremely Opportunistic Routing (ExOR) is a combination of routing protocol and media access control for a wireless ad hoc network, invented by Sanjit Biswas and Robert Morris of the MIT Artificial Intelligence Laboratory, and described in a 2005 paper. A very similar opportunistic routing scheme was also independently proposed by Zhenzhen Ye and Yingbo Hua from University of California, Riverside and presented in a paper in 2005. Previously open source, ExOR was available in 2005 but is no longer obtainable. The broadcast and retransmission strategies used by the algorithm were already described in the literature. ExOR is valuable because it allows existing digital radios to exploit algorithmic optimizations that were previously impractical.

A reliable byte stream is a common service paradigm in computer networking; it refers to a byte stream in which the bytes which emerge from the communication channel at the recipient are exactly the same, and in exactly the same order, as they were when the sender inserted them into the channel.

Karn's algorithm addresses the problem of getting accurate estimates of the round-trip time for messages when using the Transmission Control Protocol (TCP) in computer networking. The algorithm, also sometimes termed the Karn-Partridge algorithm, was proposed in a paper by Phil Karn and Craig Partridge in 1987.

In data networking, telecommunications, and computer buses, an acknowledgment (ACK) is a signal that is passed between communicating processes, computers, or devices to signify acknowledgment, or receipt of message, as part of a communications protocol. The negative-acknowledgement is a signal that is sent to reject a previously received message or to indicate some kind of error. Acknowledgments and negative acknowledgments inform a sender of the receiver's state so that it can adjust its own state accordingly.

References

  1. Braden, Robert, ed. (October 1989). "Sorcerer's Apprentice Syndrome". Requirements for Internet Hosts -- Application and Support. IETF. pp. 43–45. sec. 4.2.3.1. doi:10.17487/RFC1123. RFC 1123. Retrieved 2012-10-05.