Byzantine fault


A Byzantine fault (also Byzantine generals problem, interactive consistency, source congruency, error avalanche, Byzantine agreement problem, and Byzantine failure [1] ) is a condition of a computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed. The term takes its name from an allegory, the "Byzantine generals problem", [2] developed to describe a situation in which, to avoid catastrophic failure of the system, the system's actors must agree on a concerted strategy, but some of these actors are unreliable.


In a Byzantine fault, a component such as a server can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other components to declare it failed and shut it out of the network, because they need to first reach a consensus regarding which component has failed in the first place. Byzantine fault tolerance (BFT) is the resilience of a fault-tolerant computer system to such conditions.

Definition

A Byzantine fault is any fault presenting different symptoms to different observers. [3] A Byzantine failure is the loss of a system service due to a Byzantine fault in systems that require consensus among distributed nodes. [4]

If all generals attack in coordination, the battle is won (left). If two generals falsely declare that they intend to attack, but instead retreat, the battle is lost (right).

As an analogy of the fault's simplest form, consider a number of generals who are attacking a fortress. The generals must decide as a group whether to attack or retreat; some may prefer to attack, while others prefer to retreat. The important thing is that all generals agree on a common decision, for a halfhearted attack by a few generals would become a rout, and would be worse than either a coordinated attack or a coordinated retreat.

The problem is complicated by the presence of treacherous generals who may not only cast a vote for a suboptimal strategy; they may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers). The problem is complicated further by the generals being physically separated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.

Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals have a majority agreement on their strategy. There can be a default vote value given to missing messages. For example, missing messages can be given a "null" value. Further, if the agreement is that the null votes are in the majority, a pre-assigned default strategy can be used (e.g. retreat). [5]
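
A minimal sketch of this voting rule is shown below. The value encoding ("attack", "retreat", None for a missing message), the default strategy, and the function name are illustrative assumptions for the example, not part of any standard algorithm.

```python
# Minimal sketch of the default-vote rule described above; names and the
# "attack"/"retreat"/None encoding are illustrative assumptions.
from collections import Counter

DEFAULT_STRATEGY = "retreat"  # pre-assigned strategy used when null votes dominate

def decide(votes):
    """votes: list of "attack", "retreat", or None (a missing message)."""
    ballots = [v if v is not None else "null" for v in votes]  # missing message => null vote
    winner, _ = Counter(ballots).most_common(1)[0]
    return DEFAULT_STRATEGY if winner == "null" else winner

print(decide(["attack", "attack", None, "retreat"]))  # "attack"
print(decide([None, None, None, "attack"]))           # "retreat" (null votes in the majority)
```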

The typical mapping of this story onto computer systems is that the computers are the generals and their digital communication system links are the messengers. Although the problem is formulated in the analogy as a decision-making and security problem, in electronics, it cannot be solved by cryptographic digital signatures alone, because failures such as incorrect voltages can propagate through the encryption process. Thus, a component may appear functioning to one component and faulty to another, which prevents forming a consensus as to whether the component is faulty or not. [3]

Formally defined:

Given a system of n components, t of which are dishonest, and assuming only point-to-point channels between all the components. [6]

Whenever a component A tries to broadcast a value x, the other components are allowed to discuss with each other and verify the consistency of A's broadcast, and eventually settle on a common value y.

The system is said to resist Byzantine faults if a component A can broadcast a value x, and then:

  1. If A is honest, then all honest components agree on the value x.
  2. In any case, all honest components agree on the same value y.
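
Expressed as a predicate, the two conditions constrain only the values that the honest components settle on. The following sketch is illustrative; the function name and the representation of decisions as a dictionary are assumptions made for the example.

```python
# Illustrative check of the two agreement conditions; the function name and
# the data layout (dict of component id -> settled value) are assumptions.
def resists_byzantine_fault(decisions, honest, sender, x):
    """decisions: settled value per component; honest: set of honest component ids;
    sender: the broadcasting component A; x: the value A tried to broadcast."""
    honest_values = {decisions[c] for c in honest}
    agreement = len(honest_values) == 1                      # condition 2: a common value y
    validity = sender not in honest or honest_values == {x}  # condition 1: y = x if A is honest
    return agreement and validity

# Example: an honest sender 0 broadcasts "x"; the dishonest component 3 is ignored.
decisions = {0: "x", 1: "x", 2: "x", 3: "y"}
print(resists_byzantine_fault(decisions, honest={0, 1, 2}, sender=0, x="x"))  # True
```

Note that nothing here constrains whether x itself is a sensible value; the conditions only require the honest components to be consistent with each other.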

The problem has been studied in the case of both synchronous and asynchronous communications.

In the formulation above, the communication graph is assumed to be complete (i.e. each component can communicate with every other one), but the communication graph can be restricted.

The problem can also be relaxed into a more "realistic" setting in which the faulty components do not collude in an attempt to lure the others into error. It is in this setting that practical algorithms have been devised.

History

The problem of obtaining Byzantine consensus was conceived and formalized by Robert Shostak, who dubbed it the interactive consistency problem. This work was done in 1978 in the context of the NASA-sponsored SIFT [7] project in the Computer Science Lab at SRI International. SIFT (for Software Implemented Fault Tolerance) was the brainchild of John Wensley, and was based on the idea of using multiple general-purpose computers that would communicate through pairwise messaging in order to reach a consensus, even if some of the computers were faulty.

At the beginning of the project, it was not clear how many computers in total were needed to guarantee that a conspiracy of n faulty computers could not "thwart" the efforts of the correctly-operating ones to reach consensus. Shostak showed that a minimum of 3n+1 are needed, and devised a two-round 3n+1 messaging protocol that would work for n=1. His colleague Marshall Pease generalized the algorithm for any n > 0, proving that 3n+1 is both necessary and sufficient. These results, together with a later proof by Leslie Lamport of the sufficiency of 3n using digital signatures, were published in the seminal paper, Reaching Agreement in the Presence of Faults. [8] The authors were awarded the 2005 Edsger W. Dijkstra Prize for this paper.

To make the interactive consistency problem easier to understand, Lamport devised a colorful allegory in which a group of army generals formulate a plan for attacking a city. In its original version, the story cast the generals as commanders of the Albanian army. The name was changed, eventually settling on "Byzantine", at the suggestion of Jack Goldberg to future-proof any potential offense-giving. [9] This formulation of the problem, together with some additional results, were presented by the same authors in their 1982 paper, "The Byzantine Generals Problem". [5]

Mitigation

The objective of Byzantine fault tolerance is to defend against failures of system components, with or without symptoms, that prevent other components of the system from reaching an agreement among themselves, when such an agreement is needed for the correct operation of the system.

The remaining operationally correct components of a Byzantine fault tolerant system will be able to continue providing the system's service as originally intended, assuming there are a sufficient number of accurately-operating components to maintain the service.

Byzantine failures are considered the most general and most difficult class of failures among the failure modes. The so-called fail-stop failure mode occupies the simplest end of the spectrum. Whereas fail-stop failure mode simply means that the only way to fail is a node crash, detected by other nodes, Byzantine failures imply no restrictions, which means that the failed node can generate arbitrary data, including data that makes it appear like a functioning node. Thus, Byzantine failures can confuse failure detection systems, which makes fault tolerance difficult. Despite the analogy, a Byzantine failure is not necessarily a security problem involving hostile human interference: it can arise purely from electrical or software faults.

The terms fault and failure are used here according to the standard definitions [10] originally created by a joint committee on "Fundamental Concepts and Terminology" formed by the IEEE Computer Society's Technical Committee on Dependable Computing and Fault-Tolerance and IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance. [11] See also dependability.

Byzantine fault tolerance is only concerned with broadcast consistency, that is, the property that when one component broadcasts one value to all the other components, they all receive exactly this same value, or in the case that the broadcaster is not consistent, the other components agree on a common value themselves. This kind of fault tolerance does not encompass the correctness of the value itself; for example, an adversarial component that deliberately sends an incorrect value, but sends that same value consistently to all components, will not be caught in the Byzantine fault tolerance scheme.

Solutions

Several early solutions were described by Lamport, Shostak, and Pease in 1982. [5] They began by noting that the Generals' Problem can be reduced to a "Commander and Lieutenants" problem, where the loyal Lieutenants must all act in unison and their action must correspond to what the Commander ordered in the case that the Commander is loyal. Their solutions include an algorithm for unsigned ("oral") messages, which tolerates m traitors provided there are more than 3m generals in total, and an algorithm for signed messages, which tolerates any number of traitors.
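
As an illustration of the unsigned-message approach, the following sketch implements a recursive OM(m)-style exchange for the four-general, one-traitor case. It is a simplified model, not the pseudocode of the original paper: the traitor strategy, the value encoding, and the function names are assumptions made for the example.

```python
# A minimal sketch of an oral-messages (OM-style) exchange; node behaviour,
# the "attack"/"retreat" encoding, and the traitor strategy are illustrative.
from collections import Counter

DEFAULT = "retreat"  # pre-assigned default used for ties

def majority(values):
    """Return the most common value, falling back to DEFAULT on a tie."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT
    return counts[0][0]

def send(sender, receiver, value, traitors):
    """Model an oral message: traitors may send arbitrary values."""
    if sender in traitors:
        # Illustrative traitor strategy: tell odd-numbered receivers the opposite.
        return "attack" if receiver % 2 else "retreat"
    return value

def om(m, commander, lieutenants, value, traitors):
    """OM(m): each lieutenant decides on a value; returns {lieutenant: decision}."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received  # OM(0): each lieutenant uses the value it received
    # Step 2: each lieutenant relays what it received, acting as commander in OM(m-1).
    relayed = {lt: {} for lt in lieutenants}
    for i in lieutenants:
        others = [lt for lt in lieutenants if lt != i]
        sub = om(m - 1, i, others, received[i], traitors)
        for j, v in sub.items():
            relayed[j][i] = v
    # Step 3: each lieutenant takes the majority of its own value and the relayed ones.
    return {i: majority([received[i]] + list(relayed[i].values())) for i in lieutenants}

if __name__ == "__main__":
    # n = 4 nodes (commander 0, lieutenants 1-3), m = 1 traitor: 3m + 1 nodes suffice.
    decisions = om(m=1, commander=0, lieutenants=[1, 2, 3], value="attack", traitors={3})
    print(decisions)  # loyal lieutenants 1 and 2 both decide "attack"
```

With n = 4 and a single traitorous lieutenant, the two loyal lieutenants settle on the loyal commander's order, consistent with the 3m + 1 bound discussed in the History section.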

Several system architectures were designed c. 1980 that implemented Byzantine fault tolerance. These include: Draper's FTMP, [14] Honeywell's MMFCS, [15] and SRI's SIFT. [7]

In 1999, Miguel Castro and Barbara Liskov introduced the "Practical Byzantine Fault Tolerance" (PBFT) algorithm, [16] which provides high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increases in latency.
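
PBFT tolerates f Byzantine replicas out of n = 3f + 1, and a request is executed only after matching quorums of prepare and commit messages are collected. The sketch below illustrates only this quorum arithmetic under the simplifying assumption of an honest primary; the function names are illustrative, and it is not a model of the protocol's actual message flow.

```python
# Quorum arithmetic only -- a simplified model, not PBFT's actual message flow.
# Function names are illustrative; the primary is assumed honest in this sketch.

def pbft_quorums(f: int):
    """For f tolerated Byzantine replicas, PBFT uses n = 3f + 1 replicas."""
    n = 3 * f + 1
    prepare_quorum = 2 * f       # matching prepare messages needed to become "prepared"
    commit_quorum = 2 * f + 1    # matching commit messages needed before executing
    return n, prepare_quorum, commit_quorum

def request_can_commit(f: int, silent_replicas: int) -> bool:
    """Can a request still commit if `silent_replicas` replicas send nothing useful?"""
    n, prepare_q, commit_q = pbft_quorums(f)
    honest = n - silent_replicas
    # Honest replicas alone must form both quorums (honest - 1 backups besides the primary).
    return honest - 1 >= prepare_q and honest >= commit_q

print(request_can_commit(f=1, silent_replicas=1))  # True: 4 replicas tolerate 1 fault
print(request_can_commit(f=1, silent_replicas=2))  # False: beyond the f = 1 bound
```

For example, with f = 1 the system has four replicas and still commits requests if one replica is faulty, but not if two are.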

After PBFT, several BFT protocols were introduced to improve its robustness and performance. For instance, Q/U, [17] HQ, [18] Zyzzyva, [19] and ABsTRACTs [20] addressed performance and cost issues, whereas other protocols, like Aardvark [21] and RBFT, [22] addressed robustness issues. Adapt [23] tried to make use of existing BFT protocols, switching between them adaptively to improve system robustness and performance as the underlying conditions change. Furthermore, BFT protocols were introduced that leverage trusted components to reduce the number of replicas, e.g., A2M-PBFT-EA [24] and MinBFT. [25]

Applications

Several examples of Byzantine failures that have occurred are given in two equivalent journal papers. [3] [4] These and other examples are described on the NASA DASHlink web pages. [26]

Applications in computing

Byzantine fault tolerance mechanisms use components that repeat an incoming message (or just its signature) to other recipients of that incoming message. All these mechanisms make the assumption that the act of repeating a message blocks the propagation of Byzantine symptoms. For systems that have a high degree of safety or security criticality, these assumptions must be proven to be true to an acceptable level of fault coverage. When providing proof through testing, one difficulty is creating a sufficiently wide range of signals with Byzantine symptoms. [27] Such testing will likely require specialized fault injectors. [28] [29]

Military applications

Byzantine errors were observed infrequently and at irregular points during endurance testing for the newly constructed Virginia class submarines, at least through 2005 (when the issues were publicly reported). [30]

Cryptocurrency applications

The Bitcoin network's nodes work in parallel to generate a proof-of-work blockchain, allowing the system to overcome Byzantine failures and reach a coherent global view of the system's state. [31] [32] Some proof of stake blockchains also use BFT algorithms. [33]

Applications in aviation

Some aircraft systems, such as the Boeing 777 Aircraft Information Management System (via its ARINC 659 SAFEbus network), the Boeing 777 flight control system, and the Boeing 787 flight control systems, use Byzantine fault tolerance; because these are real-time systems, their Byzantine fault tolerance solutions must have very low latency. For example, SAFEbus can achieve Byzantine fault tolerance within the order of a microsecond of added latency. [34] [35] [36] The SpaceX Dragon considers Byzantine fault tolerance in its design. [37]


Related Research Articles


Leslie B. Lamport is an American computer scientist and mathematician. Lamport is best known for his seminal work in distributed systems, and as the initial developer of the document preparation system LaTeX and the author of its first manual.

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.

In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases, other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time-period. The service guarantees must hold even when the system is subject to attacks or natural failures.

NonStop is a series of server computers introduced to market in 1976 by Tandem Computers Inc., beginning with the NonStop product line. It was followed by the Tandem Integrity NonStop line of lock-step fault tolerant computers, now defunct. The original NonStop product line has been offered by Hewlett Packard Enterprise since Hewlett-Packard Company's split in 2015. Because NonStop systems are based on an integrated hardware/software stack, Tandem and later HPE also developed the NonStop OS operating system for them.

Fault tolerance is the ability of a system to maintain proper operation in the event of failures or faults in one or more of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can lead to total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation.

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating processes to reach consensus, or agree on some data value that is needed during computation. Example applications of consensus include agreeing on what transactions to commit to a database in which order, state machine replication, and atomic broadcasts. Real-world applications often requiring consensus include cloud computing, clock synchronization, PageRank, opinion formation, smart power grids, state estimation, control of UAVs, load balancing, blockchain, and others.

In computer science, state machine replication (SMR) or state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols.

Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.

In fault-tolerant distributed computing, an atomic broadcast or total order broadcast is a broadcast where all correct processes in a system of multiple processes receive the same set of messages in the same order; that is, the same sequence of messages. The broadcast is termed "atomic" because it either eventually completes correctly at all participants, or all participants abort without side effects. Atomic broadcasts are an important distributed computing primitive.

Keith Marzullo is the inventor of Marzullo's algorithm, which is part of the basis of the Network Time Protocol and the Windows Time Service. On August 1, 2016 he became the Dean of the University of Maryland College of Information Studies after serving as the Director of the NITRD National Coordination Office. Prior to this he was a Professor in the Department of Computer Science and Engineering at University of California, San Diego. In 2011 he was inducted as a Fellow of the Association for Computing Machinery.

Byzantine fault tolerant protocols are algorithms that are robust to arbitrary types of failures in distributed algorithms. The Byzantine agreement protocol is an essential part of this task. A constant-time quantum version of the Byzantine protocol has also been described.

The Brooks–Iyengar algorithm or FuseCPA Algorithm or Brooks–Iyengar hybrid algorithm is a distributed algorithm that improves both the precision and accuracy of the interval measurements taken by a distributed sensor network, even in the presence of faulty sensors. The sensor network does this by exchanging the measured value and accuracy value at every node with every other node, and computes the accuracy range and a measured value for the whole network from all of the values collected. Even if some of the data from some of the sensors is faulty, the sensor network will not malfunction. The algorithm is fault-tolerant and distributed. It could also be used as a sensor fusion method. The precision and accuracy bounds of this algorithm were proved in 2016.

In a distributed computing system, a failure detector is a computer application or a subsystem that is responsible for the detection of node failures or crashes. Failure detectors were first introduced in 1996 by Chandra and Toueg in their paper Unreliable Failure Detectors for Reliable Distributed Systems. The paper depicts the failure detector as a tool to improve consensus and atomic broadcast in the distributed system. In other words, failure detectors seek errors in the process, and the system will maintain a level of reliability. In practice, after failure detectors spot crashes, the system will ban the processes that are making mistakes to prevent any further serious crashes or errors.

Baruch Awerbuch is an Israeli-American computer scientist and a professor of computer science at Johns Hopkins University. He is known for his research on distributed computing.

Daniel (Danny) Dolev is an Israeli computer scientist known for his research in cryptography and distributed computing. He holds the Berthold Badler Chair in Computer Science at the Hebrew University of Jerusalem and is a member of the scientific council of the European Research Council.

Proof-of-stake (PoS) protocols are a class of consensus mechanisms for blockchains that work by selecting validators in proportion to their quantity of holdings in the associated cryptocurrency. This is done to avoid the computational cost of proof-of-work (PoW) schemes. The first functioning use of PoS for cryptocurrency was Peercoin in 2012, although the scheme, on the surface, still resembled a PoW.

Michel Raynal is a French informatics scientist, professor at IRISA, University of Rennes, France. He is known for his contributions in the fields of algorithms, computability, and fault-tolerance in the context of concurrent and distributed systems. Michel Raynal is also Distinguished Chair professor at the Hong Kong Polytechnic University and editor of the “Synthesis Lectures on Distributed Computing Theory” published by Morgan & Claypool. He is a senior member of Institut Universitaire de France and a member of Academia Europaea.


Robert Eliot Shostak is an American computer scientist and Silicon Valley entrepreneur. He is most noted academically for his seminal work in the branch of distributed computing known as Byzantine Fault Tolerance. He is also known for co-authoring the Paradox Database, and most recently, the founding of Vocera Communications, a company that makes wearable, Star Trek-like communication badges.


Avalanche is a decentralized, open-source proof of stake blockchain with smart contract functionality. AVAX is the native cryptocurrency of the platform.


Ouroboros is a family of proof-of-stake consensus protocols used in the Cardano and Polkadot blockchains. It can run both permissionless and permissioned blockchains.

References

  1. Kirrmann, Hubert (n.d.). "Fault Tolerant Computing in Industrial Automation" (PDF). Switzerland: ABB Research Center. p. 94. Archived from the original (PDF) on 2014-03-26. Retrieved 2015-03-02.
  2. Lamport, L.; Shostak, R.; Pease, M. (1982). "The Byzantine Generals Problem" (PDF). ACM Transactions on Programming Languages and Systems. 4 (3): 382–401. CiteSeerX   10.1.1.64.2312 . doi:10.1145/357172.357176. S2CID   55899582. Archived (PDF) from the original on 13 June 2018.
  3. Driscoll, K.; Hall, B.; Paulitsch, M.; Zumsteg, P.; Sivencrona, H. (2004). "The Real Byzantine Generals". The 23rd Digital Avionics Systems Conference (IEEE Cat. No.04CH37576). pp. 6.D.4–61–11. doi:10.1109/DASC.2004.1390734. ISBN   978-0-7803-8539-9. S2CID   15549497.
  4. Driscoll, Kevin; Hall, Brendan; Sivencrona, Håkan; Zumsteg, Phil (2003). "Byzantine Fault Tolerance, from Theory to Reality". Computer Safety, Reliability, and Security. Lecture Notes in Computer Science. Vol. 2788. pp. 235–248. doi:10.1007/978-3-540-39878-3_19. ISBN   978-3-540-20126-7. ISSN   0302-9743. S2CID   12690337.
  5. Lamport, L.; Shostak, R.; Pease, M. (1982). "The Byzantine Generals Problem" (PDF). ACM Transactions on Programming Languages and Systems. 4 (3): 387–389. CiteSeerX   10.1.1.64.2312 . doi:10.1145/357172.357176. S2CID   55899582. Archived from the original (PDF) on 7 February 2017.
  6. Matthias Fitzi (2002). "Generalized Communication and Security Models in Byzantine Agreement" (PDF). ETH Zurich.
  7. 1 2 "SIFT: design and analysis of a fault-tolerant computer for aircraft control". Microelectronics Reliability. 19 (3): 190. 1979. doi:10.1016/0026-2714(79)90211-7. ISSN   0026-2714.
  8. Pease, Marshall; Shostak, Robert; Lamport, Leslie (April 1980). "Reaching Agreement in the Presence of Faults". Journal of the Association for Computing Machinery. 27 (2): 228–234. CiteSeerX   10.1.1.68.4044 . doi:10.1145/322186.322188. S2CID   6429068.
  9. Lamport, Leslie (2016-12-19). "The Byzantine Generals Problem". ACM Transactions on Programming Languages and Systems. SRI International. Retrieved 18 March 2019.
  10. Avizienis, A.; Laprie, J.-C.; Randell, Brian; Landwehr, C. (2004). "Basic concepts and taxonomy of dependable and secure computing". IEEE Transactions on Dependable and Secure Computing. 1 (1): 11–33. doi:10.1109/TDSC.2004.2. hdl: 1903/6459 . ISSN   1545-5971. S2CID   215753451.
  11. "Dependable Computing and Fault Tolerance". Archived from the original on 2015-04-02. Retrieved 2015-03-02.
  12. Feldman, P.; Micali, S. (1997). "An optimal probabilistic protocol for synchronous Byzantine agreement" (PDF). SIAM J. Comput. 26 (4): 873–933. doi:10.1137/s0097539790187084. Archived (PDF) from the original on 2016-03-05. Retrieved 2012-06-14.
  13. Paulitsch, M.; Morris, J.; Hall, B.; Driscoll, K.; Latronico, E.; Koopman, P. (2005). "Coverage and the Use of Cyclic Redundancy Codes in Ultra-Dependable Systems". 2005 International Conference on Dependable Systems and Networks (DSN'05). pp. 346–355. doi:10.1109/DSN.2005.31. ISBN   978-0-7695-2282-1. S2CID   14096385.
  14. Hopkins, Albert L.; Lala, Jaynarayan H.; Smith, T. Basil (1987). "The Evolution of Fault Tolerant Computing at the Charles Stark Draper Laboratory, 1955–85". The Evolution of Fault-Tolerant Computing. Dependable Computing and Fault-Tolerant Systems. Vol. 1. pp. 121–140. doi:10.1007/978-3-7091-8871-2_6. ISBN   978-3-7091-8873-6. ISSN   0932-5581.
  15. Driscoll, Kevin; Papadopoulos, Gregory; Nelson, Scott; Hartmann, Gary; Ramohalli, Gautham (1984), Multi-Microprocessor Flight Control System (Technical Report), Wright-Patterson Air Force Base, OH: AFWAL/FIGL U.S. Air Force Systems Command, AFWAL-TR-84-3076
  16. Castro, M.; Liskov, B. (2002). "Practical Byzantine Fault Tolerance and Proactive Recovery". ACM Transactions on Computer Systems. Association for Computing Machinery. 20 (4): 398–461. CiteSeerX   10.1.1.127.6130 . doi:10.1145/571637.571640. S2CID   18793794.
  17. Abd-El-Malek, M.; Ganger, G.; Goodson, G.; Reiter, M.; Wylie, J. (2005). "Fault-scalable Byzantine Fault-Tolerant Services". ACM SIGOPS Operating Systems Review. Association for Computing Machinery. 39 (5): 59. doi:10.1145/1095809.1095817.
  18. Cowling, James; Myers, Daniel; Liskov, Barbara; Rodrigues, Rodrigo; Shrira, Liuba (2006). HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance. Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. pp. 177–190. ISBN   1-931971-47-1.
  19. Kotla, Ramakrishna; Alvisi, Lorenzo; Dahlin, Mike; Clement, Allen; Wong, Edmund (December 2009). "Zyzzyva: Speculative Byzantine Fault Tolerance". ACM Transactions on Computer Systems. Association for Computing Machinery. 27 (4): 1–39. doi:10.1145/1658357.1658358.
  20. Guerraoui, Rachid; Kneževic, Nikola; Vukolic, Marko; Quéma, Vivien (2010). The Next 700 BFT Protocols. Proceedings of the 5th European conference on Computer systems. EuroSys. Archived from the original on 2011-10-02. Retrieved 2011-10-04.
  21. Clement, A.; Wong, E.; Alvisi, L.; Dahlin, M.; Marchetti, M. (April 22–24, 2009). Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults (PDF). Symposium on Networked Systems Design and Implementation. USENIX. Archived (PDF) from the original on 2010-12-25. Retrieved 2010-02-17.
  22. Aublin, P.-L.; Ben Mokhtar, S.; Quéma, V. (July 8–11, 2013). RBFT: Redundant Byzantine Fault Tolerance. 33rd IEEE International Conference on Distributed Computing Systems. International Conference on Distributed Computing Systems. Archived from the original on August 5, 2013.
  23. Bahsoun, J. P.; Guerraoui, R.; Shoker, A. (2015-05-01). "Making BFT Protocols Really Adaptive". 2015 IEEE International Parallel and Distributed Processing Symposium. pp. 904–913. doi:10.1109/IPDPS.2015.21. ISBN   978-1-4799-8649-1. S2CID   16310807.
  24. Chun, Byung-Gon; Maniatis, Petros; Shenker, Scott; Kubiatowicz, John (2007-01-01). "Attested append-only memory". Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles. SOSP '07. New York, NY, USA: ACM. pp. 189–204. doi:10.1145/1294261.1294280. ISBN   9781595935915. S2CID   6685352.
  25. Veronese, G. S.; Correia, M.; Bessani, A. N.; Lung, L. C.; Verissimo, P. (2013-01-01). "Efficient Byzantine Fault-Tolerance". IEEE Transactions on Computers. 62 (1): 16–30. CiteSeerX   10.1.1.408.9972 . doi:10.1109/TC.2011.221. ISSN   0018-9340. S2CID   8157723.
  26. Driscoll, Kevin (2012-12-11). "Real System Failures". DASHlink. NASA. Archived from the original on 2015-04-02. Retrieved 2015-03-02.
  27. Nanya, T.; Goosen, H.A. (1989). "The Byzantine hardware fault model". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 8 (11): 1226–1231. doi:10.1109/43.41508. ISSN   0278-0070.
  28. Martins, Rolando; Gandhi, Rajeev; Narasimhan, Priya; Pertet, Soila; Casimiro, António; Kreutz, Diego; Veríssimo, Paulo (2013). "Experiences with Fault-Injection in a Byzantine Fault-Tolerant Protocol". Middleware 2013. Lecture Notes in Computer Science. Vol. 8275. pp. 41–61. doi:10.1007/978-3-642-45065-5_3. ISBN   978-3-642-45064-8. ISSN   0302-9743. S2CID   31337539.
  29. US patent 7475318, Kevin R. Driscoll, "Method for testing the sensitive input range of Byzantine filters", issued 2009-01-06, assigned to Honeywell International Inc.
  30. Walter, C.; Ellis, P.; LaValley, B. (2005). "The Reliable Platform Service: A Property-Based Fault Tolerant Service Architecture". Ninth IEEE International Symposium on High-Assurance Systems Engineering (HASE'05). pp. 34–43. doi:10.1109/HASE.2005.23. ISBN   978-0-7695-2377-4. S2CID   21468069.
  31. Rubby, Matt (20 January 2024). "How Byzantine Generals Problem Relates to You in 2024". Swan Bitcoin. Retrieved 2024-01-27.
  32. Tholoniat, Pierre; Gramoli, Vincent (2022), Tran, Duc A.; Thai, My T.; Krishnamachari, Bhaskar (eds.), "Formal Verification of Blockchain Byzantine Fault Tolerance", Handbook on Blockchain, Springer Optimization and Its Applications, Cham: Springer International Publishing, pp. 389–412, arXiv: 1909.07453 , doi:10.1007/978-3-031-07535-3_12, ISBN   978-3-031-07535-3 , retrieved 2024-01-27
  33. Deirmentzoglou, Papakyriakopoulos & Patsakis 2019, p. 28716.
  34. Paulitsch, M.; Driscoll, K. (9 January 2015). "Chapter 48: SAFEbus". In Zurawski, Richard (ed.). Industrial Communication Technology Handbook, Second Edition. CRC Press. pp. 48–1–48–26. ISBN   978-1-4822-0733-0.
  35. Thomas A. Henzinger; Christoph M. Kirsch (26 September 2001). Embedded Software: First International Workshop, EMSOFT 2001, Tahoe City, CA, USA, October 8-10, 2001. Proceedings (PDF). Springer Science & Business Media. pp. 307–. ISBN   978-3-540-42673-8. Archived (PDF) from the original on 2015-09-22. Retrieved 2015-03-05.
  36. Yeh, Y.C. (2001). "Safety critical avionics for the 777 primary flight controls system". 20th DASC. 20th Digital Avionics Systems Conference (Cat. No.01CH37219). Vol. 1. pp. 1C2/1–1C2/11. doi:10.1109/DASC.2001.963311. ISBN   978-0-7803-7034-0. S2CID   61489128.
  37. "ELC: SpaceX lessons learned [LWN.net]". Archived from the original on 2016-08-05. Retrieved 2016-07-21.

Sources