User:Path-logical/sandbox

In fault-tolerant computer systems, Byzantine fault tolerance is the characteristic of a system that tolerates the class of failures known as the Byzantine Generals' Problem^[1], which is a generalized version of the [[Two Generals' Problem]]. The phrases interactive consistency or source congruency have been used to refer to Byzantine fault tolerance, particularly among the members of the early implementer teams.^[2]

The objective of Byzantine fault tolerance is to be able to defend against Byzantine failures, in which components of a system fail with symptoms that prevent some components of the system from reaching agreement among themselves, where such agreement is needed for the correct operation of the system. Correctly functioning components of a Byzantine fault tolerant system will be able to provide the system's service assuming there are not too many faulty components.

The following practical, concise definitions are helpful in understanding Byzantine fault tolerance:^[3]

Byzantine fault: Any fault presenting different symptoms to different observers
Byzantine failure: The loss of a system service due to a Byzantine fault in systems that require

consensus where fault and failure use the standard definitions from the IFIP WG10.4 on Dependable Systems ^[4] ^[5]. Note that the type of system services which Byzantine faults affect are agreement (a.k.a consensus) services.

Origin

Byzantine refers to the Byzantine Generals' Problem, an agreement problem (described by Marshall Pease, Robert Shostak, and Leslie Lamport in their 1980 paper, [http://research.microsoft.com/en-us/um/people/lamport/pubs/reaching.pdf "Reaching Agreement in the Presence of Faults"])^[6] in which a group of generals, each commanding a division of the <a href="/wiki/Byzantine_army" title="Byzantine army">Byzantine army</a>, encircling a city. These <a href="/wiki/General" title="General" class="mw-redirect">generals</a> wish to formulate a plan for attacking the city. In its simplest form, the generals must only decide whether to attack or retreat. Some generals may prefer to attack, while others prefer to retreat. The important thing is that every general agrees on a common decision, for a halfhearted attack by a few generals would become a <a href="/wiki/Rout" title="Rout">rout</a> and be worse than a coordinated attack or a coordinated retreat.

The problem is complicated by the presence of traitorous generals who may not only cast a vote for a suboptimal strategy, they may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to a few generals, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers).

Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals have a unanimous agreement on their strategy. Note that if the source general is correct, all loyal generals must agree upon that value; otherwise, the choice of strategy used is irrelevant.

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#byz

Known examples of Byzantine failures

Early solutions

Several solutions were described by Lamport, Shostak, and Pease in 1982.^[1] They began by noting that the Generals' Problem can be reduced to solving a "Commander and Lieutenants" problem where Loyal Lieutenants must all act in unison and that their action must correspond to what the Commander ordered in the case that the Commander is Loyal. Roughly speaking, the Generals vote by treating each other's orders as votes.

One solution considers scenarios in which messages may be forged, but which

will be Byzantine-fault-tolerant as long as the number of traitorous generals does not equal or exceed one third. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the 1 Commander/2 Lieutenants problem cannot be solved, if the Commander is traitorous. To see this, suppose we have a traitorous Commander A, and two Lieutenants, B and C: when A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it is not necessarily A—the other commander could have forged the message purportedly from A. It can be shown that if n is the number of generals in total, and t is the number of traitors in that n, then there are solutions to the problem only when n > 3t.^[7]

A second solution requires unforgeable message signatures. For

security-critical systems, digital signatures (in modern computer systems, this may be achieved in practice using public-key cryptography) can provide Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals. However, for safety-critical systems, simple error detecting codes, such as CRCs, provide the same or better coverage at a much lower cost. Thus, cryptographic digital signature methods are not a good choice for safety-critical systems, unless there is also a specific security threat as well. ^[8]

Also presented is a variation on the first two solutions allowing

Byzantine-fault-tolerant behavior in some situations where not all generals can communicate directly with each other.

Practical Byzantine fault tolerance

In 1999, Miguel Castro and Barbara Liskov introduced the "Practical Byzantine Fault Tolerance" (PBFT) algorithm,^[9] which provides high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increases in latency.

PBFT triggered a renaissance in BFT replication research^{[citation needed]}, with protocols like Q/U,^[10] HQ,^[11] Zyzzyva,^[12] and ABsTRACTs ^[13] working to lower costs and improve performance and protocols like Aardvark^[14] and RBFT^[15] working to improve robustness.

UpRight^[16] is an open source library for constructing services that tolerate both crashes ("up") and Byzantine behaviors ("right") that incorporates many of these protocols' innovations.

One example of BFT in use is Bitcoin, a peer-to-peer digital currency system. The Bitcoin network works in parallel to generate a chain of Hashcash style proof-of-work. The proof-of-work chain is the key to overcome Byzantine failures and to reach a coherent global view of the system state.^[17] Hyperledger utilizes PBFT to achieve similar results to Bitcoin but with known entities operating the consensus nodes.^[18]

Additionally to PBFT and Upright, there is also the BFT-SMaRt library,^[19] a high-performance Byzantine fault-tolerant state machine replication library developed in Java. This library implements a protocol very similar to PBFT's, plus complementary protocols which offer state transfer and on-the-fly reconfiguration of hosts. BFT-SMaRt is the most recent effort to implement state machine replication, still being actively maintained.

Archistar^[20] utilizes a slim BFT layer^[21] for communication. It prototypes a secure multi-cloud storage system using Java licensed under LGPLv2. Focus lies on simplicity and readability, it aims to be the foundation for further research projects.

Tolerance Degree and Resources

Such algorithms are commonly characterized by their resilience t, the number of faulty processes with which an algorithm can cope.

Many classic agreement problems, such as the Byzantine Generals' Problem, have no solution unless n = 3t + 1, where n is the number of processes in the system and t is the number of traitors (faults).^[1] In other words, the algorithm can ensure correct operation only if fewer than one third of the processes are faulty.

References

^ ^a ^b ^c {{cite doi|10.1145/357172.357176}}
^ Kirrmann, Hubert (n.d.). [[1] "Fault Tolerant Computing in Industrial Automation"] (PDF). CH-5405 Baden, Switzerland: ABB Research Center. p. 94. Retrieved 2015-03-02. {{cite web}}: Check |url= value (help)CS1 maint: location (link)
^ {{cite journal}}: Empty citation (help)
^ Driscoll, K.; Hall, B.; Paulitsch, M.; Zumsteg, P.; Sivencrona, H. (2004). "The real Byzantine Generals": 6.D.4–61-11. doi:10.1109/DASC.2004.1390734. {{cite journal}}: Cite journal requires |journal= (help)
^ Driscoll, Kevin; Hall, Brendan; Sivencrona, Håkan; Zumsteg, Phil (2003). "Byzantine Fault Tolerance, from Theory to Reality". 2788: 235–248. doi:10.1007/978-3-540-39878-3_19. ISSN 0302-9743. {{cite journal}}: Cite journal requires |journal= (help)
^ {{cite doi|10.1145/322186.322188}}
^ Feldman, P.; Micali, S. (1997). "An optimal probabilistic protocol for synchronous Byzantine agreement" (PDF). SIAM J. Computing. 26 (4): 873–933. doi:10.1137/s0097539790187084.
^ Paulitsch, M.; Morris, J.; Hall, B.; Driscoll, K.; Latronico, E.; Koopman, P. (2005). "Coverage and the Use of Cyclic Redundancy Codes in Ultra-Dependable Systems": 346–355. doi:10.1109/DSN.2005.31. {{cite journal}}: Cite journal requires |journal= (help)
^ Castro, M.; Liskov, B. (2002). "Practical Byzantine Fault Tolerance and Proactive Recovery". ACM Transactions on Computer Systems. 20 (4). Association for Computing Machinery: 398–461. doi:10.1145/571637.571640. CiteSeer^x: 10.1.1.127.6130.
^ Abd-El-Malek; Ganger, G.; Goodson, G.; Reiter, M.; Wylie, J. (2005). "Fault-scalable Byzantine Fault-Tolerant Services". Association for Computing Machinery. doi:10.1145/1095809.1095817. {{cite journal}}: Cite journal requires |journal= (help); Unknown parameter |conference= ignored (help)
^ Cowling, James; Myers, Daniel; Liskov, Barbara; Rodrigues, Rodrigo; Shrira, Liuba (2006). HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance. Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. pp. 177–190. ISBN 1-931971-47-1.
^ Kotla, Ramakrishna; Alvisi, Lorenzo; Dahlin, Mike; Clement, Allen; Wong, Edmund (December 2009). "Zyzzyva: Speculative Byzantine Fault Tolerance". ACM Transactions on Computer Systems. 27 (4). Association for Computing Machinery. doi:10.1145/1658357.1658358.
^ Guerraoui, Rachid; Kneževic, Nikola; Vukolic, Marko; Quéma, Vivien (2010). The Next 700 BFT Protocols. Proceedings of the 5th European conference on Computer systems. EuroSys. {{cite conference}}: External link in |publisher= (help)
^ {{cite conference | first1 = A. | last1 = Clement | first2 = E. | last2 = Wong | first3 = L. | last3 = Alvisi | first4 = M. | last4 = Dahlin | first5 = M. | last5 = Marchetti | url = http://www.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf | title = Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults | publisher = USENIX | conference = Symposium on Networked Systems Design and Implementation | date = April 22–24, 2009 }}
^ Aublin, P.-L.; Ben Mokhtar, S.; Quéma, V. (July 8–11, 2013). RBFT: Redundant Byzantine Fault Tolerance. 33rd IEEE International Conference on Distributed Computing Systems. International Conference on Distributed Computing Systems.
^ UpRight. Google Code repository for the UpRight replication library.
^ bitcointalk.org The Byzantine Generals' Problem
^ hyperledger.com Hyperledger
^ BFT-SMaRt. Google Code repository for the BFT-SMaRt replication library.
^ [https://github.com/Archistar/archistar-core Archistar]. github repository for the Archistar project.
^ [https://github.com/Archistar/archistar-bft Archistar-bft BFT state-machine]. github repository for the Archistar project.

External links

Ocean Store replicates data with a Byzantine fault tolerant commit protocol.
Practical Byzantine Fault Tolerance
Byzantine Fault Tolerance in the RKBExplorer
UpRight is an open source library for Crash-tolerant and Byzantine-tolerant state machine replication.

[BGP_Paper-1] {{cite doi|10.1145/357172.357176}}

[2] Kirrmann, Hubert (n.d.). [[1] "Fault Tolerant Computing in Industrial Automation"] (PDF). CH-5405 Baden, Switzerland: ABB Research Center. p. 94. Retrieved 2015-03-02. {{cite web}}: Check |url= value (help)CS1 maint: location (link)

[3] {{cite journal}}: Empty citation (help)

[DriscollHall2004-4] Driscoll, K.; Hall, B.; Paulitsch, M.; Zumsteg, P.; Sivencrona, H. (2004). "The real Byzantine Generals": 6.D.4–61-11. doi:10.1109/DASC.2004.1390734. {{cite journal}}: Cite journal requires |journal= (help)

[DriscollHall2003-5] Driscoll, Kevin; Hall, Brendan; Sivencrona, Håkan; Zumsteg, Phil (2003). "Byzantine Fault Tolerance, from Theory to Reality". 2788: 235–248. doi:10.1007/978-3-540-39878-3_19. ISSN 0302-9743. {{cite journal}}: Cite journal requires |journal= (help)

[6] {{cite doi|10.1145/322186.322188}}

[7] Feldman, P.; Micali, S. (1997). "An optimal probabilistic protocol for synchronous Byzantine agreement" (PDF). SIAM J. Computing. 26 (4): 873–933. doi:10.1137/s0097539790187084.

[PaulitschMorris2005-8] Paulitsch, M.; Morris, J.; Hall, B.; Driscoll, K.; Latronico, E.; Koopman, P. (2005). "Coverage and the Use of Cyclic Redundancy Codes in Ultra-Dependable Systems": 346–355. doi:10.1109/DSN.2005.31. {{cite journal}}: Cite journal requires |journal= (help)

[9] Castro, M.; Liskov, B. (2002). "Practical Byzantine Fault Tolerance and Proactive Recovery". ACM Transactions on Computer Systems. 20 (4). Association for Computing Machinery: 398–461. doi:10.1145/571637.571640. CiteSeer^x: 10.1.1.127.6130.

[10] Abd-El-Malek; Ganger, G.; Goodson, G.; Reiter, M.; Wylie, J. (2005). "Fault-scalable Byzantine Fault-Tolerant Services". Association for Computing Machinery. doi:10.1145/1095809.1095817. {{cite journal}}: Cite journal requires |journal= (help); Unknown parameter |conference= ignored (help)

[11] Cowling, James; Myers, Daniel; Liskov, Barbara; Rodrigues, Rodrigo; Shrira, Liuba (2006). HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance. Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. pp. 177–190. ISBN 1-931971-47-1.

[12] Kotla, Ramakrishna; Alvisi, Lorenzo; Dahlin, Mike; Clement, Allen; Wong, Edmund (December 2009). "Zyzzyva: Speculative Byzantine Fault Tolerance". ACM Transactions on Computer Systems. 27 (4). Association for Computing Machinery. doi:10.1145/1658357.1658358.

[13] Guerraoui, Rachid; Kneževic, Nikola; Vukolic, Marko; Quéma, Vivien (2010). The Next 700 BFT Protocols. Proceedings of the 5th European conference on Computer systems. EuroSys. {{cite conference}}: External link in |publisher= (help)

[14] {{cite conference | first1 = A. | last1 = Clement | first2 = E. | last2 = Wong | first3 = L. | last3 = Alvisi | first4 = M. | last4 = Dahlin | first5 = M. | last5 = Marchetti | url = http://www.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf | title = Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults | publisher = USENIX | conference = Symposium on Networked Systems Design and Implementation | date = April 22–24, 2009 }}

[15] Aublin, P.-L.; Ben Mokhtar, S.; Quéma, V. (July 8–11, 2013). RBFT: Redundant Byzantine Fault Tolerance. 33rd IEEE International Conference on Distributed Computing Systems. International Conference on Distributed Computing Systems.

[16] UpRight. Google Code repository for the UpRight replication library.

[17] tcointalk.org The Byzantine Generals' Problem

[18] yperledger.com Hyperledger

[19] BFT-SMaRt. Google Code repository for the BFT-SMaRt replication library.

[20] [https://github.com/Archistar/archistar-core Archistar]. github repository for the Archistar project.

[21] [https://github.com/Archistar/archistar-bft Archistar-bft BFT state-machine]. github repository for the Archistar project.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]