How effective is Byzantine Fault Tolerance

How effective is Byzantine Fault Tolerance

Anyone who exchanges real values for banknotes or coins expects that their means of payment will be protected from counterfeiting, and will be unique. Otherwise, anyone can make any number of copies and, thus, quickly spoil the value of the currency.

In the case of currencies issued by the state, watermarks, security stripes and holograms guarantee protection against counterfeiting, and the central bank controls the amount of money in circulation. As for precious metals, chemistry creates clarity. No one has ever managed to “imitate” gold and silver.

Despite the huge amount of money and precious metals in circulation for centuries, there is no evidence of forgeries that could not be detected in principle. It just all depends on effort and attention.

In the case of virtual money, only cryptography and the crypto community guarantee that Bitcoin and Ethereum will not be spent twice (“double spending”). Cryptocurrency developers assume that false confirmation of the validity of a Bitcoin block is impossible, because only a small group can participate in collusion, but never the majority of participants. Today, according to Coinmarketrate.com there are almost 13 thousand cryptocurrencies, and their developers think differently.

Suitable incentive systems should always provide a sufficiently large number of participants. So law-abiding users always have a majority. However, this is also absolutely necessary, otherwise the mechanism for preventing such attacks will fail. The data on blockchains must match both in content and in order. If there are disagreements, the task of the consensus mechanism is to restore order. However, the consensus mechanism cannot independently distinguish between true and false entries in block chains. It only helps to overcome the majority opinion, but it can also be wrong.

Byzantine mistake or betrayal of officers

In 1453 AD, Ottoman generals besieged Constantinople, also known as Ostrom. Today the city is called Istanbul. In the Middle Ages, it was the center of the Byzantine Empire. About a thousand years after the fall of Western Rome in 1453 AD, the fortress wall still protected Constantinople, and posed a problem for attackers. The attackers could only communicate with each other through messengers, because the wall was too long. To succeed, they had to attack the wall at the same time, from different sides. The only alternative was a prolonged siege.

However, some Turkish generals took a trick and distorted the commander-in-chief’s decision to “attack”. Their goal was to misinform the spies in the Sultan’s entourage, trying to force the enemy to attack prematurely, with the help of a cleverly spread trap. What is the reason for this among the generals loyal to the Sultan? They don’t know who is a traitor and who is not.

Now the task is to develop the most fault-tolerant decision-making process. Generals exchange information with each other through messengers. It should be noted that messengers could disappear without a trace, and messages can be forged.

Historically significant event of the XIV century has so impressed mathematicians so far that they call a whole class of problems “Byzantine errors” or “high treason”. Computer scientists have ready-made solutions, but they only work if there are not too many traitors. The limit is a third.

However, this assumes that the messages are genuine, that is, protected from forgery. Today, this task is taken over by public key encryption systems. But the general can’t wait forever to make a decision. Missing messages should be recognized as such by timeout. Voting is carried out according to the majority principle. This means that messages arriving late may affect the outcome of the vote, in particular, the majority opinion may also lead to an incorrect result. A fact that is often forgotten.

In practice, DDoS attacks can take on a task that was previously assigned to soldiers: preventing the delivery of messages.

Many nodes, many messages

Systems require hours of synchronous operation to be able to coordinate joint actions. If an external source is not available, a faulty clock can, in turn, cause a Byzantine error during synchronization. It should also be borne in mind that if there are only 5 “traitors”, the number of messages sent will increase to more than 3.6 million. This is how Leslie Lamport and Danny Dolev calculated it in 2005 during a seminar at the Hasso Plattner Institute.

When choosing a consensus mechanism, it is necessary to ensure that there is no excessive dependence on the number of nodes. BSI experts warn that if consensus mechanisms do not scale well for a larger number of nodes, then their use in blockchains that do not require approval, since it does not make sense.

Reaching consensus

The analogy between Byzantine mistakes or high treason and blockchain fraud is obvious. In the end, every general has to decide whether to attack or wait, there is no golden mean. Everyone is trying to coordinate their actions with as many generals as possible in order to achieve clarity. This is how consensus mechanisms work.

However, mathematicians are not talking about generals, but about the nodes of the network. Byzantine errors can be caused by errors in work, technical problems, betrayal or intent to deceive. In all cases, the nodes are no longer consistent. Therefore, there is no clarity regarding the true content of the blockchain, or the status of the registry (journal).

Synchronous and asynchronous networks

Currently, a number of methods have been proposed. Not all are suitable for all use cases. The requirements for consensus mechanisms for public and private blockchains differ in many respects. When choosing a procedure, it is also necessary to clarify what is the priority in the process of reaching consensus – security or survivability. Everyone who develops a consensus mechanism must decide which of the two requirements is more important.

Cheerfulness ensures that every correct node will eventually make a decision. So there shouldn’t be an infinite delay even if messages are still pending and the procedure is terminated.

In asynchronous networks, it may happen that one of the nodes (generals) is waiting for the last messenger, but he never arrives. By definition, there is a time limit in synchronous networks. In addition, consensus mechanisms require security. To do this, three conditions must be met:

  1. A node can only decide on a value that has been suggested by another node
  2. Honesty. No node can make multiple decisions at once.
  3. Approval. Two correct nodes should not make different decisions.

In addition to the Byzantine mistakes, there are also departures. In this case, the nodes fail. However, after a while they usually work again.

The art of mathematicians now is to find methods that can mitigate a large number of fraud attempts or failures as quickly as possible. There are two types: CFT algorithms (fault-tolerant) eliminate failure errors, and BFT algorithms (fault-tolerant Byzantine), which should eradicate treason (Byzantine failures).

In practice, CFT algorithms are almost always used in distributed systems due to their higher throughput. The most famous example of using new processes is the Bitcoin cryptocurrency, which, like Ethereum, uses Proof-of-Work (POW) algorithms from the Proof-of-X (pox) class of processes.

With these algorithms, consensus on new data is achieved using a mathematical problem that requires large computational resources (searching for a string with a hash value below a given limit), which must be solved for each new block. The solution to this riddle is a real “proof of work”. If you are the first to find the right solution, your block will be distributed in the network and accepted by other nodes.

Forks and hard forks

The delay time during the transmission of messages on the network may mean that different nodes initially store different blocks in their local copy of the block chain, in the same place. This creates different branches of the block chain and hence inconsistencies. This is called a fork. However, such forks are resolved after a short time.

So, the most likely solution has been determined. But, thus, there is no mathematical guarantee that the correct fork branch will always be included in the block chain.

In abstract terms, PoW can be understood as a consensus mechanism with a leader whose proposal is usually accepted by all the correct nodes. The discussion of the best consensus procedure is not over yet. Now there are more energy-saving processes, but they have not yet been implemented in practice.

The safety of PoW is not guaranteed permanently, because the added blocks may subsequently become invalid due to the dissolution of the forks. This distinguishes them from the classic CFT and BFT processes, which always guarantee safety, and whose blocks are final.

PoW processes are extremely resource-intensive and therefore require a lot of energy. In order to guarantee a sufficient number of participants, economic incentive systems are needed. Usually, a certain amount in the cryptocurrency in question is credited to a successful node.

At the same time, this means that a high exchange rate is absolutely necessary to prevent excessive centralization of the mining process. Ultimately, a broad user base is the only guarantee against fraud. Even today, participation in certain blockchain currencies pays off almost exclusively due to optimized equipment in places with low electricity tariffs.

In 2003, Swiss physicists developed a method for a more elegant and much better solution to the Byzantine error problem using quantum mechanics. To do this, they use the physical properties of entangled quantum states. However, today these methods are not used in cryptocurrencies and related processes.