Consensus (computer science) - Agreement Problems and Protocols
Understand the equivalence of agreement problems, the solvability limits for Byzantine and crash failures, and the core consensus protocols such as Paxos, Raft, Phase King, and MSR.
Summary
Agreement Problems and Their Equivalence
Introduction
Distributed systems often require processes to agree on a single value despite failures. This challenge—consensus or agreement—lies at the heart of many distributed algorithms. The study of agreement problems reveals deep theoretical results: some configurations make consensus impossible, while others make it feasible with specific protocols. Understanding the landscape of these problems, their relationships, and what makes them solvable is essential for building reliable distributed systems.
Core Agreement Problems
Terminating Reliable Broadcast
The simplest agreement problem is terminating reliable broadcast. In this problem, one process (the broadcaster) has a value it needs to communicate to all others. The requirements are:
Agreement: All correct processes deliver the same value
Validity: If the broadcaster is correct, all correct processes deliver the broadcaster's original value
Termination: Every correct process eventually delivers a value
This problem is foundational because it captures the essence of reliable communication: ensuring all parties receive identical information despite potential failures.
Consensus and Agreement
Consensus is the central agreement problem. Each process starts with a private input value, and the goal is for all processes to output the same final value through communication. The fundamental requirement is:
Agreement: All correct processes must output the same value
This simple-sounding requirement turns out to have profound limitations, depending on the system model and failure types.
Weak Interactive Consistency
Weak Interactive Consistency extends the agreement problem. Each process starts with a private input value and communicates over several rounds, producing a consensus vector—a collection of values, one attributed to each process.
The key properties are:
Integrity: If a correct process sends a value, every correct process either receives that exact value or nothing at all (a Byzantine process cannot forge values that differ between receivers)
Consistency: All messages sent by a correct process in a round reach all correct processes within that same round
Unlike simple consensus, interactive consistency aims for processes to agree on a vector of values where each component represents information from a specific process. This is useful when you need to distribute information from multiple sources reliably.
Equivalence of Agreement Problems
An important insight in distributed computing is that several agreement problems are equivalent—solving one problem means you can solve others.
From Weak Byzantine Generals to Interactive Consistency: If you have a protocol that solves the Weak Byzantine Generals problem in a synchronous authenticated model (where messages are digitally signed), you can use it to achieve Weak Interactive Consistency. Each process can use the Byzantine Generals protocol to reliably broadcast its own value to all others, effectively constructing the consensus vector.
From Interactive Consistency to Consensus: Conversely, an interactive consistency algorithm can solve consensus. The key insight is simple: once each process has a consistent vector of values from all other processes, it can apply a majority rule—adopt the most common value in its vector as its consensus output. Since all correct processes have the same vector (by consistency), they all compute the same majority value and reach agreement.
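This reduction is small enough to sketch in code. Below is a minimal Python sketch of the majority step; `consensus_from_vector` is a hypothetical helper for illustration, not a library function:

```python
from collections import Counter

def consensus_from_vector(vector):
    """Turn an interactive-consistency vector into a consensus output.

    Every correct process holds the identical vector, so applying the
    same deterministic rule everywhere (most frequent value, ties broken
    toward the smaller value) yields the same output at every process.
    """
    counts = Counter(vector)
    return max(sorted(counts), key=lambda v: counts[v])

# Suppose interactive consistency left every correct process with this vector:
vector = [1, 0, 1, 1, 0]
print(consensus_from_vector(vector))  # → 1
```

The only requirement is that the rule be deterministic; any tie-breaking convention works as long as all processes use the same one.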
These equivalences mean that understanding one problem gives you tools to understand others. The choice of which problem to solve often depends on the specific system model and failure assumptions.
Solvability Results: When Can We Achieve Consensus?
Byzantine Failures in Synchronous Systems
In a synchronous system without message authentication, the Byzantine Generals problem has a clean solvability condition:
A protocol tolerates $t$ Byzantine failures if and only if $n > 3t$, where $n$ is the total number of processes. (With digital signatures, by contrast, the problem becomes solvable for any number of faults.)
Why $3t$? The intuition is a counting argument: Byzantine processes can lie strategically. With $n = 3t + 1$ processes, even if $t$ are Byzantine, the honest processes form a $2t + 1$ majority. This majority is strong enough to outweigh any false or inconsistent claims from the Byzantine processes: a Byzantine process cannot convince an honest process that it conflicts with the other honest processes, because the honest processes always constitute the majority.
This bound is tight: if $n \leq 3t$, no protocol can guarantee consensus against $t$ Byzantine failures in a synchronous system without authentication.
Oral-Messages Model: No Authentication
The oral-messages model removes authentication entirely: processes cannot sign their messages, so a Byzantine process can claim to have sent any value to any other process, potentially sending different values to different neighbors.
In this model, consensus is impossible if $n \leq 3f$, where $f$ is the number of Byzantine failures; this is exactly the $n > 3f$ threshold above.
The key insight is that without signatures, Byzantine processes can spread inconsistent information freely. A Byzantine process can tell process A one value, tell process B a different value, and claim to each that it sent a third value to the other. Overcoming this equivocation is what forces the honest supermajority.
Written-Messages Model: More Powerful
Interestingly, the written-messages model (where each process writes values that others can read, but cannot be altered once written) is more powerful than the oral-messages model.
In written-messages: consensus is achievable with $f$ Byzantine failures when $n > f$.
The key advantage is immutability: once a value is written, every process reads exactly the same value, so a Byzantine process cannot tell different stories to different readers. This eliminates the equivocation behind the oral-model bound; with $n > f$, at least one correct process remains, and since all processes see identical written values, the correct processes can reach agreement despite Byzantine processes writing false values.
Asynchronous Systems and the FLP Impossibility
The results above assume synchronous systems: all processes operate at roughly the same speed, and messages arrive within bounded time. But what about asynchronous systems, where there are no timing guarantees?
The FLP (Fischer, Lynch, Paterson) impossibility result is perhaps the most profound negative result in distributed computing: In an asynchronous system, consensus cannot be guaranteed even with a single crash failure, unless additional assumptions (like randomness) are introduced.
The intuition is subtle. In an asynchronous system, a slow process is indistinguishable from a crashed process—you never know if it's just taking a long time. An adversary can exploit this by strategically delaying the slowest process. Just as processes are about to decide on a value, the adversary can make it appear the slow process might have a different value, forcing the system into a state where no safe decision can be made. The adversary can repeat this indefinitely.
This impossibility is fundamental: no deterministic protocol can overcome it.
Overcoming Impossibility with Randomization
While the FLP impossibility applies to deterministic algorithms, randomized protocols can achieve consensus in asynchronous systems with crash failures.
The key idea: instead of demanding termination in every execution, require the protocol to terminate with probability 1 while never violating agreement. Random coin flips introduce unpredictability that an adversary scheduling message delays cannot fully exploit. Over repeated rounds, the correct processes' random choices eventually line up, and all correct processes converge to the same value, overcoming the FLP barrier.
These randomized solutions typically require more rounds than synchronous algorithms, but they prove that the impossibility barrier is not absolute—it only applies to the deterministic setting.
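The repeated-rounds idea can be illustrated with a toy, failure-free Python simulation in the spirit of Ben-Or's local-coin protocol. This is a sketch under strong simplifying assumptions (every process sees every value each round; crashes and delays are not modeled), not an implementation of any specific protocol:

```python
import random

def benor_style_round(values, rng):
    """One toy round in the spirit of Ben-Or's local-coin consensus.

    Failure-free and lockstep for simplicity: each process sees every
    value. A process adopts the strict-majority value if one exists;
    otherwise it flips a local coin. Repeating such rounds makes all
    processes agree with probability 1.
    """
    n = len(values)
    new_values = []
    for _ in range(n):
        for v in (0, 1):
            if values.count(v) > n // 2:    # strict majority: adopt it
                new_values.append(v)
                break
        else:                               # no majority: flip a coin
            new_values.append(rng.randint(0, 1))
    return new_values

rng = random.Random(42)                     # fixed seed for reproducibility
values = [0, 1, 0, 1]                       # perfectly split: coins must decide
rounds = 0
while len(set(values)) > 1 and rounds < 100:
    values = benor_style_round(values, rng)
    rounds += 1
print(values, "after", rounds, "round(s)")
```

Note the one-way door: once a strict majority exists, every process adopts it in the next round, so agreement, once reached, persists; the coins only serve to break symmetric splits.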
Representative Consensus Protocols
Paxos and Raft: Practical Leader-Based Protocols
Paxos and Raft are among the most important consensus protocols used in practice. Both share key characteristics:
Partially synchronous: They use timeouts and bounded message delays to detect failures and make progress, while safety is preserved even when these timing assumptions fail
Leader-based: One process (the leader) coordinates agreement
Crash-tolerant: They tolerate processes that simply fail and stop (crash failures), but not Byzantine failures where processes behave arbitrarily
Efficiency: Both tolerate $f$ crash failures with $n > 2f$ processes, since every decision requires a majority quorum
Paxos and Raft dominate practical systems because they provide strong guarantees under reasonable assumptions (crash failures, mostly synchronous operation) with good performance. They are not designed for Byzantine failures, which require more rounds and message complexity.
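The majority-quorum reasoning behind these protocols can be checked mechanically. The sketch below, using a hypothetical `majority_quorums` helper, verifies that any two majority quorums among $n = 2f + 1$ processes intersect, which is why two conflicting values can never both be chosen:

```python
from itertools import combinations

def majority_quorums(n):
    """All majority-sized subsets of processes 0..n-1 (hypothetical helper)."""
    q = n // 2 + 1                      # majority size, e.g. 3 out of 5
    return [set(c) for c in combinations(range(n), q)]

# With n = 2f + 1 = 5 processes (tolerating f = 2 crashes), any two
# majority quorums share at least one process, so two different values
# can never both gather a deciding quorum.
quorums = majority_quorums(5)
for a, b in combinations(quorums, 2):
    assert a & b, "majority quorums must pairwise intersect"
print(len(quorums), "quorums, all pairwise intersecting")
```

The intersecting process is exactly what carries information from one leader's decision to the next leader's quorum.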
<extrainfo>
The reason Paxos and Raft don't handle Byzantine failures is that detecting arbitrary behavior is expensive: you essentially need to triple-check everything, leading to the $n > 3f$ bound we saw earlier. For systems that don't expect Byzantine faults (like internal data center systems), this overhead is unjustifiable.
</extrainfo>
Phase King Algorithm: Byzantine Consensus Through Voting
The Phase King algorithm is a classical algorithm for achieving Byzantine consensus in a synchronous system with at most $f$ Byzantine failures. It demonstrates how to handle Byzantine faults through iterative refinement of majority voting.
Structure: The algorithm runs for $f + 1$ phases. In each phase, there are two rounds:
First Round (Broadcast Round): Every process broadcasts its current preferred value to all others. Each process collects these values into a multiset.
Second Round (King Round): The process whose ID matches the current phase number acts as the "king." The king broadcasts the majority value it observed from the first round (breaking ties arbitrarily).
Update Rule: Each process then updates its preferred value using the king's broadcast:
If the process's own majority value occurred with frequency greater than $n/2 + f$ in the first round, it keeps that value (an overwhelming majority that no king can override)
Otherwise, it adopts the king's value as the tiebreaker
Why $f + 1$ phases? The $f + 1$ kings are distinct processes, and at most $f$ processes are Byzantine, so at least one phase has a correct king. In that phase, all correct processes leave with the same preferred value: either they all already hold the same overwhelming majority, or they adopt the correct king's broadcast. From then on, every correct process counts that common value more than $n/2 + f$ times in each first round, so the update rule preserves agreement even when a later king is Byzantine.
Final Output: After all $f + 1$ phases complete, each process outputs its final preferred value. By the design of the update rule, all correct processes have the same preferred value and output the same consensus.
The beauty of Phase King is its simplicity: majority voting with strategic updates, repeated enough times to overcome Byzantine interference. The two-round variant described here requires $n > 4f$; a refined version with three rounds per phase, due to Berman and Garay, achieves the optimal $n > 3f$.
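The phase structure can be sketched as a failure-free simulation. This is a toy model (all processes behave correctly, so Byzantine behavior is not actually injected), intended only to show the two rounds, the rotating king, and the update rule; `phase_king` is an illustrative name:

```python
def phase_king(values, f):
    """Toy, failure-free run of the two-round Phase King structure.

    `values` holds each process's initial binary preference; this
    two-round variant assumes n > 4f. Byzantine behavior is NOT
    simulated -- the point is the phase/round skeleton and update rule.
    """
    n = len(values)
    assert n > 4 * f, "this two-round variant needs n > 4f"
    prefs = list(values)
    for phase in range(f + 1):          # f + 1 phases with distinct kings
        # Round 1: every process broadcasts its preference; in this
        # failure-free run all processes tally the same multiset.
        c0, c1 = prefs.count(0), prefs.count(1)
        majority = 0 if c0 >= c1 else 1
        majority_count = max(c0, c1)
        # Round 2: the king (process whose id equals the phase number)
        # broadcasts its observed majority as the tiebreaker.
        king_value = majority
        for i in range(n):
            # Keep own majority only if it is overwhelming; else trust the king.
            prefs[i] = majority if majority_count > n / 2 + f else king_value
    return prefs

print(phase_king([1, 0, 1, 1, 0], f=1))  # → [1, 1, 1, 1, 1]
```

A full implementation would keep a separate tally per process, since Byzantine senders may report different values to different receivers; the update rule is what keeps those diverging tallies from breaking agreement.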
<extrainfo>
MSR-Type Algorithms
Mean Subsequence Reduced (MSR) algorithms are another family of Byzantine-tolerant consensus protocols. These algorithms work by iteratively discarding outlier values—the smallest and largest values observed—and recomputing statistics on the remaining values.
The intuition is that Byzantine processes provide extreme values (very large or very small), while correct processes cluster around the true value. By repeatedly removing extremes, the algorithm converges to the correct value.
MSR algorithms are less commonly taught than Phase King because they require careful analysis of convergence properties and work best for continuous-valued consensus rather than discrete consensus problems. However, they're important in some contexts, particularly when dealing with sensor fusion or multi-agent systems where values are real numbers rather than discrete choices.
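A single MSR-style trimming step at one process might look like the following sketch (the `msr_step` name and the sensor-reading example are illustrative, not drawn from a specific paper):

```python
def msr_step(values, f):
    """One MSR-style trimming step at a single process (illustrative).

    Discard the f smallest and f largest received values, then average
    the remainder. With at most f Byzantine peers, every surviving value
    is bracketed by values from correct processes, so the result stays
    within the range of correct inputs.
    """
    assert len(values) > 2 * f
    trimmed = sorted(values)[f:len(values) - f]
    return sum(trimmed) / len(trimmed)

# Correct sensors cluster near 20.0; one Byzantine sensor reports 500.0.
readings = [19.8, 20.1, 20.0, 19.9, 500.0]
print(msr_step(readings, f=1))  # the outlier is trimmed away, result ≈ 20.0
```

Iterating such steps across processes drives the correct values toward a common point, which is the convergence property MSR analyses establish.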
</extrainfo>
Summary and Key Takeaways
The landscape of agreement problems reveals several critical insights:
Multiple equivalent formulations: Reliable broadcast, consensus, and interactive consistency are related; solving one often solves others.
Bounds are tight: The $n > 3f$ bound for Byzantine failures is both necessary and sufficient in synchronous systems without message authentication. These are not loose bounds but exact thresholds.
Asynchrony is harder: The transition from synchronous to asynchronous systems introduces fundamental impossibility. The FLP result shows that determinism and guaranteed correctness cannot coexist in the asynchronous crash-failure model.
Randomization unlocks asynchronous consensus: While determinism fails in asynchronous settings, randomized algorithms achieve consensus with high probability, demonstrating that the impossibility is subtle rather than absolute.
Practical protocols prioritize assumptions: Paxos and Raft dominate practice because they assume synchrony and crash failures only, avoiding the complexity of Byzantine protocols. When Byzantine faults are a real concern, algorithms like Phase King are used, accepting the higher cost of $n > 3f$ and more complex protocols.
Understanding these problems and results is foundational for designing distributed systems that work reliably under stated assumptions.
Flashcards
What is the primary goal of Terminating Reliable Broadcast?
Ensuring all correct processes receive the same value, including the broadcaster's value if it is correct.
In the context of formal consensus properties, what does the Agreement property state?
All correct processes must output the same value.
How can an interactive consistency algorithm be used to solve consensus?
Each process adopts the majority value from its consensus vector.
What is the FLP impossibility result regarding asynchronous crash failures?
Consensus cannot be guaranteed in an asynchronous system with even a single possible crash failure.
How do randomized protocols address the impossibility of asynchronous consensus?
They achieve consensus with probability $1$ in the presence of crash failures.
What is the basic process for achieving Weak Interactive Consistency?
Processes start with private values and communicate in rounds to produce a public vector.
Regarding Weak Interactive Consistency, what does the Integrity property require?
If a correct process sends a value, every correct process receives that value or nothing.
Regarding Weak Interactive Consistency, what does the Consistency property require?
All messages sent by a correct process in a round are received by all correct processes in that same round.
Which problem's solution in a synchronous authenticated model yields a solution to Weak Interactive Consistency?
The Weak Byzantine Generals problem.
What is the required relationship between processes $n$ and Byzantine failures $t$ to solve the Byzantine Generals problem in a synchronous model without authentication?
$n > 3t$
In the oral-messages model, under what condition is it impossible to solve consensus?
When the number of Byzantine processes $f$ satisfies $n \le 3f$.
What is the failure tolerance of the written-messages model for Byzantine failures?
It can tolerate up to $f$ failures provided $n > f$.
How many phases and rounds per phase does the Phase King Algorithm run for?
$f + 1$ phases with $2$ rounds per phase (where $f$ is the max Byzantine failures).
What happens during the first round of each phase in the Phase King Algorithm?
Every process broadcasts its current preferred value.
Who is designated as the "king" in a given phase of the Phase King Algorithm?
The process whose identifier matches the current phase number.
What is the final step of the Phase King Algorithm to achieve consensus?
After the final phase, all processes output their current preferred values.
How do Mean Subsequence Reduced (MSR) algorithms achieve consensus?
By repeatedly discarding the smallest and largest received values and aggregating the rest, so that extreme values from faulty agents cannot skew the result.
Quiz
Question 1: What does the agreement property require of a consensus algorithm?
- All correct processes output the same value (correct)
- All processes decide within a fixed time bound
- All inputs are validated before processing
- All messages are signed and authenticated
Question 2: How many phases does the Phase King algorithm execute for binary Byzantine consensus?
- f + 1 phases (correct)
- f phases
- 2f + 1 phases
- log n phases
Key Concepts
Consensus Mechanisms
Consensus (distributed systems)
Paxos
Raft
Randomized Consensus Algorithm
FLP Impossibility
MSR (Mean Subsequence Reduced) Algorithm
Fault Tolerance and Consistency
Reliable Broadcast
Weak Interactive Consistency
Byzantine Generals Problem
Phase King Algorithm
Definitions
Reliable Broadcast
A communication primitive where a sender ensures all correct processes receive the same value, and correct senders guarantee delivery of their own value.
Consensus (distributed systems)
The problem of getting all non-faulty processes to agree on a single value despite failures.
Weak Interactive Consistency
A relaxed consistency model where processes exchange private values in rounds to produce a public vector, guaranteeing integrity and consistency among correct processes.
Byzantine Generals Problem
The classic agreement problem where some participants may act arbitrarily (Byzantine faults) and the goal is to achieve coordinated action among loyal generals.
FLP Impossibility
The result proving that deterministic consensus cannot be guaranteed in an asynchronous system with even one possible crash failure.
Randomized Consensus Algorithm
A protocol that uses randomization to achieve agreement with probability 1 in settings where deterministic consensus is impossible.
Paxos
A leader‑based consensus protocol that tolerates crash failures but not Byzantine faults, preserving safety even when the network behaves asynchronously.
Raft
A consensus algorithm designed for understandability, using leader election and log replication to achieve crash‑fault tolerant agreement.
Phase King Algorithm
A binary Byzantine consensus protocol that proceeds in f + 1 phases, each with a designated “king” process that helps resolve disagreements.
MSR (Mean Subsequence Reduced) Algorithm
A fault‑tolerant consensus method that repeatedly discards extreme values to converge on a common decision despite faulty agents.