Consensus (computer science) - Core Consensus Foundations
Understand the definition of consensus, its protocol requirements and fault models, and the concepts of validity and t‑resilience.
Summary
Read Summary
Flashcards
Save Flashcards
Quiz
Take Quiz
Quick Practice
What is the primary requirement of the consensus problem for a collection of processes?
1 of 12
Summary
Understanding the Consensus Problem
What is Consensus?
In distributed computing, consensus is a fundamental problem where multiple independent processes must coordinate to agree on a single data value. Each process starts with its own proposed input value, and the challenge is to design a protocol that makes them all reach agreement despite the inherent difficulties of distributed systems: process failures, unreliable communication, and the lack of a global clock.
Think of it like a committee where members are located in different cities and need to decide on a single proposal. Each member starts with a preference, but they must eventually all agree on the same final decision, even if some members fail or act deceptively.
The Three Core Requirements
A valid consensus protocol must satisfy three essential properties:
Non-triviality. The agreed-upon output value must be the input of at least one correct process. This prevents the trivial solution where all processes simply output a default value like "0" without actually considering anyone's input. The protocol must meaningfully reflect the participants' inputs.
Agreement and Irrevocability. Each process may decide on an output value only once, and once a decision is made, it cannot be changed or reversed. This "irrevocable" property is crucial: processes cannot change their mind after deciding, which prevents flip-flopping and ensures stability in the final result. From this requirement, we also get that all correct processes must decide on the same value.
Termination. All correct processes must eventually reach a decision and output their chosen value. A protocol that runs forever without concluding, or that leaves some processes waiting indefinitely, does not solve consensus. Termination is particularly challenging when failures can occur, since the protocol must complete even when some processes crash or behave unpredictably.
Understanding Fault Models
Before a consensus protocol can be evaluated, we must define what kinds of failures can occur. Different fault models lead to different problem difficulties and protocol designs.
Crash Failures. A process experiencing a crash failure stops executing permanently and never resumes. Once a process crashes, it simply disappears from the computation—it stops sending and receiving messages, ceases processing, and takes no further action. This is the more optimistic failure model because a crashed process becomes predictable: we know it will do nothing. Other processes can eventually recognize that it has crashed through timeout mechanisms.
Byzantine Failures. A Byzantine failure is far more severe. A process experiencing this type of failure can behave completely arbitrarily: it might send conflicting messages to different processes, delay messages, send invalid data, ignore its protocol, or coordinate with other faulty processes. The term "Byzantine" comes from the famous Byzantine Generals Problem, where the challenge is reaching agreement even when some generals are traitors who actively try to disrupt the consensus. Byzantine failures are much harder to handle because faulty processes appear unpredictable and can actively work against the agreement goal.
The key insight: crash failures are passive (bad things stop happening), while Byzantine failures are active (adversarial behaviors continue happening).
Two Types of Validity Conditions
Consensus protocols differ in exactly what values are "valid" to output. Two formal definitions capture different security levels:
Weak Validity. Under weak validity, the output of each correct process must equal the input value of some correct process (not necessarily the same correct process for all agreeing parties). This condition is weaker because it allows flexibility—as long as the agreed value came from at least one correct process, it is valid. However, it permits scenarios where correct processes might output the input of one correct process even if most processes proposed different values.
Strong Validity. Under strong validity, if all correct processes start with the same input value, then they must output that exact value. This is a much stronger guarantee: consensus cannot "flip" to a different value if everyone was already in agreement. Strong validity prevents protocols from ignoring unanimous agreement among correct processes. Every consensus protocol satisfying weak validity does not automatically satisfy strong validity—strong validity is an additional constraint that excludes certain protocol designs.
Example: Suppose all correct processes propose the value "Yes".
A protocol with only weak validity might still output "No" (because a Byzantine process proposed "No"), which seems wrong when everyone agreed.
A protocol with strong validity must output "Yes" in this case, respecting the unanimous input.
Resilience and Fault Tolerance Bounds
Once we define what kinds of failures can occur, we can measure a protocol's robustness using the concept of resilience.
A protocol is called t-resilient (or tolerates t faults) if it correctly solves the consensus problem despite the simultaneous failure of up to t processes. Here, t is the maximum number of faulty processes the protocol can handle while still satisfying all three requirements (non-triviality, agreement, and termination).
The value of t is crucial because it determines the practical applicability of a protocol. For example:
A protocol that is 1-resilient can handle one crashed or Byzantine process
A protocol that is 5-resilient can handle up to five failures simultaneously
A protocol that requires $n$ correct processes (where $n$ is the total) to reach agreement is 0-resilient—it fails if even one process fails
The challenge lies in understanding the fundamental limits: given n total processes, what is the maximum t for which consensus is solvable? For crash failures, consensus is solvable when $t < n/2$, but for Byzantine failures, the requirement is even stricter: $t < n/3$. These limits come from impossibility results that prove no protocol can do better.
Flashcards
What is the primary requirement of the consensus problem for a collection of processes?
To agree on a single data value.
How do processes initiate the consensus process?
By proposing an input value.
What are the three core requirements for a consensus protocol?
Non-triviality (The output must be the input of at least one correct process)
Irrevocability (A process decides only once and cannot change it)
Termination (All correct processes must eventually decide)
What does the non-triviality requirement dictate regarding the consensus output value?
It must be the input of at least one correct process.
What is the restriction on a process's decision once an output value is chosen?
The decision is irrevocable and can only happen once.
What does the termination requirement ensure in a consensus protocol?
That all correct processes eventually reach a decision.
How is a crash failure defined in a distributed system?
A process stops permanently and never resumes.
What characterizes a Byzantine failure in a process?
Arbitrary behavior, including sending conflicting messages.
What is required for weak validity in a consensus protocol?
Each correct process's output must equal the input of some correct process.
What is required for strong validity if all correct processes start with the same input?
They must output that specific value.
What does it mean for a protocol to be $t$-resilient?
It works correctly despite up to $t$ faulty processes.
In the context of $t$-resilience, what does the variable $t$ represent?
The maximum number of failures the protocol can tolerate.
Quiz
Consensus (computer science) - Core Consensus Foundations Quiz Question 1: What does the consensus problem require of a set of processes?
- They must agree on a single data value (correct)
- Each process must output a unique value
- Processes may skip deciding entirely
- They must exchange messages forever without deciding
Consensus (computer science) - Core Consensus Foundations Quiz Question 2: What characterizes a crash failure in a distributed system?
- The process stops permanently and never resumes (correct)
- The process sends conflicting messages to different processes
- The process slows down but continues operating
- The process restarts automatically after a timeout
Consensus (computer science) - Core Consensus Foundations Quiz Question 3: Under strong validity, what must happen if all correct processes start with the same input value?
- All correct processes must output that same value (correct)
- Each correct process may output any input from a correct process
- Processes can output any arbitrary value
- They must output a default consensus value
Consensus (computer science) - Core Consensus Foundations Quiz Question 4: What does it mean for a consensus protocol to be t‑resilient?
- It works correctly despite up to t faulty processes (correct)
- It can tolerate exactly t crash failures and no more
- It guarantees termination after t communication rounds
- It requires at least t correct processes to decide
Consensus (computer science) - Core Consensus Foundations Quiz Question 5: What does the irrevocability requirement state about a process’s decision in a consensus protocol?
- A process decides only once and cannot change that decision (correct)
- A process may revise its decision if it receives new proposals
- A process can decide multiple times as long as all decisions match
- A process’s decision is provisional until all others have decided
What does the consensus problem require of a set of processes?
1 of 5
Key Concepts
Consensus Fundamentals
Consensus problem
Consensus protocol
Termination (distributed computing)
Failure Models
Crash failure
Byzantine failure
Consensus Properties
Weak validity
Strong validity
t‑resilience
Definitions
Consensus problem
The challenge of getting a set of distributed processes to agree on a single data value despite possible failures.
Consensus protocol
An algorithm that ensures processes reach agreement while satisfying properties such as termination and validity.
Crash failure
A fault model where a process stops functioning permanently and never recovers.
Byzantine failure
A fault model allowing a process to behave arbitrarily, including sending conflicting or deceptive messages.
Weak validity
A consensus property requiring each correct process’s decision to be the input of some correct process.
Strong validity
A consensus property requiring that if all correct processes start with the same input, they must decide on that value.
t‑resilience
The ability of a consensus protocol to tolerate up to t faulty processes while still guaranteeing correctness.
Termination (distributed computing)
The guarantee that every correct process eventually reaches a decision in a consensus protocol.