Distributed Systems Basics
Module 1 of Blockchain Fundamentals
What Is a Distributed System?
A distributed system is a collection of independent computers that appear to users as a single coherent system.
Examples:
- The internet itself
- Google Search (millions of servers)
- Netflix (globally distributed)
- Blockchain networks (thousands of nodes)
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport
Why Distributed Systems Matter for Blockchain
Blockchain is fundamentally a distributed system problem:
- How do thousands of computers agree on one truth?
- How do we handle computers that fail or lie?
- How do we maintain consistency without a central authority?
Understanding distributed systems is essential to understanding blockchain.
Core Challenges
1. Partial Failures
In centralized systems, it either works or it doesn't. In distributed systems:
- Some nodes fail while others work
- Failures are often silent (no response)
- Hard to distinguish slow from dead
Node A: Working ✓
Node B: Crashed ✗
Node C: Working ✓
Node D: Slow (is it dead?)
Node E: Working ✓
2. Unreliable Networks
Networks are not reliable. Messages can be:
- Lost: Packet never arrives
- Delayed: Arrives much later
- Duplicated: Arrives multiple times
- Reordered: Arrives out of sequence
You cannot tell the difference between:
- A crashed node
- A very slow node
- A network partition
3. No Global Clock
Each computer has its own clock. Clocks drift:
- CPU clocks drift ~50ppm (50 microseconds per second)
- Over a day: ~4 seconds of drift
- Network latency adds uncertainty
Consequence: "What happened first?" is surprisingly hard to answer.
4. Byzantine Failures
Nodes might not just fail — they might actively lie or attack.
| Failure Type | Behavior |
|---|---|
| Crash failure | Node stops responding |
| Omission failure | Node drops some messages |
| Byzantine failure | Node sends arbitrary/malicious data |
Blockchain must handle Byzantine failures (the hardest kind).
The CAP Theorem
In a distributed system, you can have at most 2 of 3:
Consistency
/\
/ \
/ \
/ \
/ ?? \
/ \
/____________\
Availability Partition
Tolerance
Definitions
- Consistency (C): All nodes see the same data at the same time
- Availability (A): Every request gets a response
- Partition Tolerance (P): System works despite network splits
The Reality
Network partitions WILL happen. So you must choose:
- CP: Consistent but may be unavailable (traditional databases)
- AP: Available but may be inconsistent (many web services)
Where Does Blockchain Fit?
Bitcoin chooses eventual consistency with partition tolerance:
- During partition: chains may diverge
- After partition heals: longest chain wins
- Consistency is probabilistic, not immediate
Consensus
Consensus = Getting all nodes to agree on a single value.
Why It's Hard
The Two Generals Problem:
General A General B
| |
|---"Attack at dawn"-------->|
| |
|<---"Confirmed"-------------|
| |
|---"Got confirmation"------>|
| |
...continues forever...
Neither general can be certain the other will attack. No number of messages can fix this.
The Byzantine Generals Problem
Even worse: some generals might be traitors who send conflicting messages.
The Result (Lamport, 1982):
- With f Byzantine nodes, you need 3f+1 total nodes
- Requires 2/3 honest majority
Practical Solutions
| Algorithm | Type | Speed | Byzantine Tolerant |
|---|---|---|---|
| Paxos | Crash fault | Fast | No |
| Raft | Crash fault | Fast | No |
| PBFT | Byzantine | Slow | Yes |
| Nakamoto | Byzantine | Slow | Yes |
| Tendermint | Byzantine | Medium | Yes |
Replication Strategies
Primary-Backup
One leader handles writes, replicates to followers.
- Simple
- Single point of failure
- Not Byzantine tolerant
State Machine Replication
All nodes execute same commands in same order.
- Consistent state
- Requires consensus on ordering
- Foundation of blockchain
Blockchain Approach
- Block producers propose state transitions
- Network reaches consensus on which blocks to accept
- State machine replication without fixed leader
Timing Models
How you model time affects what's possible:
Synchronous
- Known upper bound on message delay
- Known upper bound on processing time
- Easier to design, unrealistic in practice
Asynchronous
- No timing guarantees
- Messages can be delayed arbitrarily
- FLP Impossibility: Cannot guarantee consensus with even one crash failure
Partially Synchronous
- Asynchronous, but eventually becomes synchronous
- Realistic model for internet
- Blockchain operates here
Key Distributed Systems Concepts for Blockchain
1. Eventual Consistency
Nodes may temporarily disagree, but will eventually converge.
2. Idempotency
Operations can be safely repeated (important when messages duplicate).
3. Atomic Broadcast
All nodes receive messages in the same order (what blockchain provides).
4. State Machine Replication
All nodes maintain identical state by processing identical inputs.
Key Takeaways
- Distributed systems are hard — failures and timing are unpredictable
- CAP theorem forces tradeoffs — blockchain chooses eventual consistency
- Byzantine fault tolerance is expensive — but necessary for trustless systems
- Consensus is the core problem — blockchain's main innovation
- Timing matters — blockchain works in partial synchrony model
- State machine replication — the foundation of blockchain architecture
Questions to Consider
- Why can't traditional databases solve blockchain's problem?
- What happens if more than 1/3 of nodes are malicious?
- How does Bitcoin handle network partitions?
- Why is "eventual consistency" acceptable for money?