Background
Two Phase Commit (abbreviated 2PC) is a protocol used to achieve atomic writes in distributed systems. It was a novel concept in the 1970s with good intentions, but in practice its implementations leave a lot to be desired. We'll dig into what 2PC does and why many production systems decide not to use it.
How 2PC works
As the name suggests, participant nodes using the 2PC protocol go through two phases:
- Prepare
- Commit
A Coordinator service is a transaction manager that communicates with the participant nodes; it usually lives alongside the application server or on a single dedicated machine. When the Coordinator is given a transaction (one or more operations that must be applied atomically) to execute, it starts the Prepare phase by asking all the nodes whether they are truly prepared for the upcoming transaction commit.
At this stage, if any participant says "No", or crashes, the transaction is aborted altogether. Whenever the Coordinator is about to start a phase (Prepare or Commit), it records this state change to an on-disk transaction log. This provides a recovery mechanism in the case of failures such as a system restart on the Coordinator host.
In the Prepare phase, only if all participants acknowledge the Coordinator's "Prepare" request with an "OK" will the Coordinator proceed to the Commit phase. By acknowledging, each participant makes a promise to commit the transaction.
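To make this concrete, below is a minimal sketch of the Coordinator's Prepare phase in Python. Everything here is hypothetical and for illustration only: the `Participant` interface (`prepare`/`abort`), the transaction-log format, and all names are invented, not taken from any real transaction manager.

```python
import os

class Coordinator:
    def __init__(self, participants, log_path="txn.log"):
        self.participants = participants  # hypothetical Participant objects
        self.log_path = log_path

    def _log(self, entry):
        # Durably record each state change *before* acting on it, so a
        # restarted Coordinator can recover its position from the log.
        with open(self.log_path, "a") as f:
            f.write(entry + "\n")
            f.flush()
            os.fsync(f.fileno())

    def prepare_phase(self, txn_id, operations):
        self._log(f"{txn_id} PREPARE")
        for p in self.participants:
            try:
                ok = p.prepare(txn_id, operations)  # "are you prepared?"
            except Exception:
                ok = False  # a crashed participant counts as a "No"
            if not ok:
                # Any "No" vote aborts the transaction altogether.
                self._log(f"{txn_id} ABORT")
                for q in self.participants:
                    try:
                        q.abort(txn_id)
                    except Exception:
                        pass  # best effort; a down node learns the outcome later
                return False
        # Every participant said "OK" and has promised to commit.
        return True
```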
The Coordinator now sends the Commit request to all participants, just as it did in the Prepare phase. Unlike the Prepare phase, however, the Coordinator expects each participant to commit the transaction and waits until it does. **This means that each node must commit successfully, no matter how slow it is or what bad state it might be in.** The Coordinator will keep retrying the request until it gets an acknowledgement from every participant.
The simple explanation for this is that if some nodes committed a transaction while others didn't, the system as a whole would be left in an inconsistent state. The nodes that have already committed cannot roll back, because a commit is final. So for a distributed write to be atomic, every node must commit the transaction successfully.
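That retry-forever behavior is easy to see in code. Continuing the hypothetical sketch above, the Commit phase might look like this (the participant's `commit` method is again an invented RPC, not a real API):

```python
import time

def commit_phase(coordinator, txn_id):
    # Once the commit decision is logged, there is no going back: the
    # Coordinator must collect an acknowledgement from *every* participant.
    coordinator._log(f"{txn_id} COMMIT")
    pending = list(coordinator.participants)
    while pending:
        still_pending = []
        for p in pending:
            try:
                p.commit(txn_id)  # hypothetical "commit now" RPC
            except Exception:
                still_pending.append(p)  # unreachable or slow; try again
        pending = still_pending
        if pending:
            time.sleep(1.0)  # back off, then retry the stragglers forever
    # Only after every node has acknowledged is the transaction
    # globally committed.
```

Note that the `while pending` loop never gives up, which is exactly where the trouble starts.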
The problems of 2PC
The statement in bold above hints at some major flaws in 2PC. In more detail:
- In the Commit phase, the Coordinator is effectively blocked until every node acknowledges the transaction write. If one node is laggy or restarts due to an error, the distributed write as a whole is blocked until the issue is resolved.
- In that scenario, if a node dies during the Commit phase, the Coordinator is stuck in limbo, and a system administrator must resolve the issue manually.
- If the Coordinator node itself dies at the Commit step, then all the participants are left in doubt, causing them to stand by (and accept no further transactions) until the Coordinator is operational again. The sketch after this list illustrates this in-doubt wait from the participant's side.
  - If the Coordinator node restarts, it will eventually become operational again, but likely after significant downtime.
  - If the Coordinator node is dead for good, a system administrator must replace it entirely and restore its state from the backup log, meaning even longer downtime.
  - For these reasons, a failover for the Coordinator node is strongly recommended.
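The participant side of this can be sketched as well. Once a participant votes "OK" in the Prepare phase, it is in doubt: it must hold its locks and wait for the Coordinator's decision, however long that takes. The class below is purely illustrative, assuming the same invented interface as the sketches above:

```python
import threading

class Participant:
    def __init__(self):
        self._decisions = {}  # txn_id -> "COMMIT" or "ABORT"
        self._cond = threading.Condition()

    def prepare(self, txn_id, operations):
        # Durably stage the operations, acquire locks, and promise to
        # commit. From this point on the participant may NOT decide
        # unilaterally; it can only wait for the Coordinator.
        return True

    def decide(self, txn_id, outcome):
        # Called when the Coordinator's COMMIT or ABORT finally arrives.
        with self._cond:
            self._decisions[txn_id] = outcome
            self._cond.notify_all()

    def wait_for_decision(self, txn_id):
        with self._cond:
            while txn_id not in self._decisions:
                # If the Coordinator is down, this loop waits forever:
                # the in-doubt transaction blocks everything it locks.
                self._cond.wait(timeout=5.0)
            return self._decisions[txn_id]
```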
So about 2PC...
In general, 2PC was not built with fault tolerance in mind at all. In distributed systems, network partitions are a fact of life that has to be handled. Better solutions for atomic, distributed writes are therefore centered on consensus algorithms that can detect failed nodes; Zab and Raft are a couple of good examples.