P2PC: A Better Distributed Transaction Protocol
If you’re working with, learning about, or have ever had any interest in real-time distributed transactions, then you have almost certainly heard about the Two-Phase Commitment protocol (2PC). That’s because 2PC is an essential part of coordinating a transaction distributed across multiple machines. There have been numerous white papers written on 2PC and its variants as people try to get faster transaction without sacrificing reliability. Today I’m going to describe a new improvement on D2PC (Dynamic Two-Phase Commitment), which is itself an optimized variant of T2PC (Tree Two-Phase Commitment), that I’m calling P2PC (Peer-to-Peer Two-Phase Commitment).
First things first. What is this 2PC garbage you keep talking about? Simply put, 2PC is about managing transactions. By transaction, we mean an action that will change the state of a system if successful and can be reverted to the original state before the transaction if it is unsuccessful and aborts. In the 2PC, we inform all the participants in the transaction that we’re going to have a distributed transaction and they need to prepare for it. The various machines and systems that will be involved in the transaction do all the necessary things to prepare (set internal state, get locks on required system resources, snapshot their state in case they need to rollback, etc.) and then reply with a ready message. When the machine coordinating the work gets replies from everyone, it will send out a commit message informing all participants that they are going forward with the transaction and that it should be finalized. If for whatever reasons one of the participants can’t process the transaction (can’t get a lock on the transaction file, is out of memory, dog ates its homework, etc.), then it sends an abort message rather than a ready message, and the controller will inform all participants that they should abort the transaction and rollback to the previous state.
T2PC & D2PC
This brings us to a point where we can discuss T2PC and D2PC. T2PC is a common implementation of 2PC in which the controller doesn’t directly communicate with all the participants in a transaction. Rather, it delegates the communication in a tree structure. As you can see in this really cool diagram.
In T2PC, the controller sends messages to its children (in this case, Delegates A and D), and they send them to their children (and they send them to their children, etc.), and then wait for the response from all their children before responding. If any node receives an abort from its children, or if the node itself needs to abort, it will reply with an abort message. Otherwise it will reply with a ready message. The controller collects the response from all its children (which response implies the state of their entire sub-tree) and then decides to commit or to abort.
D2PC builds on this idea by suggesting that the commit controller should be dynamically selected from among the various nodes of the tree as a way of increasing the throughput of the messages through the system. The idea is the same as T2PC except that as soon as a node is ready and has received a ready (or abort) message from all but one of its neighbors, it will send a ready (or abort) message onto the neighbor that hasn’t responded yet. In this way, the center of control moves toward the slowest part of the tree. If a node has received messages from all of its neighbors, when it is finally ready, it becomes the commitment controller and decides whether the transaction should be committed (assuming all the messages it received were ready messages and that it is also ready) or abort (the CC must abort if it received an abort message). For people who like diagrams, here’s a diagram.
As you can see in the diagram, because Delegate B was much slower than the other delegates in responding, Delegate A was eventually chosen as the node to decide if the transaction was to be committed. Now, there are some issues with D2PC, one of which is glare (when two nodes send a ready message to each other at the same time). It can be shown mathematically that glare can only occur at a single point for any given transaction. If you want to see the math, you have to buy Yoav Raz’s original paper on D2PC. An example of glare can be seen in the following diagram.
In this diagram, we can see that the Original Controller and Delegate A both try to send ready messages to each other at about the same time. In this case, you must go through a process to decide whether the Original Controller or Delegate A will be the commit controller, and then the commitment decision can be made and the transaction finished.
Now you may be asking yourself, why does there need to be a single commitment controller? If a node has received a message from all of its neighbors, doesn’t it have enough information to decide if the commit should be finalized or not? Why not have both nodes be controllers in the glare case? In fact, let’s do away with the need for explicit controllers altogether! If you were thinking anything like that, then great (if not, you should be). This is where P2PC comes in.
The protocol algorithm is pretty simple and is as follows. I have included the confirmations and state transitions that I left out of the other discussions for simplicity’s sake because you will need all of this information to create a fully functional implementation.
A node sends an explicit PREPARE message to its neighbors
When a node in the prepared state has received a READY message from all its neighbors but one, it sends that neighbor a READY message and enters the ready state
When a node in the prepared state that has received a READY message from all its neighbors commits, it enters the committed state and sends a COMMITTED message to all its neighbors
When a node in the ready state receives a READY or COMMITTED message from a neighbor, it commits, enters the committed state, and sends a COMMITTED message to all its neighbors
When a node in the committed state has received COMMITTED messages from all its neighbors, it enters the forgotten state
A node that encounters an error during the prepare phase sends an ABORT message to all its neighbors and enters the forgotten state
A node in the prepared or ready state that has received an ABORT message from a neighbor sends an ABORT message to all other neighbors before aborting and entering the forgotten state
Now I know what you’re saying to yourself. That’s all well and good, but what does it look like as a diagram? It looks like this.
The benefits, I think, speak for themselves, but just to be explicit, P2PC provides the following advantages:
Fastest possible abort in the abort case
Fastest possible commit when successful
No need for a central commit controller