Gossip Protocols, Where to Start
This week I started reading about Gossip Protocols, or Epidemic Protocols as they are sometimes called as well. It’s quite an interesting topic in distributed systems and as you might guess, once you start investigating, the rabbit hole goes on and on.
Gossip protocols were initially used as a way to maintain consistency on databases that were replicated at several hundreds sites. From then on it was seen that the gossiping could be used to solve other problems, like calculating averages across a network of nodes; or as a way to build an overlay of nodes in a network. Maintaining node membership is another problem that’s been tackled with gossiping.
These protocols usually work like this:
- A node in the network randomly selects a peer with which it will exchange some information.
- A data exchange process happens between these peers.
- Each node processes the data it received.
- These steps are periodically repeated by the nodes in the network as a way to disseminate information.
These protocols work in a similar fashion as how a disease disseminates in a population. Nodes are classified as being infective, susceptible and removed. An infective node will try to spread some information by periodically selecting a random peer from the network. If the peer is susceptible, that is, it doesn’t know about said information, then it will become infected, and thus will also start to try to spread this particular information to other nodes. A removed node is one that knows about the new piece of information that circulating around, but it’s not spreading it (for example if all the neighboring nodes already know the information, there’s no need to keep spreading it).
This can also be seen in the light of how rumour is spread. One person A first hears a rumour, then calls over the phone someone person B in order to share the rumour. Once they hung the phone, B calls a third one, let’s say C, while at the same time A contacts D to share the rumour as well. The process continues on and on until everyone learns about the rumour. As you can see, this technique can be used to spread data pretty fast among a network of processes.
The analysis of these algorithms focuses on designing strategies on how to best select the peer to share information with. For this problem, there are mathematical models that prove that using this or that peer selection technique, the state of the system will converge in this or that direction after M rounds. In these systems, information is usually spread in O(log(N)) steps, where N is the number of peers in the network.
If you want to get started with this topic, here I recommend some papers that are quite fundamental when it comes to understanding how gossiping works:
This paper by Alan Demers et al. is considered to be the seminal paper that introduced gossiping/epidemic algorithms in our industry. It’s a very interesting read that also shows the problems they were trying to solve in distributed systems back in the ’80s.
This paper by Anne-Marie Kermarrec and Maarten van Steen tries to give an overview and build a framework that would help us better understand and analyse gossiping protocols. Highly recommended as a kind of “foundations” reading.
Before drinking the Gossiping Kool Aid, it’s recommended to read this paper by Ken Birman, in which he describes some of the limitations of gossip protocols.
I hope you find this topic interesting and that these papers server you as a base to get you started.