Synchronizing a current message id in a conversation between Alice and Bob

98 Views Asked by At

I'm faced with this situation:

Diagram

Host A and B are exchanging messages in a conversation through a broker.

When host B receives a messages it sends back a delivery token to Host A so that it can show the user that B has received his messages. This may also happen the other way around.

At any point A or B may be offline and the broker will hold on to the messages until they come online and then deliver them.

Each host stores it's own and the other hosts messages in a database table:

ID | From | To | Msg | Type | Uid 

I figured using the naive table primary key id would have been a bad choice to identify the messages (as it's dependent in order of insertion) so I defined a custom unique id field (uid).

My question is:

How can I make sure that the current message id stays synchronized between host A and B so that only one message has that id? So that I can use delivery token id to identify which message was received, and it wouldn't be possible if I had more than one message with the same Id.

If I do this naively incrementing it every time we send/receive a message at first it looks ok:

Host A sends message with ID 1 and increases it's current ID to 2
Host B receives a message and increases it's current ID to 2
Host B sends message with ID 2 and increases it's current ID to 3
Host A receives message and increases it's current ID to 3
...

But it may very easily break:

Host A sends message with ID 1 and increases it's current ID to 2
Host B sends a message (before receiving the previous one) with ID 1
clash.. two messages with ID 1 received by both hosts

I thought of generating a large UUID every time (with extremely low chance of collision) but it introduces a large overhead as every message would need both to carry and store one.

Unfortunately any solution regarding the broker is not viable because I can't touch the code of the broker.

1

There are 1 best solutions below

0
On

This is a typical problem of Distributed Systems (class exercise?). I suppose you are trying to keep the same ID in order to determine an absolute order among all messages exchanged between Alice and Bob. If this is not the case, the solution provided in the comment by john1020 should be enough. Other possibility is to have ID stored in one node that can be accessed by both A and B and a distributed locks mechanism synchronizes access. In that way, you always define an order even in face of collisions. But this is not always possible and sometimes not efficient.

Unfortunately, there is no way of keeping an absolute order (except having that unique counter with distributed locks). If you have one ID that can be modified by both A and B, you will have a problem of eventual consistency and risk of collisions. A collision is basically the problem you described.

Now, imagine both Bob and Alice send a message at the same time, both set ID in 2. What would be the order in which you would store the messages? Actually it doesn't matter, it's like the situation when two people spoke at the phone at the same time. There is a collision.

However, what is interesting is to identify messages that actually have a sequence or cause-effect: so you could keep an order between messages that are caused by other messages: Bob invites Alice to dance and Alice says yes, two messages with an order.

For keeping such order you can apply some techniques like vector clocks (based on a Leslie Lamport's timestamps vector algorithm): https://en.wikipedia.org/wiki/Vector_clock . You can also read about AWS' DynamoDB: http://the-paper-trail.org/blog/consistency-and-availability-in-amazons-dynamo/

Also you can use the same mechanism Cassandra uses for distributed counters. This is a nice description: http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf