MQTT QoS 1 Ordering

I came across this answer: https://stackoverflow.com/a/38094997/403875

But I'm not clear on what it means to "re-send in the order in which the PUBLISH packets were sent".

Suppose I send, "A", "B", "C" (all with QoS 1) and get back ACK's for "A" and "C" ("B" was lost). If I then re-send "B", the receiver will receive them out of order, won't they? So what does this restriction do? When does it apply? Does the sender need to re-send "B" and "C" to ensure that "C" is received last? What if this time it gets an ACK for "B", but not "C". Does it need to now send "C" again until it gets another "C" ACK?

There is 1 answer below.

Suppose I send, "A", "B", "C" (all with QoS 1) and get back ACK's for "A" and "C" ("B" was lost).

Throughout this answer I'm going to assume that all messages are sent on the same "Ordered Topic" and that the server adheres to the MQTT spec. Given this, you should only ever receive PUBACKs for:

  • "A" followed by loss of connection, or
  • "AB" followed by loss of connection, or
  • "ABC"

because:

  • MQTT runs over "TCP/IP, or over other network protocols that provide ordered, lossless, bi-directional connections", so the connection should either deliver packets in order or drop.
  • The spec requires that the server send "PUBACK packets in the order in which the corresponding PUBLISH packets were received".

This means you should receive the PUBACK packets in the order you sent the PUBLISH packets, or the connection should drop (and rules around resending packets come into play).

It is possible, say due to a bug, that the server does not respond in the specified order. At this point the server is breaching the protocol (so any guarantees offered by the protocol no longer apply).
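As a rough illustration of the rule above, a sender could check that the PUBACKs it has seen so far form a prefix of what it sent. This is a minimal sketch: `acks_follow_send_order` is a hypothetical helper, and letters stand in for the packet identifiers a real client would compare.

```python
def acks_follow_send_order(sent_ids, acked_ids):
    """Return True if the acks seen so far are a prefix of the sent sequence.

    Hypothetical helper: a compliant server acknowledges QoS 1 PUBLISH
    packets in the order it received them, so any other pattern means the
    server has breached the protocol.
    """
    sent_ids = list(sent_ids)
    acked_ids = list(acked_ids)
    return acked_ids == sent_ids[:len(acked_ids)]


ok_partial = acks_follow_send_order("ABC", "A")    # "A", then the connection drops
ok_full = acks_follow_send_order("ABC", "ABC")     # everything acknowledged
violation = acks_follow_send_order("ABC", "AC")    # the scenario in the question
print(ok_partial, ok_full, violation)
```

Under these assumptions the "A then C" pattern from the question is reported as a violation rather than as a lost "B".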

But I'm not clear on what it means to "re-send in the order in which the PUBLISH packets were sent".

Say I send "ABCD" and receive an ACK for "A" followed by loss of connection. The above is simply saying that I should retransmit "BCD" in that order.
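The "ABCD" example can be sketched as a small in-flight store. This is only an illustration, not how any particular client library does it: `InflightTracker` and its method names are hypothetical, and the packet identifiers 1 to 4 are made up.

```python
from collections import OrderedDict


class InflightTracker:
    """Tracks QoS 1 PUBLISH packets awaiting a PUBACK (hypothetical helper)."""

    def __init__(self):
        # packet_id -> payload, kept in the order the packets were first sent
        self._inflight = OrderedDict()

    def sent(self, packet_id, payload):
        self._inflight[packet_id] = payload

    def acked(self, packet_id):
        # Ignore acks for unknown ids instead of raising
        self._inflight.pop(packet_id, None)

    def to_resend(self):
        """Payloads to retransmit after a reconnect, in original send order."""
        return list(self._inflight.values())


tracker = InflightTracker()
for packet_id, payload in enumerate("ABCD", start=1):
    tracker.sent(packet_id, payload)

tracker.acked(1)  # PUBACK for "A" arrives, then the connection drops
resend = tracker.to_resend()
print(resend)  # ['B', 'C', 'D']
```

Keeping the unacknowledged packets in insertion order means the resend order falls out of the data structure, with no sorting needed on reconnect.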

These rules may seem pretty simple, but when you attempt to implement them things can become quite complex and various trade-offs need to be made. Consider a client that receives two PUBLISH messages and commits them both to a database. To speed up processing we might start a separate thread for each message that stores it and then triggers the PUBACK. However, this means that the second message might complete processing first. Should we send its ACK immediately or wait? (The spec indicates the PUBACK should be delayed.)
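One way to delay the PUBACK in that situation is to queue acknowledgements and release them only once every earlier message has finished. The sketch below assumes a hypothetical `send_puback` callback supplied by the surrounding client code; `OrderedAcker` is not part of any MQTT library.

```python
import threading
from collections import deque


class OrderedAcker:
    """Releases PUBACKs in the order the PUBLISH packets arrived (sketch)."""

    def __init__(self, send_puback):
        self._send_puback = send_puback  # hypothetical callback taking a packet id
        self._pending = deque()          # packet ids in arrival order
        self._done = set()               # ids processed but not yet acknowledged
        self._lock = threading.Lock()

    def received(self, packet_id):
        """Call from the network thread when a PUBLISH arrives."""
        with self._lock:
            self._pending.append(packet_id)

    def processed(self, packet_id):
        """Call from a worker thread once the message is safely stored."""
        with self._lock:
            self._done.add(packet_id)
            # Only acknowledge from the front of the queue, so a later message
            # never overtakes an earlier one that is still being processed.
            while self._pending and self._pending[0] in self._done:
                ready = self._pending.popleft()
                self._done.discard(ready)
                self._send_puback(ready)


acks = []
acker = OrderedAcker(acks.append)
acker.received(1)
acker.received(2)
acker.processed(2)        # second message finishes first: nothing is sent yet
after_first = list(acks)
acker.processed(1)        # now both PUBACKs go out, in arrival order
print(after_first, acks)  # [] [1, 2]
```

The trade-off is that one slow message holds back the acknowledgements of everything behind it, which is the price of keeping the ordering guarantee.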

What if this time it gets an ACK for "B", but not "C". Does it need to now send "C" again until it gets another "C" ACK?

I think it's worth mentioning a key difference between the v3 and v5 specs here. The v3 spec allows an unacknowledged PUBLISH to be retransmitted at any time (the spec acknowledged issues with early TCP implementations: "Historically retransmission of Control Packets was required to overcome data loss on some older TCP networks"). The v5 spec only permits retransmission following a reconnection.

Most servers/clients now follow the v5 rules; for example, the Mosquitto change log for version 1.5 says:

Outgoing messages with QoS>1 are no longer retried after a timeout period. Messages will be retried when a client reconnects. This change in behaviour can be justified by considering when the timeout may have occurred.

  • If a connection is unreliable and has dropped, but without one end noticing, the messages will be retried on reconnection. Sending additional PUBLISH or PUBREL would not have changed anything.
  • If a client is overloaded/unable to respond/has a slow connection then sending additional PUBLISH or PUBREL would not help the client catch up. Once the backlog has cleared the client will respond. If it is not able to catch up, sending additional duplicates would not help either

What does "Ordered, lossless" mean if not "100% stable"?

The spec requires the underlying transport to provide "ordered, lossless, bi-directional connections"; it does not include any requirements about the longevity of these connections. Connections will fail from time to time due to a wide variety of causes, including:

  • You might need to restart the server the receiver is running on.
  • There may be a break in the network (e.g. loss of cell connection).
  • Your gateway may fail (meaning loss of NAT rules).

The spec covers in detail how connectivity issues should be dealt with (see keep alive, Operational behavior etc).