I'm working on an implementation of a logical data diode, which means that data can only flow in one direction. No ACKs are allowed. Therefore, I have chosen UDP. This is the gist of the protocol:
- Split the payload into small chunks
- Give each chunk a sequence number
- Transmit the chunk to the receiver using UDP
For small payloads, this works flawlessly. For larger payloads, however, after around 600 datagrams have been sent, the packets seem to mysteriously disappear in transit. All packets are logged as sent, but on the receiving end, the packets just stop.
Here is a code snippet:
    for i in range(redundancy + 1):
        seq = 0
        sent_bytes = 0
        bytes_to_send = len(session.encrypted_data)
        while sent_bytes < bytes_to_send:
            logger.info("Sending payload chunk %d", seq)
            header = concat_bytes(str(seq).zfill(16).encode(), session.session_uuid.bytes_le)
            remaining_room = BUFFER_SIZE - len(header)
            data = session.encrypted_data[sent_bytes : sent_bytes + remaining_room]
            payload = concat_bytes(header, data)
            self._transmit_bytes(payload)
            sent_bytes += len(data)
            seq += 1
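The chunk format used in the loop above can be sketched as a standalone pair of helpers, which makes it easier to test the framing separately from the socket code. This is only an illustration: `build_packets`, `parse_packet` and the `BUFFER_SIZE` value of 1024 are assumptions (the real constant is not shown), and plain `bytes` concatenation stands in for `concat_bytes`.

```python
import uuid

SEQ_WIDTH = 16      # zero-padded ASCII sequence number, as in the loop above
UUID_WIDTH = 16     # length of uuid.UUID.bytes_le
BUFFER_SIZE = 1024  # assumed value; the original constant is not shown


def build_packets(data: bytes, session_uuid: uuid.UUID, buffer_size: int = BUFFER_SIZE):
    """Yield header + chunk datagrams, each at most buffer_size bytes long."""
    room = buffer_size - SEQ_WIDTH - UUID_WIDTH
    if room <= 0:
        raise ValueError("buffer_size is too small to hold the header")
    for seq, offset in enumerate(range(0, len(data), room)):
        header = str(seq).zfill(SEQ_WIDTH).encode() + session_uuid.bytes_le
        yield header + data[offset:offset + room]


def parse_packet(packet: bytes):
    """Split a datagram into (sequence number, session UUID, chunk)."""
    seq = int(packet[:SEQ_WIDTH].decode())
    session_uuid = uuid.UUID(bytes_le=packet[SEQ_WIDTH:SEQ_WIDTH + UUID_WIDTH])
    return seq, session_uuid, packet[SEQ_WIDTH + UUID_WIDTH:]
```

Because every datagram carries its own sequence number, the receiver can tell exactly which chunks went missing without any traffic flowing back to the sender.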
_transmit_bytes:
    def _transmit_bytes(self, message: bytes):
        self.server_socket.sendto(message, self.addr)
        time.sleep(MESSAGE_DELAY)
The initialization of server_socket:
    self.server_socket: LDDSocket = LDDSocket(listen=False)
    so_linger_options = struct.pack("ii", 1, CLOSE_TIMEOUT)
    send_buffer_size = 1024 * 1024 * 100  # 100 MB
    self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, so_linger_options)
    self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_size)
LDDSocket is an extremely simple wrapper for socket.socket. There is nothing you need to know about it (listen=False just makes it not bind).
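One thing worth checking with a request as large as 100 MB is how much buffer the OS actually grants. On Linux, for example, the size is capped by the `net.core.wmem_max` sysctl (and the kernel doubles the requested value for its own bookkeeping), so the effective size can be far smaller than what was asked for. A minimal sketch of reading the value back, using a plain `socket.socket` rather than `LDDSocket`; the helper name `effective_send_buffer` is made up for illustration:

```python
import socket


def effective_send_buffer(requested: int) -> int:
    """Request a send buffer size and return what the OS actually granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        except OSError:
            # Some platforms reject oversized requests instead of clamping them.
            pass
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    finally:
        sock.close()


requested = 1024 * 1024 * 100  # 100 MB, as in the snippet above
print(f"requested {requested} bytes, OS reports {effective_send_buffer(requested)} bytes")
```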
As you can see, I have tried some tips that I found online, including:
- Setting SO_LINGER options (CLOSE_TIMEOUT is 10 seconds)
- Setting SO_SNDBUF options
- Before closing the socket, I'm waiting 10 seconds. I expected that I would still see packets being received during these 10 seconds, but the receiving stops well before that timeout and the subsequent socket.close() call happen.
The last point is done in the __exit__ method of the "transmitter class", which is used as a context manager:
    def __exit__(self, exc_type, exc_val, exc_tb):
        cleanup_grace_period = 10  # seconds
        logger.info("Waiting %d seconds for cleanup...", cleanup_grace_period)
        time.sleep(cleanup_grace_period)
        logger.info("Cleanup presumed to be complete; closing socket.")
        self.close()
        logger.info("Socket closed.")
Again, the receiving end stops receiving messages well before this __exit__ method is called.
What could cause a socket to stop sending data after a certain amount of data has been sent? (It always ends roughly around the same sequence number, but not exactly.)
My problem appears to be related to buffer sizes.
An investigation revealed:
- Setting the MESSAGE_DELAY value to 100ms (meaning it waits 100ms after sending a packet before sending another one) caused all packets to be sent and received, but this is an unacceptable delay.
- Reducing MESSAGE_DELAY gradually increased the chance of the last packets (never any in the middle) being lost.
- Increasing the send buffer (SO_SNDBUF) option caused all packets to be sent (Wireshark), but not received.
- Increasing the receive buffer (SO_RCVBUF) option caused all packets to be received.
Finally, I was able to reduce MESSAGE_DELAY and still receive all packets.
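For reference, a rough sketch of what the receiving side of that fix can look like. The receiver code is not shown in this post, so `open_receiver`, `missing_sequences` and the 4 MB value are assumptions for illustration only. As with SO_SNDBUF, the OS may grant less than requested (on Linux the cap is `net.core.rmem_max`), so it is worth reading the value back with `getsockopt`.

```python
import socket

RECV_BUFFER_SIZE = 4 * 1024 * 1024  # assumed value; size it to the expected burst


def open_receiver(host: str, port: int, recv_buffer_size: int = RECV_BUFFER_SIZE) -> socket.socket:
    """Create a bound UDP socket with an enlarged receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # A larger buffer lets the kernel queue datagrams while the application is busy.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_buffer_size)
    sock.bind((host, port))
    return sock


def missing_sequences(received, expected_count: int):
    """Return the sorted sequence numbers in 0..expected_count-1 that never arrived."""
    return sorted(set(range(expected_count)) - set(received))
```

With the sequence numbers logged on the receiver, `missing_sequences` shows whether the loss is confined to the tail of the transfer, as observed here, or scattered through the middle. Since the receiver cannot ask for retransmission, keeping its receive loop short (read, store, move on) also helps the buffer drain as fast as the sender fills it.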