How to properly read an HTTP POST message segmented into two TCP segments?

When I execute the following Python code on a pcap file:

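# Runs inside the loop over the packets read from the pcap file
if tcp.dport == 80:
    try:
        http = dpkt.http.Request(tcp.data)
    except (dpkt.dpkt.NeedData, dpkt.dpkt.UnpackError):
        # Payload is not a complete, parseable HTTP request: skip this packet
        continue
    if http.method == 'POST':
        print('POST Message')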

Packets such as the following create a problem: [image not available]

This is a single HTTP POST message split across two TCP segments, each sent in a different packet. However, because the first segment is shown as TCP only and the second one is recognised as HTTP, it seems that dpkt.http.Request fails when it tries to parse the first segment as HTTP.

So far no problem: it is fine for that to fail, as the first segment is not a full HTTP message. The issue is that the second segment does not seem to be read at all ("POST Message" is not printed). It is ignored as if it did not exist. The only explanation I can think of is that dpkt automatically reads the second segment together with the first because it recognises that both are segments of the same message.

The issue is that, although both TCP segments are read at once (following the above assumption), the resulting tcp.data is not recognised as an HTTP message; it is still treated as plain TCP because the first segment of the message is a TCP-only packet.

So what should I do to read the HTTP headers and data from such a pcap file?

There are 3 best solutions below

1
On BEST ANSWER

dpkt only works at the packet level. dpkt.http.Request expects the full HTTP request as input, not just the part contained in the current packet. This means you have to collect the payload from all packets belonging to the connection, i.e. reassemble the TCP data stream.
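As a rough illustration of "collect the payload per connection", here is a minimal sketch in plain Python. The names `http_request_complete` and `collect_requests` are hypothetical helpers, not part of dpkt. The sketch assumes the payloads already arrive in stream order and that the request uses Content-Length (no chunked encoding, no pipelining).

```python
from collections import defaultdict


def http_request_complete(buf: bytes) -> bool:
    """Return True once buf holds the whole header block and the declared body.

    Simplified check (assumption): only Content-Length is considered.
    """
    head, sep, body = buf.partition(b"\r\n\r\n")
    if not sep:
        return False  # header block is not terminated yet
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return len(body) >= length


def collect_requests(packets):
    """Yield (flow_key, full_request_bytes) for each completed request.

    packets: iterable of (flow_key, payload) tuples, assumed to be in
    stream order with no loss and no duplicates.
    """
    buffers = defaultdict(bytes)
    for key, payload in packets:
        buffers[key] += payload
        if http_request_complete(buffers[key]):
            yield key, buffers.pop(key)
```

With dpkt, the flow key could be something like (ip.src, tcp.sport, ip.dst, tcp.dport) and the payload would be tcp.data. Each yielded buffer is then what you would hand to dpkt.http.Request instead of a single packet's tcp.data.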

Reassembling is not simply concatenating packets. You also have to make sure that there are no lost packets and no duplicates, and that the packets get reassembled in the proper order, which might not be the order on the wire. Essentially you need to do everything the OS kernel would do before putting the extracted payload into a socket buffer.
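The ordering, duplicate and gap checks described above can be sketched roughly as follows. `reassemble` and its `initial_seq` parameter are hypothetical names for illustration. The sketch handles one direction of one connection and, as a simplifying assumption, ignores 32-bit sequence number wraparound.

```python
def reassemble(segments, initial_seq):
    """Rebuild the contiguous byte stream from (seq, payload) tuples.

    segments:    iterable of (seq, payload) for one direction of one flow
    initial_seq: sequence number of the first data byte (assumption:
                 known from the handshake, i.e. SYN seq + 1)

    Returns (data, complete). complete is False if a gap (lost segment)
    was hit, in which case data holds only the bytes before the gap.
    """
    stream = bytearray()
    next_seq = initial_seq
    # Sort by sequence number: capture order is not necessarily stream order
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if not payload:
            continue  # pure ACKs carry no data
        end = seq + len(payload)
        if end <= next_seq:
            continue  # duplicate or retransmission of data already seen
        if seq > next_seq:
            return bytes(stream), False  # gap: a segment is missing
        stream += payload[next_seq - seq:]  # drop any overlapping prefix
        next_seq = end
    return bytes(stream), True
```

With dpkt, seq would come from tcp.seq and payload from tcp.data. Only when complete is True does it make sense to pass the returned bytes to an HTTP parser.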

For an example of how parts of this can be done, see Follow HTTP Stream (with decompression). Note that the example there blindly assumes that the packets are already in order, complete and without duplicates - an assumption which is not guaranteed in real life.

0
On

Perhaps a bit late. The points raised by @Steffen Ullrich are correct. However, assuming you do not have those issues (i.e. no lost or duplicate packets, etc.), you can do some rudimentary reassembly, as I did for reassembling TLS frames spread over multiple TLS packets. You can apply similar logic to HTTP traffic. You can find my solution in the SO question I posted for a similar issue related to TLS frame reassembly. BTW, my solution uses scapy.

0
On

Or you could use the built-in feature in Scapy 2.4.3+: https://scapy.readthedocs.io/en/latest/layers/http.html

sniff(session=TCPSession, [...])