I'm having trouble parsing a raw HTTP request string and checking its body against the declared Content-Length value.
The issue shows up when parsing a POST request that contains multipart data: the value of the Content-Length header differs from the one I calculate with len(rfile.read()). I suspect this is related to the character encoding of the binary content, but I have not found a way to get the same result as the Content-Length HTTP header.
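To make that suspicion concrete, here is a minimal sketch of how a byte count and a character count can drift apart. The payload is made up for illustration and is not the real request body; whether this is exactly what happens to my multipart data is an assumption.

```python
# -*- coding: utf-8 -*-
# Illustrative payload only; it is not the Crashlytics body from the request below.
body = b"caf\xc3\xa9 \xe2\x82\xac"   # UTF-8 bytes, written with hex escapes

byte_count = len(body)                  # what Content-Length counts
char_count = len(body.decode("utf-8"))  # what len() sees after decoding to text

# Bytes that are not valid UTF-8 are replaced by U+FFFD when decoded leniently,
# and each replacement character re-encodes to three bytes.
mangled = b"\xff\xfe".decode("utf-8", "replace")
reencoded = mangled.encode("utf-8")

print(byte_count)       # 9
print(char_count)       # 6
print(len(reencoded))   # 6, although the original was only 2 bytes
```

If binary content goes through a text representation like this at any point, its original byte length can no longer be recovered from the string.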
The following script demonstrates the issue:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from BaseHTTPServer import BaseHTTPRequestHandler
from StringIO import StringIO
str_http = """POST /abc HTTP/1.1
Host: 127.0.0.1:80
Connection: close
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.18.4
content-type: text/plain
Content-Length: 7

abc 123
"""
str2_http = """POST /spi/v2/events HTTP/1.1
User-Agent: Crashlytics Android SDK/1.3.8.127
X-CRASHLYTICS-DEVELOPER-TOKEN: XXXXXXXXXXX
X-CRASHLYTICS-API-CLIENT-TYPE: android
X-CRASHLYTICS-API-CLIENT-VERSION: 1.3.8.127
X-CRASHLYTICS-API-KEY: XXXXXXXXXXX
Content-Type: multipart/form-data; boundary=00content0boundary00
Host: e.crashlytics.com
Connection: Keep-Alive
Accept-Encoding: gzip
Content-Length: 776

--00content0boundary00
Content-Disposition: form-data; name="session_analytics_file_0"; filename="sa_32e4d6c3-adef-4cd5-a571-a68e4bee65f6_1542460750651.tap"
Content-Type: application/vnd.crashlytics.android.events
�Ko�0dz�p��@Z�\��3~do
�B-Uw˩�v�ռ�Gi��͡��.�r@�U�
b�R43{��i,����'~���σ`o�)Tu��/Mnߘpꪈ�j�U}e�D�4M�L��+���U�:�ƌ�
D����b� &�-5U����T��{]�R���,(K%K@�lS��{�f�ux�ʁ��`�;w���f�(}���R������[����ﴠ9�U� � К"I�A��I* ���^վ�M���᳃XĒ`�]�^:m+cs��ˋ��X����._���uӬj�Wr�|]o�{�e� ~`>R�`��G�
�QQ�3��19� e�s]����d�ΥS.oÙ���ܥ�U���s՞G��6�ζ�rm�������nB��[Cp^�'^{�CnGvI�w�����p�H�#;HÄ�Z��������=����c5�����$۶ ��%��c@i�U,
�p����������G�
s�����8�����aC
--00content0boundary00--
"""
class HTTPRequest(BaseHTTPRequestHandler):
    def __init__(self, request_text):
        self.rfile = StringIO(request_text)
        # ~ self.rfile = io.BytesIO(request_text)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()
        self.request_text = request_text

    def send_error(self, code, message):
        self.error_code = code
        self.error_message = message
# 1st test : OK
request = HTTPRequest(str_http)
content_length = int(request.headers.get('content-length') or 0)
data_cursor = request.rfile.tell()
print request.rfile.read(content_length) # Complete body
request.rfile.seek(data_cursor)
print len(request.rfile.read().strip()) == content_length # True
# 2nd test : fails
request = HTTPRequest(str2_http)
content_length = int(request.headers.get('content-length') or 0)
data_cursor = request.rfile.tell()
print request.rfile.read(content_length) # Truncated body
request.rfile.seek(data_cursor)
print len(request.rfile.read().strip()) == content_length # False
How can I calculate this value correctly? My goal is to compute the exact data length so that I know precisely when a request has finished delivering its content.
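In case other people are stuck on a similar issue: I finally figured out how to make it work. As suspected in my previous post, it all comes down to encoding. Content-Length counts the bytes of the body, so the body has to be kept as a byte string, with non-ASCII bytes written as hex escapes (e.g. \xff) instead of being pasted as text.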
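Once the request is held as raw bytes, checking whether the body has fully arrived comes down to a byte count. Here is a minimal sketch of that check. `body_is_complete` is a hypothetical helper, not part of BaseHTTPServer, and it assumes CRLF line endings and no chunked transfer encoding.

```python
def body_is_complete(raw):
    """Return True once `raw` (bytes) holds the full body announced by Content-Length.

    Hypothetical helper: assumes CRLF line endings and no chunked encoding.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return False  # the header block has not been fully received yet
    expected = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            expected = int(value.strip().decode("ascii"))
    return len(body) >= expected


# Made-up request whose 7-byte body contains non-ASCII bytes as hex escapes.
request = (
    b"POST /abc HTTP/1.1\r\n"
    b"Host: 127.0.0.1:80\r\n"
    b"Content-Length: 7\r\n"
    b"\r\n"
    b"\xff\x00abc\xfe\x80"
)

print(body_is_complete(request))       # True
print(body_is_complete(request[:-3]))  # False, 3 bytes are still missing
```

The comparison works because both sides are counted in bytes; nothing is decoded to text before the length is measured.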
The following code shows a working sample with a problematic string: