I've implemented a Pivotal Tracker API module in Python 2.7. The Pivotal Tracker API expects POST data to be an XML document and "application/xml" to be the content type.
My code uses urllib2/httplib to post the document as shown:
request = urllib2.Request(self.url, xml_request.toxml('utf-8') if xml_request else None, self.headers)
obj = parse_xml(self.opener.open(request))
This yields an exception when the XML text contains non-ASCII characters:
File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 809, in _send_output
msg += message_body
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 89: ordinal not in range(128)
As near as I can see, httplib._send_output is creating an ASCII string for the message payload, presumably because it expects the data to be URL encoded (application/x-www-form-urlencoded). It works fine with application/xml as long as only ASCII characters are used.
Is there a straightforward way to post application/xml data containing non-ASCII characters, or am I going to have to jump through hoops (e.g. using Twisted and a custom producer for the POST payload)?
You're mixing Unicode and bytestrings.
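In Python 2, adding a unicode object to a bytestring (as msg += message_body does in the traceback) makes the interpreter decode the bytestring with the default ASCII codec first. Below is a minimal sketch of that decode step, written out explicitly so that it behaves the same on Python 2 and 3. The helper name ascii_decode_fails and the sample character are assumptions made for illustration; the character was chosen because its UTF-8 encoding starts with the 0xc5 byte seen in the error message.

```python
def ascii_decode_fails(data):
    """Return True if the bytestring cannot be decoded as ASCII.

    This is the same decode that Python 2 performs implicitly when a
    bytestring is concatenated with a unicode object.
    """
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical payload fragment: U+0141 encodes to the bytes 0xc5 0x81 in UTF-8.
non_ascii_body = u'\u0141'.encode('utf-8')
ascii_body = b'<story/>'

print(ascii_decode_fails(non_ascii_body))
print(ascii_decode_fails(ascii_body))
```

An ASCII-only body survives the implicit decode, which matches the observation that the request works as long as only ASCII characters are used.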
To fix it, make sure that the self.headers content is properly encoded, i.e., all keys and values in the headers should be bytestrings.
Note: the character encoding of the headers has nothing to do with the character encoding of the body, i.e., the XML text can be encoded independently (it is just an octet stream from the HTTP message's point of view).
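The header clean-up described above might be sketched as follows. The helper names to_bytes and encode_headers are hypothetical, and the sketch assumes every header key and value is representable in ASCII. It targets Python 2.7, where urllib2 and httplib work with bytestrings; the text_type shim only exists so the snippet also runs unchanged on Python 3.

```python
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
# The conditional expression never evaluates `unicode` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_bytes(value, encoding='ascii'):
    """Encode Unicode text to a bytestring; leave bytestrings untouched."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def encode_headers(headers):
    """Return a copy of `headers` whose keys and values are bytestrings."""
    return dict((to_bytes(k), to_bytes(v)) for k, v in headers.items())


# Example: headers that were accidentally created as Unicode strings.
headers = encode_headers({u'Content-Type': u'application/xml'})
print(headers)
```

In the question's code, the result would be assigned back to self.headers before the urllib2.Request is created.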
The same goes for self.url: if it has the unicode type, convert it to a bytestring (using the 'ascii' character encoding).
An HTTP message consists of a start-line, "headers", an empty line and possibly a message-body, so self.headers is used for the headers, self.url is used for the start-line (the HTTP method goes here) and probably for the Host HTTP header (if the client is HTTP/1.1), and the XML text goes into the message body (as a binary blob).
It is always safe to use the ASCII encoding for self.url (IDNA can be used for non-ASCII domain names; the result is also ASCII).
On the character encoding of HTTP headers, RFC 7230 notes that in practice most header field values use only a subset of US-ASCII, and that newly defined header fields should limit their field values to US-ASCII octets.
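The URL conversion described above could look like the following sketch. The function name url_to_ascii_bytes and the example host are hypothetical. It assumes the URL is Unicode text, carries no user:password part, and has a path and query that are already percent-encoded, so only the host name needs IDNA.

```python
try:
    from urlparse import urlsplit, urlunsplit  # Python 2
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # Python 3


def url_to_ascii_bytes(url):
    """Return `url` as an ASCII bytestring, IDNA-encoding the host name."""
    parts = urlsplit(url)
    # The 'idna' codec turns a non-ASCII host name into its ASCII form.
    host = parts.hostname.encode('idna').decode('ascii')
    if parts.port is not None:
        host += ':%d' % parts.port
    ascii_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment))
    # Raises UnicodeEncodeError if the path or query still holds non-ASCII text.
    return ascii_url.encode('ascii')


print(url_to_ascii_bytes(u'http://b\xfccher.example:8080/projects?limit=1'))
```

A URL that is already pure ASCII passes through unchanged apart from becoming a bytestring, so applying the function unconditionally to self.url should be harmless under the stated assumptions.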
To convert XML to a bytestring, see the encoding considerations for application/xml.
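As a rough sketch of producing the body as a bytestring, the snippet below serializes a small document with xml.dom.minidom, whose toxml('utf-8') call matches the one in the question. The function name build_story_xml and the element names story and name are made up for illustration and are not taken from the Pivotal Tracker API.

```python
from xml.dom import minidom


def build_story_xml(name_text):
    """Serialize a tiny XML document to a UTF-8 encoded bytestring."""
    doc = minidom.Document()
    story = doc.createElement('story')   # hypothetical element name
    name = doc.createElement('name')     # hypothetical element name
    name.appendChild(doc.createTextNode(name_text))
    story.appendChild(name)
    doc.appendChild(story)
    # Passing an encoding makes toxml() return bytes, not Unicode text,
    # and records that encoding in the XML declaration.
    return doc.toxml('utf-8')


body = build_story_xml(u'\u0141\xf3d\u017a')
print(repr(body))
```

The resulting bytes can be passed as the data argument of urllib2.Request. Once the URL and headers are bytestrings as well, httplib only ever concatenates bytestrings and the implicit ASCII decode no longer happens.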