Proper Chunked Transfer Encoding Format


I'm curious about the proper format of chunked data, comparing what the spec says with what Twitter returns from its activity stream.

When I use curl to fetch a chunked stream from Twitter, it reports:

~$ curl -v https://stream.twitter.com/1/statuses/sample.json?delimited=length -u ...:...
< HTTP/1.1 200 OK
< Content-Type: application/json
< Transfer-Encoding: chunked
<
1984
{"place":null,"text":...
1984
{"place":null,"text":...
1984
{"place":null,"text":...

I've written a chunked data emitter based on the Wikipedia article and the HTTP spec (essentially: the chunk size in hex, then \r\n, then the chunk data, then \r\n), and my result looks like this:

~$ curl -vN http://localhost:7080/stream
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Transfer-Encoding: chunked
< 
{"foo":{"bar":...
{"foo":{"bar":...
{"foo":{"bar":...

The difference is that Twitter appears to include the length of the string, as a decimal integer, in the body of each chunk (in addition to the hex chunk size that must also be there). I wanted to make sure that I wasn't missing something. The Twitter docs make no mention of the length value, it's not in their examples, and I don't see anything about it in the spec.
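For reference, the framing described above can be sketched as follows. This is only a minimal illustration of the chunk layout from the spec (hex size, CRLF, data, CRLF, then a zero-size chunk to finish), not the actual emitter from the question; the names encode_chunk and encode_chunked are hypothetical.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: size in hex, CRLF, the data itself, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def encode_chunked(parts) -> bytes:
    """Frame a whole body. Empty parts are skipped, because a
    zero-size chunk is what marks the end of the body."""
    body = b"".join(encode_chunk(p) for p in parts if p)
    # Last chunk (size 0) followed by an empty trailer section.
    return body + b"0\r\n\r\n"


if __name__ == "__main__":
    # Two hypothetical JSON messages, one per chunk.
    wire = encode_chunked([b'{"foo":1}\r\n', b'{"bar":22}\r\n'])
    print(wire)
```

A client such as curl strips this framing before printing, so seeing no sizes in the output is consistent with a correctly framed response.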


There are 2 best solutions below


If your code does not emit length information, then it is clearly incorrect. See http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.6.1.


RFC 2616, section 19.4.6, Introduction of Transfer-Encoding:

A process for decoding the "chunked" transfer-coding (section 3.6) can be represented in pseudo-code as:
   length := 0
   read chunk-size, chunk-extension (if any) and CRLF
   while (chunk-size > 0) {
      read chunk-data and CRLF
      append chunk-data to entity-body
      length := length + chunk-size
      read chunk-size and CRLF
   }
   read entity-header
   while (entity-header not empty) {
      append entity-header to existing header fields
      read entity-header
   }
   Content-Length := length
   Remove "chunked" from Transfer-Encoding

As the RFC says, the chunk-size is not appended to the entity-body, so it is normal that you cannot see it. I have also read the curl source code (function Curl_httpchunk_read) and confirmed that it skips the chunk-size\r\n line and appends only the chunk-size bytes that follow it to the body.
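The pseudo-code above can be sketched in Python roughly as follows. This is an illustration under simple assumptions (a blocking binary stream, trailer headers discarded rather than merged), not how curl implements it; decode_chunked and the sample payload are hypothetical.

```python
import io


def decode_chunked(stream) -> bytes:
    """Rebuild the entity-body from a chunked stream.
    The chunk-size lines are consumed and never added to the body."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the last chunk")
        # Ignore any chunk-extension after ';' and parse the hex size.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            break
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        body += data
        if stream.read(2) != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    # Skip optional trailer headers up to the terminating blank line.
    while stream.readline() not in (b"\r\n", b""):
        pass
    return bytes(body)


if __name__ == "__main__":
    # A decimal length written inside the chunk data is ordinary payload,
    # so it survives decoding; only the hex chunk-size ("c") is removed.
    wire = b'c\r\n9\r\n{"a":"b"}\r\n0\r\n\r\n'
    print(decode_chunked(io.BytesIO(wire)))
```

So any number that still shows up in the decoded output was sent as part of the chunk data and is not the chunk-size itself.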

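Twitter's output does show a length before each message. My guess is that this is related to HTTPS, since the whole stream is encrypted. Note, though, that the request in the question also passes delimited=length, which may be what asks Twitter to prefix each message with its length.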