C# Retrieve an HTTP POST chunked response by chunk?


I am working with an API that accepts an HTTP POST request with URL-form-encoded query data as the payload. The response from the server is chunked, and each chunk contains a single JSON object. I'm trying to read and parse the server's response chunk by chunk from C#, so that I can take each chunk, deserialize it with Newtonsoft, and do whatever processing I need at that point. The server returns an unknown number of records per query - it could be 0 records, or it could be thousands of records, one per chunk.

My research and testing into the typical solutions like HttpClient indicate that these libraries "handle" chunks by just concatenating everything into a single response stream. Furthermore, I've read other posts that indicate that if a server isn't following the spec 100%, it's possible to even get an exception at the end of the stream.

I've considered the following solutions, but none seem optimal:

  1. Read the stream from the HTTP response char-by-char, counting { and } characters to find the start and end of a JSON object. Every time the closing } that balances the first { is found, parse the object. This is incredibly ugly, inefficient, and not generic - it assumes every JSON response is an object and would need to be altered if, for example, the server sends a JSON array ([ and ]) instead, or even just single JSON strings per chunk. It would also miscount any braces that appear inside string values.

  2. Skip HttpRequest/HttpClient entirely and do everything in raw sockets. Then, I can parse the chunk sizes, read exactly that many bytes from the socket stream, and parse accordingly. This would work, except it feels like a lot of "reinventing the wheel" since I have to implement URL encoding for the POST body, header parsing, SSL/TLS, etc. This has all been basically "solved" by HttpClient, so implementing it myself again feels like a bad idea, if for no other reason than I could easily introduce a parsing bug.

  3. Since the server sends a JSON object per chunk, read the entire response, then look for }{ and consider those to be the split point for JSON objects (since in actual JSON there would be a , between two objects that were part of a list). This feels unreliable at best - it assumes there is no whitespace on either side of each chunk's JSON object. This is also inefficient because, if the server were to return millions of records, the entire response would need to be stored in RAM. A response with millions of records could be over 1GB in size total, across hundreds of chunks. While that's not necessarily a problem for a machine with plenty of RAM, it's an unnecessarily inefficient method for parsing data that is streamable by design.
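For comparison with options 1 and 3, here is a minimal sketch of reading back-to-back JSON values without counting braces or searching for }{. It assumes Newtonsoft.Json is referenced and relies on `JsonTextReader.SupportMultipleContent`, which lets one reader consume several top-level values in a row. The class and method names (`JsonSequence`, `ReadValues`) are hypothetical and exist only for illustration. Note that this splits on JSON value boundaries rather than on HTTP chunk boundaries, which gives the same records only as long as each chunk really is one complete JSON value.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonSequence
{
    // Lazily yields each top-level JSON value (object, array, or primitive)
    // found in the input. Whitespace between values is optional.
    public static IEnumerable<JToken> ReadValues(TextReader source)
    {
        // SupportMultipleContent allows more than one root value per reader.
        using (var reader = new JsonTextReader(source) { SupportMultipleContent = true })
        {
            // Each Read() positions the reader on the first token of the next value.
            while (reader.Read())
            {
                // Load() consumes exactly one complete value from the reader.
                yield return JToken.Load(reader);
            }
        }
    }
}
```

Because the method is an iterator, only the value currently being parsed is held in memory. Passing it a `StringReader` over `{"id":1}{"id":2}` and over `{"id":1} {"id":2}` should enumerate the same two objects, so surrounding whitespace no longer matters.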

The ideal scenario is some sort of enumerator that reads the HTTP stream by chunk, since the API produces chunks in which each chunk represents exactly one JSON object. This is what I considered implementing in option 2, but again, that seems like a lot of reinventing the wheel, with the potential to introduce serious bugs. The second-best option would be a way to get the raw, underlying socket stream from the HttpClient after it has performed the request and parsed the headers -- in other words, a way to get the stream that still includes the chunk sizes and separators, so that I can parse that stream directly, extracting the chunk sizes, and basically do #2 above without having to write my own HTTP implementation.
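One way the enumerator idea might look while staying on HttpClient is sketched below. It is an illustration under stated assumptions, not a confirmed solution for this API: it assumes Newtonsoft.Json 10 or later (for `ReadAsync` and `JObject.LoadAsync`) and assumes every record is a JSON object. `StreamingQuery`, `PostAndStreamAsync`, and the `onRecord` callback are hypothetical names. The sketch does not expose chunk sizes or separators; it uses `HttpCompletionOption.ResponseHeadersRead` so the body is not buffered, then parses one JSON value at a time as the bytes arrive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class StreamingQuery
{
    // POSTs URL-form-encoded data and calls onRecord for each JSON object
    // as soon as it has been read from the response stream.
    public static async Task PostAndStreamAsync(
        HttpClient client,
        string url,
        IEnumerable<KeyValuePair<string, string>> form,
        Action<JObject> onRecord)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new FormUrlEncodedContent(form)
        })
        // ResponseHeadersRead returns once the headers arrive, without buffering the body.
        using (var response = await client.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();

            using (var stream = await response.Content.ReadAsStreamAsync())
            using (var text = new StreamReader(stream))
            using (var json = new JsonTextReader(text) { SupportMultipleContent = true })
            {
                // Each ReadAsync() moves to the start of the next top-level value.
                while (await json.ReadAsync())
                {
                    onRecord(await JObject.LoadAsync(json));
                }
            }
        }
    }
}
```

With this shape, memory use is bounded by the largest single record rather than by the whole response, and a response with 0 records simply never invokes the callback. If the server ends the stream improperly, the exception would surface from `ReadAsync` after the earlier records have already been delivered to `onRecord`.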

What is the best option for me to implement this functionality?
