I have a question regarding the design of these protocols. Why do we use a boundary to separate the parts of a multipart message instead of introducing a content length for each part? With a length, parsing would be easier. Am I missing some fundamental reason for using boundaries rather than a length parameter? Thanks!
HTTP/SMTP MIME multipart. Why boundary?
Asked by Michael

Because that is how the MIME standard was defined in the good old days. One reason was probably that a content length is problematic for text/plain data, where the newline may be CR (old Mac), LF (Unix) or CR LF (Windows, DOS), so the byte count changes whenever line endings are converted. Another might be that boundaries are easier for a human to read, which is IMHO a bad argument, but it comes up a lot when textual representations like HTTP, XML or SOAP are preferred over more efficient binary ones like ASN.1 or Sun RPC.
You might also view it as a successful attempt by the industry to sell more powerful servers by introducing useless overhead into the protocols :)
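To make the newline point concrete, here is a small Python sketch (the sample text and variable names are made up for illustration). The same three lines of text have a different byte count under each line-ending convention, so a per-part length computed by the sender would go stale as soon as something in between converts the newlines.

```python
# The same logical text under three newline conventions.
lines = ["Hello", "multipart", "world"]

as_unix = "\n".join(lines).encode("ascii")      # LF
as_old_mac = "\r".join(lines).encode("ascii")   # CR
as_dos = "\r\n".join(lines).encode("ascii")     # CR LF

length_unix = len(as_unix)
length_old_mac = len(as_old_mac)
length_dos = len(as_dos)

# CR LF costs one extra byte per line break compared to LF or CR.
print(length_unix, length_old_mac, length_dos)
```

A boundary line, by contrast, is found by scanning the content, so it still works after such a conversion.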

In addition to what DaSourcerer said in his answer, I want to point out that there are many valid reasons why one would not know the content length. In a comment, the most compelling one was named already: streaming.
Stop thinking "file size" or even "file" at all, folks! When a MIME multipart is being constructed, the creator might not be reading files at all but input streams, or she might read from a platform-dependent reader line by line, which introduces the LF vs. CRLF problem again. The stream might be so big that it would be inefficient or even impossible to read it fully before writing it out again, just to determine its size.
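As a rough illustration of the streaming case, the following Python sketch writes a multipart body chunk by chunk. The function name write_multipart and the numbers() generator are hypothetical, and the sketch assumes the chosen boundary never occurs in the data, which a real sender has to ensure.

```python
import io
from typing import BinaryIO, Iterable


def write_multipart(out: BinaryIO, boundary: str,
                    parts: Iterable[Iterable[bytes]]) -> None:
    """Write each part chunk by chunk; no part length is needed up front."""
    delimiter = b"--" + boundary.encode("ascii")
    for chunks in parts:
        out.write(delimiter + b"\r\n")
        out.write(b"\r\n")          # empty header section, then the body
        for chunk in chunks:        # chunks could come from a socket or pipe
            out.write(chunk)
        out.write(b"\r\n")          # line break that precedes the next delimiter
    out.write(delimiter + b"--\r\n")


def numbers():
    # Stand-in for a source whose total size is unknown until it ends.
    for i in range(3):
        yield f"{i}\n".encode("ascii")


buffer = io.BytesIO()
write_multipart(buffer, "simple boundary", [numbers(), [b"second part"]])
result = buffer.getvalue()
print(result.decode("ascii"))
```

The writer never buffers a whole part. With a length field in front of each part, it would have to consume numbers() completely before emitting the first byte.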
Last but not least, RFC 1341 specifies that a multipart entity can have a preamble (stuff before the first boundary) and an epilogue (stuff after the last boundary). Quote:
    From: Nathaniel Borenstein <[email protected]>
    To: Ned Freed <[email protected]>
    Subject: Sample message
    MIME-Version: 1.0
    Content-type: multipart/mixed; boundary="simple
     boundary"

    This is the preamble. It is to be ignored, though it
    is a handy place for mail composers to include an
    explanatory note to non-MIME compliant readers.
    --simple boundary

    This is implicitly typed plain ASCII text.
    It does NOT end with a linebreak.
    --simple boundary
    Content-type: text/plain; charset=us-ascii

    This is explicitly typed plain ASCII text.
    It DOES end with a linebreak.

    --simple boundary--
    This is the epilogue. It is also to be ignored.
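The structure of the quoted sample can be checked with a small hand-rolled splitter. This is only a sketch: split_multipart is a hypothetical helper, it assumes "\n" line endings and well-formed input, and it leaves each part's headers attached to its body. The blank lines that separate a part's headers from its body are written out explicitly in the sample string.

```python
def split_multipart(body: str, boundary: str):
    """Split a multipart body into (preamble, parts, epilogue)."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    preamble, parts, epilogue = [], [], []
    current = preamble                  # lines before the first delimiter
    for line in body.split("\n"):
        stripped = line.rstrip()
        if current is epilogue:         # everything after the closing delimiter
            epilogue.append(line)
        elif stripped == closing:
            current = epilogue
        elif stripped == delimiter:
            parts.append([])
            current = parts[-1]
        else:
            current.append(line)
    # The line break just before a delimiter belongs to the delimiter,
    # so joining with "\n" does not add it back to the part.
    return ("\n".join(preamble),
            ["\n".join(p) for p in parts],
            "\n".join(epilogue))


sample = (
    "This is the preamble.\n"
    "--simple boundary\n"
    "\n"
    "This is implicitly typed plain ASCII text.\n"
    "It does NOT end with a linebreak.\n"
    "--simple boundary\n"
    "Content-type: text/plain; charset=us-ascii\n"
    "\n"
    "This is explicitly typed plain ASCII text.\n"
    "It DOES end with a linebreak.\n"
    "\n"
    "--simple boundary--\n"
    "This is the epilogue.\n"
)

preamble, parts, epilogue = split_multipart(sample, "simple boundary")
print(repr(preamble))
print([repr(p) for p in parts])
print(repr(epilogue))
```

Neither the preamble nor the epilogue would fit naturally into a scheme where every chunk of the body is announced by a length.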

This is where you are wrong. The authors of multipart MIME had cases in mind where the length of a message part cannot be determined beforehand. Think of content encodings that alter message lengths, such as base64, uuencode and others. There's also compression, encryption and whatnot. Also: Content-Length is an entity header. This means that by the time you reach it, you have already begun to parse a message part. It comes with literally no advantage over a boundary marker.
If you study older protocols, you will often encounter some marker (usually \0) to indicate the end of a message. Sending the byte count of a message is another solution, but you won't find it much in places where message content has to be converted on the fly or is to be streamed in some fashion.
Bottom line: The multipart boundary allows some interesting applications with message contents of unpredictable size. HTTP server push is a notable example.
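As a small illustration of how an encoding changes the size of a part, the following Python sketch compares a raw payload with two base64 forms from the standard library. The payload and variable names are made up for the example.

```python
import base64

raw = b"x" * 100                      # 100 bytes of payload

plain_b64 = base64.b64encode(raw)     # 4 output characters per 3 input bytes, padded
mime_b64 = base64.encodebytes(raw)    # same, with a newline after at most 76 characters

raw_length = len(raw)
plain_length = len(plain_b64)
mime_length = len(mime_b64)

print(raw_length, plain_length, mime_length)
```

The size on the wire differs from the size of the source data and also depends on the line-wrapping rule, so a length field would have to describe the encoded form. For a source that is encoded on the fly, that figure is not available until the encoder has seen the end of the input.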