Unexplained upgrade of a string to utf-8


I have a web server in Perl with POE. Before the data hits the wire, the header and body are concatenated in POE::Filter::HTTPD->put. For some bizarre reason, some of the headers are being promoted to UTF-8, which means a binary body gets corrupted.

The problem is that the join in headers_as_strings() is upgrading some headers to UTF-8 even though it shouldn't. For example, if I add the following code, only the last line produces a warning. So a join of 3 non-UTF-8 strings is producing a UTF-8 string, but not for all headers. The workaround is to call utf8::downgrade on $ret[-1], but I want to know why this is happening.

my $vnl = _process_newline( $value, $endl );
warn "$$: '$name' is utf8" if utf8::is_utf8( $name );
warn "$$: '$sep' is utf8" if utf8::is_utf8( $sep );
warn "$$: '$vnl' is utf8" if utf8::is_utf8( $vnl );
push @ret, join $sep, $name, $vnl;
# only this last line produces a warning
warn "$$: the join has utf8 " if utf8::is_utf8( $ret[-1] );

There is 1 solution below


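The short answer is that Perl will upgrade a string to UTF-8 without warning. I was using a MIME::Type object that I thought was a plain string. MIME::Types opens its DB with open DB, '<:encoding(utf8)', so the type strings read from it carry the UTF-8 flag, and joining one into a header upgrades the whole header.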
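The silent upgrade can be reproduced without POE or MIME::Types. This is only a minimal sketch: the variable names are made up for illustration, and utf8::upgrade stands in for a value that was read through an :encoding(utf8) layer.

```perl
use strict;
use warnings;

# Hypothetical header pieces; none of these names come from POE.
my $name = "Content-Type";
my $sep  = ": ";
my $type = "image/png";

# Stand-in for a string read through :encoding(utf8): the characters are
# unchanged, but the internal UTF8 flag is now set.
utf8::upgrade($type);

# One flagged operand is enough to make the joined result flagged.
my $header = join $sep, $name, $type;
my $flag_after_join = utf8::is_utf8($header) ? 1 : 0;

# Every character is below 0x100, so the downgrade succeeds and the
# string is stored as plain bytes again.
utf8::downgrade($header);
my $flag_after_downgrade = utf8::is_utf8($header) ? 1 : 0;

print "after join: $flag_after_join, after downgrade: $flag_after_downgrade\n";
```

The string compares equal before and after the downgrade; only its internal representation changes, which is why nothing warns about it.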

But the real WTF is that POE::Driver::SysRW->flush has use bytes; in effect before syswrite(), and that is when the data gets jumbled.
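Here is a rough illustration of why an upgraded header hurts a binary body once code under use bytes looks at the string. It does not call POE; the header and body values are invented, and length under use bytes is used only as a stand-in for what byte-oriented code sees.

```perl
use strict;
use warnings;

# Hypothetical data: an upgraded header and a body with two bytes >= 0x80.
my $header = "Content-Type: image/png\r\n\r\n";
utf8::upgrade($header);              # stands in for the silently upgraded header
my $body = "\x89PNG\xFF";

# Concatenation upgrades the whole message, body included.
my $msg = $header . $body;

my $chars = length $msg;                          # logical characters
my $raw   = do { use bytes; length $msg };        # internal storage size

# Downgrading before the write restores one byte per character.
utf8::downgrade($msg);
my $raw_fixed = do { use bytes; length $msg };

print "chars=$chars raw=$raw raw_fixed=$raw_fixed\n";
```

Each of the two high bytes in the body takes two bytes in the upgraded internal form, so the byte-level view is longer than the message the application meant to send. Calling utf8::downgrade on the outgoing data before it reaches the driver avoids that.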