Sinatra binary response for msgpack: charset issue, or characters being converted somewhere?


I'm trying to return msgpack (http://msgpack.org/) from a Ruby Sinatra service and parse it with JavaScript. I'm using the JavaScript library found here: https://github.com/uupaa/msgpack.js/ (though I don't think that's relevant to this question).

I have a Sinatra service that does the following using the msgpack gem:

require 'sinatra'
require 'msgpack'

get '/t' do
  content_type 'application/x-msgpack'
  { :status => 'success', :data => {:one => "two", :three => "four"}}.to_msgpack
end

I have JavaScript that reads it as follows:

<script src="js/jquery.js"></script>
<script src="js/msgpack.js"></script>
<script type="text/javascript">

    function r() {
        $.ajaxSetup({
            converters: {
                "text msgpack": function( packed ) {
                    if(packed != '') {
                        var unpacked = msgpack.unpack(packed);
                        return unpacked;
                    }else{
                        return ''
                    }
                }
            }
        });

        $.ajax({
            type: "GET",
            url: "/t",
            dataType: "msgpack",
            success: function(data) {
                alert(data)
            }
        })  
    }
    $(document).ready(r)
</script>

The problem is that when I get the data back, many characters have been converted from their server-side values to 0xfffd (the Unicode replacement character).

I then tried the two variants:

content_type 'application/octet-stream'

and

content_type 'application/octet_stream', :charset => 'binary'

on the server side. The former didn't change anything but the latter came closer, leaving most of the message untouched with one exception: the first character was converted from 0x82 to 0x201a.

I suspect there is a combination of charset and content type that would fix this and that I haven't tried yet. I could always fall back to Base64, but I'd first like to understand what it takes to get this working without it.

Best answer

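0x82 is SINGLE LOW-9 QUOTATION MARK in Windows-1252 (which browsers use when asked for Latin-1), and 0x201a is the Unicode code point (U+201A) of that same character, which is what ends up in a UTF-16 JavaScript string. In other words, something is decoding your bytes as text. Have a look at how your libraries deal with encoding; tell them to treat the response as binary and not to convert between encodings.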
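The 0xfffd values from the question are consistent with the same kind of mismatch, only with the body being decoded as UTF-8 instead. Here is a minimal sketch of that effect. The byte values are an assumption: they are what I would expect the start of the payload to look like (0x82 for a two-entry map, 0xa6 for a six-byte string, then "status"), not something captured from the actual service.

```javascript
// Assumed first bytes of the msgpack payload from the question:
// 0x82 (map with 2 entries), 0xa6 (6-byte string), then "status".
const bytes = new Uint8Array([0x82, 0xa6, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73]);

// Decode the bytes as if they were UTF-8 text. 0x82 and 0xa6 are stray
// continuation bytes, so the decoder replaces each of them with U+FFFD.
const asText = new TextDecoder('utf-8').decode(bytes);

// List the resulting code points in hex.
const codePoints = Array.from(asText, (ch) => ch.codePointAt(0).toString(16));
console.log(codePoints.join(' ')); // prints "fffd fffd 73 74 61 74 75 73"
```

The ASCII part survives, but every byte above 0x7f that is not valid UTF-8 is lost for good, which is why the fix has to happen before the text decoding step rather than after it.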

UTF-16 smells of JavaScript. If you use jQuery, have a look at http://blog.vjeux.com/2011/javascript/jquery-binary-ajax.html.
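One widely used way to get the raw bytes through XMLHttpRequest is to override the MIME type with charset=x-user-defined and then mask each character down to its low byte. A rough sketch follows; `binaryStringToBytes` and `getBinary` are names I made up for illustration, and I'm assuming `msgpack.unpack` accepts an array of byte values, so check that against the library you use.

```javascript
// Convert a string that was decoded with charset=x-user-defined back to bytes.
// In that charset, bytes 0x80-0xff arrive as U+F780-U+F7FF, so the low
// 8 bits of each UTF-16 code unit are the original byte.
function binaryStringToBytes(str) {
  const bytes = new Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Hypothetical helper: GET a URL and hand the untouched bytes to a callback.
// Browser-only, since it relies on XMLHttpRequest.
function getBinary(url, onBytes) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  // Must be called before send(); stops the browser from transcoding the body.
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.onload = function () {
    onBytes(binaryStringToBytes(xhr.responseText));
  };
  xhr.send();
}

// Possible usage in the page from the question (assumes unpack takes a byte array):
// getBinary('/t', function (bytes) { alert(msgpack.unpack(bytes)); });
```

In current browsers, setting `xhr.responseType = 'arraybuffer'` and wrapping `xhr.response` in a `Uint8Array` should get you the same bytes without the masking step.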