Why can Marshal serialize a circularly referenced list when JSON can't?


Here I have a circularly referenced list:

2.1.9 :082 > a = []
 => [] 
2.1.9 :083 > a.append(a)
 => [[...]] 

When I try to dump a as JSON, I get this error:

a.to_json
ActiveSupport::JSON::Encoding::CircularReferenceError: object references itself
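For comparison, here is roughly the same experiment with Ruby's standard json library instead of ActiveSupport. This is a sketch based on my understanding of json 2.x: the stdlib generator does not detect cycles at all, it simply keeps recursing until it exceeds its default max_nesting depth of 100 and raises JSON::NestingError.

```ruby
require "json"

a = []
a << a   # a now contains itself

begin
  JSON.generate(a)
rescue JSON::NestingError => e
  # The generator never notices the cycle; it just descends until the
  # nesting limit (100 by default) is exceeded.
  puts "JSON::NestingError: #{e.message}"
end
```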

but when I marshal it, I get a valid string:

2.1.9 :085 > Marshal.dump(a)
 => "\x04\b[\x06@\x00" 

To make sure the value was dumped properly, I loaded it again:

 b = Marshal.load("\x04\b[\x06@\x00")
 => [[...]] 

Here are some more checks confirming that the object was properly dumped to a string:

2.1.9 :088 > a.object_id
 => 70257482733700 
2.1.9 :089 > a.first.object_id
 => 70257482733700 
2.1.9 :090 > b.object_id
 => 70257501553000 
2.1.9 :091 > b.first.object_id
 => 70257501553000 
2.1.9 :092 > 
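The irb checks above can be condensed into a small standalone script. Comparing with equal? (object identity) instead of reading object_id values makes the point directly: the loaded copy is a new object, but it still contains itself.

```ruby
# Round-trip a self-referencing array through Marshal and check that the
# loaded copy has the same shape: a new array whose only element is itself.
a = []
a << a

b = Marshal.load(Marshal.dump(a))

p b.equal?(b.first)   # true: the cycle survived the round trip
p b.equal?(a)         # false: b is a brand-new object, not a
```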

In my understanding, both of them convert an object to a string and get the object back from the string. I can also see that JSON doesn't have any construct for referring to another part of the JSON document, which may be why it can't support this kind of operation. But would it be that difficult to introduce such a construct in JSON to handle this situation? I may be missing something more fundamental about marshaling and serialization; please enlighten me.


Answer 1 (accepted)

> In my understanding, both of them convert an object to a string and get the object back from the string.

Yes. That is pretty much the definition of "serialization" or "marshaling".

> I can also see that JSON doesn't have any construct for referring to another part of the JSON document, which may be why it can't support this kind of operation.

Yes, that is the reason.

> But would it be that difficult to introduce such a construct in JSON to handle this situation?

You cannot introduce new constructs into JSON. It was deliberately designed without a version number, so that it can never, ever be changed.

Of course, this only means that we cannot add it now. Could Douglas Crockford have added it from the beginning, back when he was designing JSON? Yes, of course. But he didn't. JSON was deliberately designed to be simple:

> JSON is not a document format. It is not a markup language. It is not even a general serialization format in that it does not have a direct representation for cyclical structures […]
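Although JSON has no such construct, a cycle can still travel through plain JSON if the writer and the reader agree on a convention layered on top of it. Here is a sketch using a hypothetical convention: the "$id", "$ref" and "items" keys and the encode_with_refs/decode_with_refs helpers are my own invention, not part of any standard or library, and only arrays and scalars are handled. The output is still valid JSON, but only a reader that knows the convention can rebuild the cycle.

```ruby
require "json"

# Hypothetical convention: each array becomes {"$id" => n, "items" => [...]},
# and an array that was already emitted becomes {"$ref" => n}.
# Sketch only: handles arrays and scalars; hashes in the data are not supported.
def encode_with_refs(obj, ids = {}.compare_by_identity)
  return obj unless obj.is_a?(Array)
  return { "$ref" => ids[obj] } if ids.key?(obj)   # seen before: emit a reference

  ids[obj] = ids.size
  { "$id" => ids[obj], "items" => obj.map { |e| encode_with_refs(e, ids) } }
end

def decode_with_refs(node, table = {})
  return node unless node.is_a?(Hash)
  return table.fetch(node["$ref"]) if node.key?("$ref")

  array = []
  table[node["$id"]] = array   # register BEFORE decoding children, so refs resolve
  node["items"].each { |e| array << decode_with_refs(e, table) }
  array
end

a = []
a << a

json = JSON.generate(encode_with_refs(a))
puts json   # {"$id":0,"items":[{"$ref":0}]}

b = decode_with_refs(JSON.parse(json))
p b.equal?(b.first)   # true
```

The design cost is visible: every consumer, in every language, must implement the same convention, which is exactly the kind of extra agreement JSON avoids.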

See, for example, YAML, a superset of JSON, which has references (anchors and aliases) and can therefore represent cyclical data.
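A sketch of this with Ruby's bundled YAML library (Psych). One assumption to flag: Psych 4 (Ruby 3.1+) rejects aliases in YAML.load by default, so the example uses YAML.unsafe_load where it exists and falls back to YAML.load on older versions.

```ruby
require "yaml"

a = []
a << a

yaml = a.to_yaml
puts yaml   # the array gets an anchor (&) and its element is an alias (*) to it

# Psych 4 disallows aliases in YAML.load by default; older Psych allows them.
b = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
p b.equal?(b.first)   # true: the cycle is rebuilt
```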

Answer 2

Both Marshal.dump and to_json return a String, but that's about everything they have in common.

to_json

to_json returns a String describing the Ruby object according to the JSON specification.
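A related consequence: even without cycles, JSON has no way to say "this is the same object as before", so shared references come back as independent copies. Marshal records the sharing with back-references and restores it. A small sketch:

```ruby
require "json"

shared = [1, 2]
data = [shared, shared]   # both elements are the very same array

from_json    = JSON.parse(JSON.generate(data))
from_marshal = Marshal.load(Marshal.dump(data))

p from_json[0].equal?(from_json[1])         # false: two separate copies
p from_marshal[0].equal?(from_marshal[1])   # true: sharing preserved
```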

to_json has to be monkey-patched onto basically every Ruby class, and when called on an Array, it is called recursively on every element:

"[#{map { |value| ActiveSupport::JSON.encode(value, options) } * ','}]"

This recursion is why you get:

ActiveSupport::JSON::Encoding::CircularReferenceError: object references itself
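To see why the recursion needs an explicit guard, here is a sketch of a recursive encoder that keeps the arrays currently being encoded on a stack and raises as soon as it meets one of them again. It only illustrates the idea behind a CircularReferenceError-style check; encode_array and CircularReference are hypothetical names, not ActiveSupport's actual code, and scalars are assumed to be integers.

```ruby
# Hypothetical error class for this sketch.
class CircularReference < StandardError; end

# Minimal recursive encoder for nested arrays of integers, with a cycle guard.
# `stack` holds the arrays currently being encoded, compared by identity.
def encode_array(value, stack = [])
  return value.to_s unless value.is_a?(Array)   # integers only in this sketch

  if stack.any? { |seen| seen.equal?(value) }
    raise CircularReference, "object references itself"
  end

  stack.push(value)
  begin
    "[#{value.map { |element| encode_array(element, stack) }.join(',')}]"
  ensure
    stack.pop   # leaving this array, so it may appear again elsewhere
  end
end

p encode_array([1, [2, 3]])   # "[1,[2,3]]"
```

Because the stack only holds arrays on the current path, a shared but acyclic structure such as [s, s] still encodes fine; only a true cycle raises.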

If the export succeeds, the JSON String is portable: one written by an old JRuby on Rails server or a PHP server could be read by a new Rubinius script.

dump

Marshal.dump returns a byte stream representing the object itself and how it is stored internally by Ruby:

Marshal.dump(a).bytes
#=> [4, 8, 91, 6, 64, 0]
Marshal.dump([[]]).bytes
#=> [4, 8, 91, 6, 91, 0]
Marshal.dump([]).bytes
#=> [4, 8, 91, 0]

So Marshal.dump stores a exactly as it was defined: a one-element array that references itself.
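Reading those bytes by hand: after the two leading version bytes (4 and 8), 91 is "[" (array), 6 is the length 1 (Marshal stores small integers n as n + 5, and 0 as 0), and 64 is "@", an object link whose index 0 points back to the first object registered while loading, which is the outer array itself. The sketch below decodes just that subset. TinyMarshalReader is a hypothetical toy, not part of Ruby; real code should always use Marshal.load.

```ruby
# Hypothetical decoder for a tiny subset of Ruby's Marshal format:
# arrays ("["), object links ("@"), nil ("0") and small integers ("i").
class TinyMarshalReader
  def initialize(bytes)
    @bytes = bytes
    @pos = 2        # skip the two version bytes (4, 8)
    @objects = []   # objects seen so far; "@" links index into this table
  end

  def read
    type = next_byte.chr
    case type
    when "0" then nil
    when "i" then read_int
    when "["
      array = []
      @objects << array   # register BEFORE reading elements, so "@" can point here
      read_int.times { array << read }
      array
    when "@" then @objects.fetch(read_int)
    else raise ArgumentError, "unsupported type #{type.inspect}"
    end
  end

  private

  def next_byte
    byte = @bytes.fetch(@pos)
    @pos += 1
    byte
  end

  # Only the single-byte forms: 0 is stored as 0, small positives as n + 5.
  def read_int
    c = next_byte
    return 0 if c.zero?
    return c - 5 if c > 4 && c < 128
    raise ArgumentError, "integer encoding not supported in this sketch"
  end
end

a = []
a << a
b = TinyMarshalReader.new(Marshal.dump(a).bytes).read
p b.equal?(b.first)                                              # true
p TinyMarshalReader.new(Marshal.dump([1, [2, nil]]).bytes).read   # [1, [2, nil]]
```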

The first two bytes are the major and minor version numbers. When comparing dumps made with the same version, you can ignore them with:

Marshal.dump(a).bytes.drop(2)
#=> [91, 6, 64, 0]
Marshal.dump([[]]).bytes.drop(2)
#=> [91, 6, 91, 0]

Since the representation depends on the Ruby implementation, data dumped by one Ruby script might not always load in another.

From the documentation:

> In normal use, marshaling can only load data written with the same major version number and an equal or lower minor version number.
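That rule can be checked against the first two bytes of a dump. Marshal::MAJOR_VERSION and Marshal::MINOR_VERSION are real constants; the helper name marshal_compatible? is hypothetical. Marshal.load performs an equivalent check itself and raises a TypeError when the format is incompatible.

```ruby
# Hypothetical helper applying the documented rule: same major version,
# equal or lower minor version than this Ruby's Marshal format.
def marshal_compatible?(data)
  return false if data.bytesize < 2
  major, minor = data.bytes.first(2)
  major == Marshal::MAJOR_VERSION && minor <= Marshal::MINOR_VERSION
end

p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]
p marshal_compatible?(Marshal.dump([]))   # true: written by this Ruby
p marshal_compatible?("\x04\x09[\x00")    # false: claims a newer minor version
```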