Parsing data from an untrusted Java serialized object

I need to parse untrusted Java serialized objects. The data is given to me as a byte array (written at some point by ObjectOutputStream).

I do not want to simply call ObjectInputStream.readObject() and/or load the actual object. I am looking for a way to safely parse the bytes and grab field names & values.

--

Here's a little summary of my attempt so far, after taking a look at the ObjectInputStream procedure for deserializing objects.

I have tried to extract field types and names (as Unicode strings) recursively, based on the expected stream constants. I end up with a list of field names whose values should appear in the byte array in order. I am uneasy about this approach because it is probably buggy, especially when it comes to accommodating what seem to be the individual serialization protocols followed by HashMap, ArrayList, etc. But it might work if I can figure out a way to read the bytes that represent field values:
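For readers following along, the descriptor-reading step described above might look roughly like the sketch below. It is not the asker's code. It assumes the simplest possible stream: a single object whose class descriptor appears fresh (not as a back reference) at the very start. The class names DescriptorPeek and Point and the helper names are made up for illustration; the constants come from java.io.ObjectStreamConstants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DescriptorPeek {
    /** Hypothetical sample class, used only to produce bytes to inspect. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 3;
        int y = 4;
    }

    /** Serializes a trusted local object so there is something to parse. */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    /** Returns "typeCode name" for each field in the first class descriptor. */
    static List<String> fieldNames(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readShort() != ObjectStreamConstants.STREAM_MAGIC
                || in.readShort() != ObjectStreamConstants.STREAM_VERSION) {
            throw new IOException("not a serialization stream");
        }
        if (in.readByte() != ObjectStreamConstants.TC_OBJECT
                || in.readByte() != ObjectStreamConstants.TC_CLASSDESC) {
            throw new IOException("sketch only handles one object with a fresh descriptor");
        }
        in.readUTF();   // class name
        in.readLong();  // serialVersionUID
        in.readByte();  // descriptor flags (SC_SERIALIZABLE, SC_WRITE_METHOD, ...)
        int count = in.readUnsignedShort();
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            char type = (char) in.readByte();
            String name = in.readUTF();
            if (type == 'L' || type == '[') {
                // Object and array fields carry a type string: a new string or a back reference.
                byte tag = in.readByte();
                if (tag == ObjectStreamConstants.TC_STRING) {
                    in.readUTF();
                } else if (tag == ObjectStreamConstants.TC_REFERENCE) {
                    in.readInt();
                } else {
                    throw new IOException("unexpected tag " + tag);
                }
            }
            fields.add(type + " " + name);
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fieldNames(serialize(new Point())));
    }
}
```

The sketch stops at the field list. It does not read class annotations, superclass descriptors, or any field values, which is where the difficulties described next begin.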

I can try to read and store primitives based on size/offset, but when I encounter my first object, it gets a bit more complicated -- there is no longer a clear way to tell which bytes belong to which value (without actually loading the object in the way that ObjectInputStream probably does?).
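To make the size/offset idea concrete: in the default layout (no custom writeObject), primitive values have a fixed width determined by the field's type code in the descriptor, while object and array fields are written as nested, tagged items of variable length. A minimal sketch of that distinction follows; the class and method names are hypothetical, and -1 is just this sketch's marker for "not a fixed width".

```java
class FieldWidths {
    /**
     * Width in bytes of a primitive field value for a descriptor type code,
     * or -1 for object ('L') and array ('[') fields, whose encoded length
     * can only be found by parsing the nested content.
     */
    static int primitiveSize(char typeCode) {
        switch (typeCode) {
            case 'B': case 'Z': return 1;   // byte, boolean
            case 'C': case 'S': return 2;   // char, short
            case 'F': case 'I': return 4;   // float, int
            case 'D': case 'J': return 8;   // double, long
            case 'L': case '[': return -1;  // object or array: variable length
            default:
                throw new IllegalArgumentException("unknown type code: " + typeCode);
        }
    }

    public static void main(String[] args) {
        for (char c : "BZCSFIDJL[".toCharArray()) {
            System.out.println(c + " -> " + primitiveSize(c));
        }
    }
}
```

For the variable-length cases, a parser has to recurse into whatever item follows (a new object, a string, an array, a null, or a back reference to something seen earlier), so a flat table of offsets stops being enough.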

--

Can anyone suggest either a potential solution that I'm obviously looking past, or a trusted library that can help parse the serialized data without loading objects?

Thank you for reading, and for all comments/suggestions!!! I apologize if something is unclear and I would be happy to clarify if you bear with me.

There is 1 best solution below

user207421 On

You can't do this in principle. Any Java class can take over its own serialization (for example with private writeObject()/readObject() methods, or by implementing Externalizable) and write arbitrary data to the stream that only it knows how to parse and reconstruct, via code that is only invoked during deserialization.
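As an illustration of the point, here is a small sketch of a class that takes over its own serialization. Everything in it (OpaqueDemo, Secret, the XOR and string-reversal "encoding", the helper names) is made up for the example. The class descriptor ends up listing zero fields, yet the stream still carries the object's state in a form only Secret.readObject() knows how to interpret.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;

class OpaqueDemo {
    /** Hypothetical class whose stream layout is defined only by its own code. */
    static class Secret implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient int a = 7;        // transient: not listed in the descriptor
        private transient String b = "hi";

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(a ^ 0x5A5A);                                 // private encoding
            out.writeUTF(new StringBuilder(b).reverse().toString());  // private encoding
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            a = in.readInt() ^ 0x5A5A;
            b = new StringBuilder(in.readUTF()).reverse().toString();
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    /** Returns {descriptor flags, declared field count} of the first descriptor. */
    static int[] flagsAndFieldCount(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        in.readShort();  // STREAM_MAGIC
        in.readShort();  // STREAM_VERSION
        in.readByte();   // TC_OBJECT
        in.readByte();   // TC_CLASSDESC
        in.readUTF();    // class name
        in.readLong();   // serialVersionUID
        int flags = in.readByte();
        int count = in.readUnsignedShort();
        return new int[] { flags, count };
    }

    public static void main(String[] args) throws IOException {
        int[] info = flagsAndFieldCount(serialize(new Secret()));
        boolean custom = (info[0] & ObjectStreamConstants.SC_WRITE_METHOD) != 0;
        System.out.println("custom writeObject: " + custom + ", declared fields: " + info[1]);
    }
}
```

A generic parser can detect the SC_WRITE_METHOD flag and can see that extra data follows, but it cannot know that the data is an XOR-ed int followed by a reversed string. That knowledge lives only in the class's own code.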