How to decode protobuf binary response

35.8k Views Asked by At

I have created a test app that can recognize some image using Goggle Goggles. It works for me, but I receive binary protobuf response. I have no proto-files, just binary response. How can I get data from it? (Have sent some image with bottle of bear and got the next response):

A
TuborgLogo9 HoaniText���;�)b���2d8e991bff16229f6"�
+TR=T=AQBd6Cl4Kd8:X=OqSEi:S=_rSozFBgfKt5d9b0
+TR=T=6rLQxKE2xdA:X=OqSEi:S=gd6Aqb28X0ltBU9V
+TR=T=uGPf9zJDWe0:X=OqSEi:S=32zTfdIOdI6kuUTa
+TR=T=RLkVoGVd92I:X=OqSEi:S=P7yOhvSAOQW6SRHN
+TR=T=J1FMvNmcyMk:X=OqSEi:S=5Z631_rd2ijo_iuf�

need to get string "Tuborg" and if possible type - "Logo"

3

There are 3 best solutions below

0
On

I'm going to assume the real question is how to decode protobufs and not how to read binary from the wire using Java.

The answer to your question can be found here

Briefly, on the wire, protobufs are encoded as 3-tuples of <key,type,value>, where:

  • the key is the field number assigned to the field in the .proto schema
  • the type is one of <Varint, int32, length-delimited, start-group, end-group,int64. It contains just enough information to decode the value of the 3-tuple, namely it tells you how long the value is.
2
On
UnknownFieldSet.parseFrom(msg).toString()

This will show you the top level fields. Unfortunately it can't know the exact details of field types. long/int/bool/enum etc are all encoded as Varint and all look the same. Strings, byte-arrays and sub-messages are length-delimited and are also indistinguishable.

Some useful details here: https://github.com/dcodeIO/protobuf.js/wiki/How-to-reverse-engineer-a-buffer-by-hand

If you follow the code in the UnknownFieldSet.mergeFrom() you'll see how you could try decode sub-messages and falling back to strings if that fails - but it's not going to be very reliable.

There are 2 spare values for the wiretype in the protocol - it would have been really helpful if google had used one of these to denote sub-messages. (And the other for null values perhaps.)

Here's some very crude rushed code which attempts to produce a something useful for diagnostics. It guesses at the data types and in the case of strings and sub-messages it will print both alternatives in some cases. Please don't trust any values it prints:

public static String decodeProto(byte[] data, boolean singleLine) throws IOException {
    return decodeProto(ByteString.copyFrom(data), 0, singleLine);
}

public static String decodeProto(ByteString data, int depth, boolean singleLine) throws IOException {
    final CodedInputStream input = CodedInputStream.newInstance(data.asReadOnlyByteBuffer());
    return decodeProtoInput(input, depth, singleLine);
}

private static String decodeProtoInput(CodedInputStream input, int depth, boolean singleLine) throws IOException {
    StringBuilder s = new StringBuilder("{ ");
    boolean foundFields = false;
    while (true) {
        final int tag = input.readTag();
        int type = WireFormat.getTagWireType(tag);
        if (tag == 0 || type == WireFormat.WIRETYPE_END_GROUP) {
            break;
        }
        foundFields = true;
        protoNewline(depth, s, singleLine);

        final int number = WireFormat.getTagFieldNumber(tag);
        s.append(number).append(": ");

        switch (type) {
            case WireFormat.WIRETYPE_VARINT:
                s.append(input.readInt64());
                break;
            case WireFormat.WIRETYPE_FIXED64:
                s.append(Double.longBitsToDouble(input.readFixed64()));
                break;
            case WireFormat.WIRETYPE_LENGTH_DELIMITED:
                ByteString data = input.readBytes();
                try {
                    String submessage = decodeProto(data, depth + 1, singleLine);
                    if (data.size() < 30) {
                        boolean probablyString = true;
                        String str = new String(data.toByteArray(), Charsets.UTF_8);
                        for (char c : str.toCharArray()) {
                            if (c < '\n') {
                                probablyString = false;
                                break;
                            }
                        }
                        if (probablyString) {
                            s.append("\"").append(str).append("\" ");
                        }
                    }
                    s.append(submessage);
                } catch (IOException e) {
                    s.append('"').append(new String(data.toByteArray())).append('"');
                }
            break;
            case WireFormat.WIRETYPE_START_GROUP:
                s.append(decodeProtoInput(input, depth + 1, singleLine));
                break;
            case WireFormat.WIRETYPE_FIXED32:
                s.append(Float.intBitsToFloat(input.readFixed32()));
                break;
            default:
                throw new InvalidProtocolBufferException("Invalid wire type");
        }

    }
    if (foundFields) {
        protoNewline(depth - 1, s, singleLine);
    }
    return s.append('}').toString();
}

private static void protoNewline(int depth, StringBuilder s, boolean noNewline) {
    if (noNewline) {
        s.append(" ");
        return;
    }
    s.append('\n');
    for (int i = 0; i <= depth; i++) {
        s.append(INDENT);
    }
}
1
On

You can decode with protoc:

protoc --decode_raw < msg.bin