Protocol buffers: what is in the serialized data?


I am new to protocol buffers and really want to learn more about them, so sorry for the beginner question.

What is in the serialized data: only values, or both keys and values? I think there are only values, and that anyone who wants to deserialize the data must have the schema.


There are 2 best solutions below

1
On BEST ANSWER

It contains both keys and values:

As you know, a protocol buffer message is a series of key-value pairs. The binary version of a message just uses the field's number as the key – the name and declared type for each field can only be determined on the decoding end by referencing the message type's definition (i.e. the .proto file). https://developers.google.com/protocol-buffers/docs/encoding

For example, say you have a proto file like this:

$ cat my.proto
message header {
  required uint32 u1 = 1;
  required uint32 u2 = 2;
  optional uint32 u3 = 3 [default=0];
  optional bool   b1 = 4 [default=true];
  optional string s1 = 5;
  optional uint32 u4 = 6;
  optional uint32 u5 = 7;
  optional string s2 = 9;
  optional string s3 = 10;
  optional uint32 u6 = 8;
}

Dump out encoded data from memory:

(gdb) x/10xb 0x7fd70db7e964
0x7fd70db7e964: 0x08    0xff    0xff    0x01    0x10    0x08    0x40    0xf7
0x7fd70db7e96c: 0xd4    0x38

Decode:

$ echo 08ffff01100840f7d438 | xxd -r -p | protoc --decode_raw
1: 32767
2: 8
8: 928375

1, 2, and 8 are the keys (field numbers).

From the proto file above:

1 => u1
2 => u2
8 => u6

So, it becomes:

u1: 32767
u2: 8
u6: 928375
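To make the layout of those ten bytes concrete, here is a minimal Python sketch that walks the buffer by hand. It assumes every field uses wire type 0 (varint), which holds for this particular dump but not for protobuf data in general; the function names are made up for illustration and are not part of any protobuf library.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not (byte & 0x80):             # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_varint_fields(buf: bytes) -> list[tuple[int, int]]:
    """Return (field_number, value) pairs; handles only wire type 0 (varint)."""
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        # The key packs the field number and the wire type together.
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        value, pos = read_varint(buf, pos)
        fields.append((field_number, value))
    return fields


data = bytes.fromhex("08ffff01100840f7d438")
for field_number, value in decode_varint_fields(data):
    print(f"{field_number}: {value}")
# 1: 32767
# 2: 8
# 8: 928375
```

The first byte 0x08 is the key for field 1 with wire type 0, 0x10 is field 2, and 0x40 is field 8. The field names u1, u2 and u6 never appear in the bytes.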

I used data from my question here:

0
On

This depends a little on whether you use the binary form (which is typically the default when dealing with protobuf) or the JSON form (yes, protobuf includes a JSON option, at least in some libraries, though not all).

In the binary form, the data consists of the field numbers and the values, not the field names. For example, if we declare the field:

optional string name = 1; // remove the "optional" if using proto3 syntax

and assign a value of "Nika" (and serialize it), then the binary data will include the field number 1 (in a slightly tweaked form: it is packed together with the wire type), a length prefix, and the UTF-8 encoded form of "Nika", but it will not contain "name".
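As a rough illustration, the bytes for that single field can be built by hand in Python. This is a sketch of the wire format for one length-delimited field (wire type 2), not how a real protobuf library is implemented; the helper names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: key, then length prefix, then UTF-8 payload."""
    payload = text.encode("utf-8")
    key = (field_number << 3) | 2  # 2 = length-delimited wire type
    return encode_varint(key) + encode_varint(len(payload)) + payload


encoded = encode_string_field(1, "Nika")
print(encoded.hex())       # 0a044e696b61
print(b"name" in encoded)  # False
```

Here 0x0a is the "tweaked" 1 (field number 1 shifted left by three bits, combined with wire type 2), 0x04 is the payload length, and the remaining four bytes are "Nika" in UTF-8.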

You don't strictly need the schema to decode it, but having it makes things much easier, because the wire format is otherwise ambiguous: the same "wire type" (i.e. the encoding format) is used for multiple data types, and for multiple meanings of the same data type. For example, without the schema (or a good guess) you can't tell whether an integer is signed, unsigned, or zigzag-encoded, and the value you get can vary hugely depending on which it is.
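The integer ambiguity can be sketched in Python. The three functions below take the raw number decoded from a varint and interpret it the way a uint64, int64 or sint64 field would; the function names are hypothetical, and 64-bit fields are assumed for simplicity.

```python
MASK64 = (1 << 64) - 1


def as_uint64(raw: int) -> int:
    """Interpret the raw varint value as an unsigned 64-bit integer."""
    return raw & MASK64


def as_int64(raw: int) -> int:
    """Interpret the raw varint value as a two's-complement signed integer."""
    raw &= MASK64
    return raw - (1 << 64) if raw >> 63 else raw


def as_sint64(raw: int) -> int:
    """Interpret the raw varint value as a zigzag-encoded signed integer."""
    return (raw >> 1) ^ -(raw & 1)


for raw in (3, MASK64):
    print(raw, as_uint64(raw), as_int64(raw), as_sint64(raw))
# 3 3 3 -2
# 18446744073709551615 18446744073709551615 -1 -9223372036854775808
```

The same bytes on the wire give three different answers for the largest raw value, which is why a schema-less decoder has to guess.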

To see what you can grok from raw protobuf data without the schema, try: https://protogen.marcgravell.com/decode