FlatBuffers vs Protobuf serialization performance


After migrating existing code from Protobuf (specifically Protobuf Lite) to FlatBuffers, I now need to assess the performance of both (before, hopefully, retiring Protobuf), but the results are not what I expected.

The IDL/schema design of the message types is practically the same (in fact, the first version of my FlatBuffers schema was auto-derived from the Protobuf one using the flatc compiler option --proto). The root type is a simple table containing 3 strings and an array of simple key-value tables (which becomes a std::vector in the generated C++). Each key-value table holds a string key name followed by an int, float, double or string value.

enum __Type : int { fb_UNKNOWN = 0, fb_INT = 1, fb_FLOAT = 2, fb_DOUBLE = 3, fb_STRING = 4 }

table __KeyValuePair (native_custom_alloc: "fb_custom_allocator")
{
   key: string;

   int_Type: __Type = fb_UNKNOWN;

   int_Value: int;
   float_Value: float;
   double_Value: double;
   string_Value: string;
}


table __TradingFloorEvent (native_custom_alloc: "fb_custom_allocator")
{
   str_Trader: string;
   str_Exchange: string;
   str_Currency: string;

   vec_KeyValuePairs: [__KeyValuePair];
}

root_type __TradingFloorEvent ;

So, to the FlatBuffers-aware folks, this is not a complicated schema. You'll also see that I've chosen a custom allocator - behind the scenes it uses the header-only boost::pool to re-use previous memory allocations: it works perfectly and has already shown impressive improvements in performance when building up the object (before serialization).

Another possibly important piece of info: I'm generating the C++ code with the --gen-object-api option (which is necessary to allow the use of the custom allocator anyway), meaning that serialization and deserialization are now done with the generated Pack() and UnPack() functions.

The problem: serialization of a TradingFloorEvent that contains many (say, 100) key-value pairs is disturbingly slow compared to Protobuf, and the more key-value pairs there are in the vector, the worse it gets: sometimes 10 times slower.

FYI: I'm on MS-Windows using Visual Studio 2022 with performance assessed using "Release" builds.
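For reference, a minimal sketch of the kind of timing harness that could be used for such a comparison. The helper name mean_nanos_per_call is hypothetical and not part of FlatBuffers or Protobuf; it only assumes that each serialization path can be wrapped in a callable.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of any library): runs `fn` repeatedly and
// returns the mean wall-clock cost per call in nanoseconds.
template <typename Fn>
double mean_nanos_per_call(Fn&& fn, std::size_t iterations) {
    using clock = std::chrono::steady_clock;

    if (iterations == 0) {
        return 0.0;  // nothing to measure
    }

    // Warm-up pass so caches and pooled allocations are primed
    // before the measured loop starts.
    const std::size_t warmup = iterations / 10 + 1;
    for (std::size_t i = 0; i < warmup; ++i) {
        fn();
    }

    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();

    const auto total_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return static_cast<double>(total_ns) / static_cast<double>(iterations);
}
```

The same event would be passed through both paths, for example one lambda wrapping the FlatBuffers Pack() sequence and another wrapping the Protobuf serialization call, with the serialized result consumed somehow so the optimizer cannot discard the work.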

My FlatBuffers "builder" is initialized with 1 MB (the largest serialized buffer, so far, is just 10 KB), so pre-allocation of that ultimate buffer should be a given. So what could be going wrong? Could there be expensive behind-the-scenes allocations that are not evident to me, could there be other FlatBuffers schema options that might help, or could my developer head be completely fried and in need of a hard reboot?
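One way to test the hidden-allocation theory, sketched under assumptions: replace the global operator new in a test build and count how many heap allocations happen during a single serialization call. The names alloc_probe and count_allocations are hypothetical, and this only sees allocations that go through the global operator new, so anything served by the boost::pool allocator or by malloc directly would not be counted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace alloc_probe {
// Constant-initialized counter, so it is valid even for allocations
// made before main() starts.
std::atomic<std::size_t> g_allocations{0};
}  // namespace alloc_probe

// Replacement for the global allocation function: counts every call,
// then forwards to malloc.
void* operator new(std::size_t size) {
    alloc_probe::g_allocations.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc{};
}

// Matching deallocation functions, so memory from malloc goes back to free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper: returns how many global operator new calls
// happened while `fn` was running (single-threaded use assumed).
template <typename Fn>
std::size_t count_allocations(Fn&& fn) {
    const std::size_t before =
        alloc_probe::g_allocations.load(std::memory_order_relaxed);
    fn();
    return alloc_probe::g_allocations.load(std::memory_order_relaxed) - before;
}
```

If wrapping the Pack() call in count_allocations reports a number that grows with the number of key-value pairs, the cost is likely in per-element allocations rather than in the pre-sized builder buffer.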
