FlatBuffers vs CBOR

What are the merits and demerits of the FlatBuffers and CBOR formats? Both binary formats are promoted as good choices on their websites, but I have not been able to pin down the meaningful differences between the two.

FlatBuffers:

Advantage:

  1. Strict typing in FlatBuffers, Cap’n Proto and similar formats is seen as a key point for performance, since no additional encoding/decoding step is necessary.
  2. The data model lets typed objects be located by simple offsets, giving a compact data structure and fast access.
  3. FlatBuffers does not need a parsing/unpacking step into a secondary representation (often coupled with per-object memory allocation) before you can access data.
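
To make the "offsets instead of parsing" idea concrete, here is a minimal sketch in Python. The layout below is made up for illustration and is NOT the real FlatBuffers wire format; the names `build_monster`, `read_hp` and `read_name` are hypothetical. It only shows how a reader can pull one field out of a buffer by offset without unpacking anything else.

```python
import struct

# Hypothetical fixed layout (an assumption, not the FlatBuffers format):
#   offset 0: uint32  offset of the name field, from the start of the buffer
#   offset 4: int16   hp
#   offset 6: int16   mana
#   name field: uint32 byte length, followed by UTF-8 bytes

def build_monster(name: str, hp: int, mana: int) -> bytes:
    raw = name.encode("utf-8")
    # The name field starts right after the 8-byte header.
    header = struct.pack("<Ihh", 8, hp, mana)
    return header + struct.pack("<I", len(raw)) + raw

def read_hp(buf) -> int:
    # A single read at a known offset; no other field is touched.
    return struct.unpack_from("<h", buf, 4)[0]

def read_name(buf) -> str:
    # Follow the stored offset to the name, then read its length prefix.
    (name_off,) = struct.unpack_from("<I", buf, 0)
    (length,) = struct.unpack_from("<I", buf, name_off)
    start = name_off + 4
    return bytes(buf[start:start + length]).decode("utf-8")

# memoryview avoids copying the buffer when it is passed around.
buf = memoryview(build_monster("orc", 300, 150))
print(read_hp(buf), read_name(buf))
```

The reader never builds an intermediate object tree, which is the property the points above describe. The price is that both sides must agree on the layout ahead of time.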

Disadvantage:

  1. Newer, and not standardized the way CBOR is.

CBOR

Advantage:

  1. Can be created and processed entirely as a stream, with no extra memory
  2. No need to pre-define a schema, which suits us because our data is dynamic and varied
  3. It’s an open international standard from the IETF, which makes it an even better choice than a proprietary format.
  4. It’s designed for low-memory, conversion-free, stream-based processing, while also providing extensions for other data types
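
As a rough illustration of the schema-free, self-describing side of CBOR, here is a minimal encoder sketch in Python for a small subset of the format (integers that fit in 64 bits, booleans, text strings, arrays and maps). It follows the major-type/argument layout from the CBOR RFC, but it is a sketch only: the function names are mine, and a real project should use an existing CBOR library.

```python
import struct

def _head(major: int, n: int) -> bytes:
    """Initial byte (major type in the top 3 bits) plus the argument n."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 2**8:
        return bytes([(major << 5) | 24, n])
    if n < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)

def encode(value) -> bytes:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        # Major type 0 is unsigned; major type 1 stores -1 - value.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            out += encode(key) + encode(item)
        return out
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")

# Every item carries its own type tag, so no schema is needed to read it back.
print(encode({"a": 1, "b": [2, 3]}).hex())
```

Because each item is emitted as soon as it is seen, an encoder like this can write straight to a stream, which is the point of items 1 and 4 above.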

Disadvantage:

  1. CBOR says that it follows the JSON data model (so objects are not strictly typed)
  2. It starts with the same basic types as JSON (strings, integers, maps, etc.).

PS:
It feels like managing types in CBOR will be costly in performance compared to FlatBuffers, but since CBOR is a standardized protocol I am inclined to prefer it if the difference is not huge. Please let me know which of the two you would recommend, and why.

There are 3 best solutions below

0
On BEST ANSWER

I think you've already spelled it out quite clearly yourself. FlatBuffers' strength is being able to access the data without parsing/unpacking/allocation, which can give serious performance benefits in some scenarios. But if this doesn't matter to you, something like Protocol Buffers may work just as well.

Strong typing vs dynamic typing in data matters a lot too. I'd only use the latter if I wanted generic data storage with no constraints ahead of time.

Btw, if for some reason you prefer dynamic typing, but would also like to have the performance benefits of in-place access, there is actually a format that combines the two: https://google.github.io/flatbuffers/flexbuffers.html

FlatBuffers is not "proprietary". It may have been designed at Google, but it is open source and relied upon by many other companies.

0
On

I chose CBOR for my site https://kwippe.com - we use it to store all of the artwork and keyword data as compressed strings within a very small JSON structure, with only the few attributes necessary to categorize the file. So the files are very small, and load very fast. I used this for over 30,000 SVG files, which I converted to JSON beforehand. All of the JSON is converted to a string and compressed via a string compression library, then saved as part of the smaller JSON object that I encode to CBOR.
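
A rough sketch of that pipeline in Python, with loud assumptions: `zlib` stands in for whatever string compression library the site actually uses, and the field names (`category`, `keywords`, `data`) are hypothetical. The final CBOR encoding of the wrapper is left to a CBOR library and is not shown.

```python
import json
import zlib

def pack_artwork(artwork: dict, category: str, keywords: list) -> dict:
    """Compress the large JSON payload and wrap it in a small object."""
    text = json.dumps(artwork, separators=(",", ":"))
    return {
        "category": category,   # small attributes stay uncompressed
        "keywords": keywords,
        "data": zlib.compress(text.encode("utf-8")),  # compressed payload
    }

def unpack_artwork(wrapper: dict) -> dict:
    """Reverse of pack_artwork: decompress and parse the payload."""
    return json.loads(zlib.decompress(wrapper["data"]).decode("utf-8"))

artwork = {"paths": [{"d": "M0 0 L10 10"}] * 50}  # made-up stand-in for SVG data
wrapper = pack_artwork(artwork, "icons", ["line", "diagonal"])
print(len(json.dumps(artwork)), "->", len(wrapper["data"]))
```

One thing that makes CBOR a convenient outer format here is that it has a native byte-string type, so compressed bytes can be stored as-is rather than being escaped into text.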

I've had very few problems with this CBOR system, and it was far easier to set up than FlatBuffers and some of the other binary solutions that I looked at.

0
On

I had this same question and went with CBOR for a couple of reasons.

You list as a con that CBOR, like JSON, doesn't have strict types. True: you'll need to do a little validation to make sure the type you got is the one you expected. You're right that this is what a schema-based serializer gets you: you lose the flexibility of changing types, but you know what you're going to get. I work on embedded systems in C, and static typing is important.
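
For what it's worth, the "little validation" can be as small as the following Python sketch. It works on an already-decoded message (a plain dict, as most CBOR libraries would hand back); the field names and the `EXPECTED` table are hypothetical.

```python
# Hypothetical description of what this receiver expects in one message.
EXPECTED = {"id": int, "name": str, "readings": list}

def validate(decoded, expected=EXPECTED) -> list:
    """Return a list of problems; an empty list means the message is usable."""
    if not isinstance(decoded, dict):
        return ["top-level item is not a map"]
    errors = []
    for key, wanted in expected.items():
        if key not in decoded:
            errors.append(f"missing field: {key}")
        # Exact type match, so that True/False are not accepted as integers.
        elif type(decoded[key]) is not wanted:
            got = type(decoded[key]).__name__
            errors.append(f"{key}: expected {wanted.__name__}, got {got}")
    return errors

print(validate({"id": 7, "name": "probe-1", "readings": [1, 2, 3]}))
print(validate({"id": "7", "name": "probe-1"}))
```

This is the work a schema compiler would otherwise generate for you, which is the trade-off being described.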

What you didn't list as a pro is that CBOR can retain JSON compatibility. Any JSON value can be represented in CBOR, but not the other way around. A CBOR map can have an item (object, key/value pair) such as 1 : 2, where the key is the integer 1 and the value is the integer 2. This isn't great practice, but there could be some uses for it. If you avoid the intentionally incompatible features, CBOR-to-JSON conversion can be very handy. When would you use that? Well, I use it for logs. When my CBOR packets hit our server, they are converted to JSON and stored away, already human-readable, for analytics. You can do this with any serializer, but we felt there was a lot less chance for "interpretation" differences with such a close conversion.
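
Here is a small Python sketch of that conversion, assuming only a subset of CBOR (integers, text strings, arrays, maps) and using a hand-written decoder so the example is self-contained; in practice you would use a CBOR library. The helper names are mine. It converts JSON-compatible CBOR to a JSON log line and refuses maps with non-string keys such as 1 : 2.

```python
import json

def decode(buf: bytes, pos: int = 0):
    """Decode one item from a small CBOR subset; returns (value, next_pos)."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info                      # argument stored in the initial byte
    elif info <= 27:
        size = 1 << (info - 24)         # 1, 2, 4 or 8 following bytes
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("additional info not covered by this sketch")
    if major == 0:                      # unsigned integer
        return arg, pos
    if major == 1:                      # negative integer
        return -1 - arg, pos
    if major == 3:                      # text string
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:                      # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:                      # map of arg key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = decode(buf, pos)
            result[key], pos = decode(buf, pos)
        return result, pos
    raise ValueError("major type not covered by this sketch")

def check_json_safe(value) -> None:
    """Reject structures that JSON cannot represent directly."""
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise ValueError(f"map key {key!r} is not a string")
            check_json_safe(item)
    elif isinstance(value, list):
        for item in value:
            check_json_safe(item)

def cbor_to_json(buf: bytes) -> str:
    value, _ = decode(buf)
    check_json_safe(value)
    return json.dumps(value)

print(cbor_to_json(bytes.fromhex("a26161016162820203")))
```

The explicit check matters in Python because `json.dumps` would otherwise quietly turn the integer key 1 into the string "1", which is exactly the kind of "interpretation" difference worth avoiding in logs.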

The main factor for us was that the schema was too difficult to share and synchronize. If you own both sides of an A-to-B system, a schema is great! You get size efficiency, because the map "Apples" : 100 is just stored as [1,100], but you have to get your schema file onto both sides and compiled in (if using code generation) before you can get any work done. OK, but what if you have 10 sides in a star pattern, A B C D E F G H I J, where A and J can get messages to each other, B and H chat almost exclusively with each other except for a message that goes to E and never comes back, etc.? In this scenario a schema can be very difficult! Maybe it's working, and then you add a whole slew of messages: your options are to keep old schemas around, live with optional or missing definitions, or synchronize everyone. For us this was the case, and it would have had to happen across 4 languages and in systems we didn't own.
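
To put rough numbers on the size point, here is a small Python sketch that encodes both forms as CBOR and compares lengths. The encoder handles only the tiny subset needed here, and treating [1,100] as "field number 1, value 100" is my reading of the example above; a real schema-based format such as FlatBuffers has its own wire layout and would not produce these exact bytes.

```python
def head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, for arguments below 256 only."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 256:
        return bytes([major << 5 | 24, n])
    raise ValueError("this sketch only handles arguments below 256")

def enc(value) -> bytes:
    if isinstance(value, int) and value >= 0:
        return head(0, value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(value, list):
        return head(4, len(value)) + b"".join(enc(v) for v in value)
    if isinstance(value, dict):
        return head(5, len(value)) + b"".join(
            enc(k) + enc(v) for k, v in value.items()
        )
    raise TypeError("type not covered by this sketch")

named = enc({"Apples": 100})    # self-describing: the key travels with the data
positional = enc([1, 100])      # key replaced by an agreed field number

print(len(named), named.hex())
print(len(positional), positional.hex())
```

The named form is 10 bytes and the positional form is 4, so the saving is real, but it only exists because both sides already share the meaning of field 1.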

Instead, we chose schemaless CBOR and appropriately name each map item. "apples" is for A,B,C, and J. "bananas" is an item that will go to C, H and E but never F, etc. Each side needs to know what it should expect and that's all.
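
In code, "each side needs to know what it should expect" can be as simple as the Python sketch below. The routing table mirrors the apples/bananas example above; the function name and the table structure are hypothetical, and messages are shown as already-decoded dicts.

```python
# Which named map items each node cares about (from the example above;
# F is listed with an empty set because it should never see "bananas").
EXPECTED_KEYS = {
    "A": {"apples"},
    "B": {"apples"},
    "C": {"apples", "bananas"},
    "E": {"bananas"},
    "F": set(),
    "H": {"bananas"},
    "J": {"apples"},
}

def accept(node: str, message: dict) -> dict:
    """Keep only the items this node expects and ignore everything else."""
    wanted = EXPECTED_KEYS[node]
    return {key: value for key, value in message.items() if key in wanted}

message = {"apples": 3, "bananas": 5, "cherries": 9}
for node in ("A", "C", "F", "H"):
    print(node, accept(node, message))
```

Adding a new item such as "cherries" does not break any existing node: nodes that do not list it simply ignore it, which is the flexibility we were after.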

As I understand it, FlatBuffers does have a schema-less mode, but I know little about it. I don't think there is a right answer, but for what it's worth, our web developers took to and understood CBOR right away because it's so similar in look and feel to JSON.

UPDATE: If you are interested in CBOR but could really use some schema support and/or a clear way to document what the expected data is, CDDL (RFC 8610) looks to do exactly this. It also supports data definition for JSON, because of how similar CBOR and JSON can be. There are also CDDL code generation tools for various languages that will accept the CDDL file and help generate code for deserializing, parsing, and validating the CBOR/JSON data. For me, this was the largest pain point of not having a schema: I was left to do this work, and make mistakes, on my own.