What are the differences between object storage (for example, S3) and a columnar technology?


I was thinking about the difference between those two approaches.

Imagine you must handle information about pattern calls, which should later be displayed to the user. A pattern call is a tuple consisting of a unique integer identifier ("id"), a user-defined name ("name"), a project-relative path to the so-called pattern file ("patternFile"), and a convenience flag that states whether the pattern should be called. The number of tuples is not known in advance, and the tuples won't be modified after initialization.

I thought that in this case a column-based approach, with BigQuery for example, would be better in terms of I/O and performance as well as schema evolution. But I can't actually explain why. I would appreciate any help.


There is 1 best solution below


Amazon S3 is like a large key-value store. The Key is the filename (with full path) and the Value is the contents of the file. It's just a blob of data.
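To make the key-value idea concrete, here is a minimal sketch in Python. It does not use the real S3 API: `BlobStore` is a hypothetical in-memory stand-in, and the key name, the sample records, and the `enabled` flag name are all assumptions made for illustration. The point is only that the store hands back opaque bytes, so any filtering happens in your own code after fetching the whole object.

```python
import json


class BlobStore:
    """Toy in-memory stand-in for an object store (hypothetical, not the S3 API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # The store never inspects the data; it only maps key -> bytes.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        # The whole object comes back; there is no way to ask for one field.
        return self._objects[key]


# Hypothetical pattern-call records modeled on the question's tuple.
calls = [
    {"id": 1, "name": "alpha", "patternFile": "patterns/a.pat", "enabled": True},
    {"id": 2, "name": "beta", "patternFile": "patterns/b.pat", "enabled": False},
]

store = BlobStore()
store.put("project/pattern_calls.json", json.dumps(calls).encode("utf-8"))

# To "search", the client must download and parse the entire blob itself.
blob = store.get("project/pattern_calls.json")
enabled_names = [c["name"] for c in json.loads(blob) if c["enabled"]]
print(enabled_names)
```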

A columnar data store organizes data in such a way that specific data can be "jumped to", and only desired values need to be read from disk.
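The "only desired values need to be read" point can be sketched with plain Python lists. This is a toy model, not how Parquet, ORC, or BigQuery lay out data on disk; the sample records, the `enabled` flag name, and both helper functions are assumptions for illustration. It counts how many stored values each layout has to touch to answer "give me all names".

```python
# Hypothetical pattern-call records modeled on the question's tuple.
rows = [
    {"id": 1, "name": "alpha", "patternFile": "patterns/a.pat", "enabled": True},
    {"id": 2, "name": "beta", "patternFile": "patterns/b.pat", "enabled": False},
    {"id": 3, "name": "gamma", "patternFile": "patterns/c.pat", "enabled": True},
]


def names_from_rows(records):
    """Row layout: each record is stored as a unit, so every field is read."""
    values_touched = 0
    names = []
    for record in records:
        values_touched += len(record)  # the whole record is scanned
        names.append(record["name"])
    return names, values_touched


# Column layout: one contiguous list per field.
columns = {field: [r[field] for r in rows] for field in rows[0]}


def names_from_columns(cols):
    """Column layout: only the 'name' column is read; other columns are skipped."""
    names = list(cols["name"])
    return names, len(names)


print(names_from_rows(rows))        # all 4 fields of all 3 records are touched
print(names_from_columns(columns))  # only the 3 values of one column are touched
```

In this toy model the row layout touches 12 values and the column layout touches 3 for the same answer, which is the I/O saving the answer refers to when a query needs only a few fields.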

If you want to search the data, some form of logic must be applied to it. This could be done by storing the data in a database (typically in a proprietary format), or by using a columnar storage format such as Parquet or ORC plus a query engine that understands that format (e.g. Amazon Athena).

The difference between S3 and columnar data stores is like the difference between a disk drive and an Oracle database.