Assume I have a large table that I want to store on S3, and my access pattern is simply to retrieve a set of records for a given set of keys. The records can be stored as JSON or CSV.
So far, I have observed two patterns in my research: (1) the big data framework approach, which partitions the table into separate files each containing multiple records, and (2) using S3 as a key-value store pure and simple, where each row corresponds to its own object key.
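To make the two patterns concrete, here is a minimal sketch of what I mean, using boto3. The bucket name `my-table`, the key layout, the hash-based partitioning scheme, and the `id` column are all hypothetical, just for illustration:

```python
import csv
import io
import zlib

import boto3

s3 = boto3.client("s3")

# Pattern (2): one S3 object per row -- a lookup is a single GET per key.
def get_record_kv(key: str) -> str:
    resp = s3.get_object(Bucket="my-table", Key=f"rows/{key}.json")
    return resp["Body"].read().decode("utf-8")

# Pattern (1): rows grouped into partition files -- a lookup fetches the
# whole partition (here picked by a stable hash of the key) and filters
# the unwanted rows out in memory.
def get_record_partitioned(key: str, num_partitions: int = 256) -> dict | None:
    partition = zlib.crc32(key.encode("utf-8")) % num_partitions
    resp = s3.get_object(Bucket="my-table", Key=f"partitions/{partition:04d}.csv")
    body = resp["Body"].read().decode("utf-8")
    for row in csv.DictReader(io.StringIO(body)):
        if row["id"] == key:
            return row
    return None
```

So fetching N records means either N small GETs (pattern 2) or fewer, larger GETs plus client-side filtering (pattern 1); that trade-off is what my question is about.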
Granted, the big data frameworks envision access patterns that are more involved, such as enabling full SQL query support.
Nonetheless, I wonder: is partitioning tabular data into separate files of multiple rows more efficient than simply storing it as key/value pairs, when it comes to object storage like S3? By efficient, I mean both cost and speed, but primarily speed with respect to the simple access pattern I mentioned above.