I was going through the video, which talks about column-oriented databases and explains the concepts really well. What I didn't get from the video is how a column-oriented database stores data on disk, or
how Cassandra stores data on disk. I have read a similar question, "Why many refer to Cassandra as a Column oriented database?", but neither the accepted answer nor any other answer explains how the data is stored on disk.
I understand the benefits of a column-oriented database and what exactly it is. Each column's data is stored in a separate file on disk. I am assuming something like the following.
Say, table name: CarOwner
and Primary Key: Id
Row Oriented: Each record is stored together on disk, probably in the same block.
Id | Name | Car | Age
----------------------------------
1 | John | BMW | 34
2 | Terry | Audi | 31
3 | Josh | Tesla | 24
4 | Dan | Ford | 50
In a row store, lookup is straightforward. You can scan each block for rows matching the filter or, if the query key is indexed, fetch the corresponding block after consulting the index.
Column Oriented Structure on Disk: "Assumption"
Name File:- say it starts at block1 on disk
John
Terry
Josh
Dan
Similarly, Car and Age are stored in separate files.
So if I want to fetch Name and Car for a given Id, do we maintain and consult a file like the one below?
1 : {Name: block1-offset1, Car: block4-offset1,...}
2 : {Name: block1-offset4, Car: block4-offset3,...}
3 : {Name: block1-offset7, Car: block4-offset5,...}
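To make the assumption concrete, here is a minimal Python sketch of that lookup. It is only an illustration of the layout guessed at above, not how any particular database does it: the in-memory "files", the (offset, length) pairs, and the names `build_column`, `row_index` and `fetch` are all hypothetical.

```python
import io

def build_column(values):
    """Write values back to back into one in-memory "file".

    Returns the file plus the (offset, length) of each value, in row order.
    """
    buf = io.BytesIO()
    positions = []
    for value in values:
        data = str(value).encode("utf-8")
        positions.append((buf.tell(), len(data)))
        buf.write(data)
    return buf, positions

# The CarOwner table from above, split by column.
ids = [1, 2, 3, 4]
columns = {
    "Name": ["John", "Terry", "Josh", "Dan"],
    "Car": ["BMW", "Audi", "Tesla", "Ford"],
    "Age": [34, 31, 24, 50],
}

# One "file" per column, and one index entry per Id that records
# where that row's value sits inside each column file.
files = {}
row_index = {row_id: {} for row_id in ids}
for col, values in columns.items():
    files[col], positions = build_column(values)
    for row_id, pos in zip(ids, positions):
        row_index[row_id][col] = pos

def fetch(row_id, wanted):
    """Read only the requested columns for one Id."""
    result = {}
    for col in wanted:
        offset, length = row_index[row_id][col]
        f = files[col]
        f.seek(offset)
        result[col] = f.read(length).decode("utf-8")
    return result

print(fetch(2, ["Name", "Car"]))  # {'Name': 'Terry', 'Car': 'Audi'}
```

Note that only the Name and Car files are touched for this query; the Age file is never read.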
Is this how a key lookup works in a column store? If that's correct, what other ways are there to store the data?
How does Cassandra store its data, given that it is categorized as a column-oriented DB too?
Cassandra stores data on disk in SSTables, along with the other files mentioned in the documentation. When querying data by key (assuming it's not in the memtable), it checks the index file, which points to the position of the requested data in the SSTable file.
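
The read path described above can be sketched roughly as follows. This is a toy model, not Cassandra's actual file format: the JSON-lines encoding, the dict-based index, and the names `write_sstable`, `read_row` and `get` are assumptions made for illustration, and the other files that accompany a real SSTable are left out.

```python
import io
import json

def write_sstable(rows):
    """Write rows in key order to a data "file"; return it plus a key -> offset index."""
    data = io.BytesIO()
    index = {}
    for key in sorted(rows):
        index[key] = data.tell()  # byte position where this row starts
        line = json.dumps({"key": key, "row": rows[key]}) + "\n"
        data.write(line.encode("utf-8"))
    return data, index

def read_row(data, index, key):
    """Use the index to jump straight to the row instead of scanning the data file."""
    offset = index.get(key)
    if offset is None:
        return None
    data.seek(offset)
    return json.loads(data.readline())["row"]

def get(memtable, data, index, key):
    """Check the in-memory table first, then fall back to the on-disk structure."""
    if key in memtable:
        return memtable[key]
    return read_row(data, index, key)

# Rows already flushed to "disk".
data, index = write_sstable({
    1: {"Name": "John", "Car": "BMW", "Age": 34},
    2: {"Name": "Terry", "Car": "Audi", "Age": 31},
    3: {"Name": "Josh", "Car": "Tesla", "Age": 24},
})
# A recent write that is still only in memory.
memtable = {4: {"Name": "Dan", "Car": "Ford", "Age": 50}}

print(get(memtable, data, index, 3))  # served via the index and the data file
print(get(memtable, data, index, 4))  # served from the memtable
```

The point of the sketch is that the whole row for a key sits together in the data file, and the index only records where that row begins.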