I am new to Cassandra and, while reading about vertical and horizontal database partitioning, I got confused. Does Cassandra follow horizontal partitioning (sharding) or vertical partitioning?
My understanding is that, since Cassandra is a column-oriented DB, it should follow vertical partitioning. If that is not the case, can anyone please explain in detail?
This point has been discussed ad nauseam on Stack Overflow, specifically in this answer. Cassandra is NOT a column-oriented database. It is a partitioned row store. Data is organized and presented in "rows," similar to a relational database.
Technically, Cassandra is what you would call a "sharded" database, but it's almost never referred to in this way. Essentially, each node is responsible for a specific range of partitions. Each partition is identified by a token, which is a numeric value; with the Murmur3Partitioner, tokens range from -2^63 to +2^63-1.
In fact, in a scenario where a node is simplified to hold a single token range, you can compute the ranges based on the number of nodes in the cluster (data center) like this:
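A minimal sketch of that computation in Python, assuming one evenly spaced token per node and the usual ring convention that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive). The function names `initial_tokens` and `token_ranges` are made up for illustration and are not part of any Cassandra tooling:

```python
MIN_TOKEN = -2**63      # lowest Murmur3Partitioner token
TOKEN_SPACE = 2**64     # number of distinct token values


def initial_tokens(node_count):
    """Return one evenly spaced token per node, in ascending order."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [MIN_TOKEN + (TOKEN_SPACE * i) // node_count
            for i in range(node_count)]


def token_ranges(node_count):
    """Return a (start, end) pair per node, meaning start < token <= end.

    The first node's range wraps around the ring: it starts at the
    last node's token and ends at the first node's own token.
    """
    tokens = initial_tokens(node_count)
    # tokens[i - 1] with i == 0 picks the last token, giving the wrap-around.
    return [(tokens[i - 1], end) for i, end in enumerate(tokens)]


if __name__ == "__main__":
    for node, (start, end) in enumerate(token_ranges(4), start=1):
        print(f"node {node}: ({start}, {end}]")
```

For a four-node data center this places the node tokens at -2^63, -2^62, 0 and 2^62, so each node covers a quarter of the token space.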
Of course with vNodes, a node is almost always responsible for multiple token ranges.
At operation time, the partition key is hashed into a token. This token tells Cassandra which node the data resides on. Consider this table:
As this table has a simple primary key definition of `studentid`, that is used as the partition key. The results of the `token(studentid)` function above indicate which partitions contain the data.
If there was another table which also used `studentid` as its partition key, that table's data would be stored on the same nodes as the `student` table.
In any case, this is a simplified version of what happens. Feel free to read up on vNodes (link above) as well as Cassandra: High Availability by Robbie Strickland. He has written (IMO) the best description of Cassandra's hashing and partition distribution process.
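To make the routing step concrete (hash the partition key into a token, then find the node whose range covers that token), here is a rough Python sketch. It makes several assumptions: `stand_in_token` uses BLAKE2b from `hashlib` as a stand-in hash and is NOT Cassandra's Murmur3 implementation, so its tokens will not match what `token()` returns in CQL; the four-node ring and the node names are hypothetical; and each node holds a single token rather than vNodes.

```python
import bisect
import hashlib

# Hypothetical four-node ring: (node token, node name), sorted by token.
RING = [
    (-2**63, "node1"),
    (-2**62, "node2"),
    (0, "node3"),
    (2**62, "node4"),
]


def stand_in_token(partition_key):
    """Map a partition key to a signed 64-bit token.

    Stand-in only: Cassandra uses a Murmur3 variant, not BLAKE2b.
    """
    digest = hashlib.blake2b(partition_key.encode("utf-8"),
                             digest_size=8).digest()
    return int.from_bytes(digest, byteorder="big", signed=True)


def owning_node(token, ring=RING):
    """Return the node whose range (previous token, own token] holds token."""
    node_tokens = [node_token for node_token, _ in ring]
    # First node whose token is >= the key's token.
    index = bisect.bisect_left(node_tokens, token)
    if index == len(ring):
        index = 0  # past the highest node token: wrap to the first node
    return ring[index][1]


if __name__ == "__main__":
    key = "student-42"  # hypothetical studentid value
    token = stand_in_token(key)
    print(key, token, owning_node(token))
```

Because the owning node depends only on the partition key's token, two tables that use the same partition key value end up with that partition on the same node in this model, which matches the point about `studentid` above.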