Cassandra follows which partitioning technique?

Question

Cassandra follows which partitioning technique?

992 Views Asked by binpatel31 At 17 August 2025 at 18:18

I am new to Cassandra and while reading about partitioning a database - vertical and horizontal, I got confused and would like to know whether Cassandra follows Horizontal partitioning (sharding) OR vertical partitioning technique?

Moreover, according to my understanding, as Cassandra is column oriented DB, it should follow Vertical partitioning technique. If this is not the case then can anyone please explain it in detail?

Original Q&A

There are 2 best solutions below

Jim Wartnick On 22 May 2020 at 14:40

Cassandra implements partitioning on a hashing algorithm. Because of that Cassandra allows for efficient horizontal scaling (if the partition key is chosen correctly). In summary, when you create a table, you define the partitioning column(s). When you insert a record, Cassandra will take the values, hash it, and determine the node it belongs on. If you have RF configured > 1, then alternate replicas will also be chosen. How it works is no different than Oracle's hash partitions, except Oracle only does it at the storage layer, not the host layer (unless you're using Oracle sharding).

**Aaron** · Accepted Answer

as Cassandra is column oriented DB

This point has been discussed ad-nauseam on Stack Overflow, specifically in this answer. Cassandra is NOT a column oriented database. It is a partitioned row store. Data is organized and presented in "rows," similar to a relational database.

whether Cassandra follows Horizontal partitioning (sharding)

Technically, Cassandra is what you would call a "sharded" database, but it's almost never referred to in this way. Essentially, each node is responsible for a specific range of partitions. These partitions (tokens) are a numeric value, and with the Murmur3Partitioner range from -2^63 to +2^63-1.

In fact, in a scenario where a node is simplified to hold a single token range, you can compute the ranges based on the number of nodes in the cluster (data center) like this:

python -c 'print [str(((2**64 / 6) * i) - 2**63) for i in range(6)]'

['-9223372036854775808', '-6148914691236517206', '-3074457345618258604',
 '-2', '3074457345618258600', '6148914691236517202']

Of course with vNodes, a node is almost always responsible for multiple token ranges.

At operation time, the partition key is hashed into a token. This token tells Cassandra which node the data resides on. Consider this table:

SELECT token(studentid),studentid,fname,lname FROM student ;

 system.token(studentid) | studentid | fname | lname
-------------------------+-----------+-------+----------
    -5626264886876159064 | janderson | Jordy | Anderson
    -1472930629430174260 |   aploetz | Avery |   Ploetz
     8993000853088610283 |      mgin | Micah |      Gin

(3 rows)

As this table has a simple primary key definition of studentid, that is used as the partition key. The results of the token(studentid) function above indicate which partitions contain the data.

If there was another table which also used studentid as its partition key, that table's data would be stored on the same nodes as the student table.

In any case, this is a simplified version of what happens. Feel free to read up on vNodes (link above) as well as Cassandra: High Availability by Robbie Strickland. He has written (IMO) the best description of Cassandra's hashing and partition distribution process.

Cassandra follows which partitioning technique?

There are 2 best solutions below

Related Questions in DATABASE

Related Questions in CASSANDRA

Related Questions in PARTITIONING

Related Questions in SHARDING

Related Questions in COLUMN-ORIENTED

Trending Questions

Popular # Hahtags

Popular Questions