How does Cassandra calculate the size of the partition key and clustering key? We have tables with relatively large partition keys (a UUID or a combination of UUIDs) along with large clustering keys, for example:
mydb/parent/6E219A7E21044B48B8816B931925CCDB/child1/29E6E709854D49CFAC72ECD5E1AEBFA3/
mydb/parent/6E219A7E21044B48B8816B931925CCDB/child2/29E6E709854D49CFAC72ECD5E1AEBFA4/
mydb/parent/6E219A7E21044B48B8816B931925CCDB/child3/29E6E709854D49CFAC72ECD5E1AEBFA5/
Here the partition key (PK) is 6E219A7E21044B48B8816B931925CCDB and the clustering column is /child1/29E6E709854D49CFAC72ECD5E1AEBFA3/
The child levels can go down to an nth level (right now we go up to 100 levels).
Does having large keys have a performance impact when we have a huge amount of data (~300 million)? Also, what will the impact be on disk usage?
Having a large partition key or clustering key is not an issue. It has no impact on performance.
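To put a rough number on the keys from the question, here is a minimal Python sketch of the raw key bytes in one partition. It is a back-of-the-envelope estimate, not how Cassandra itself accounts for storage: it assumes the keys are stored as UTF-8 text, that the partition key is counted once per partition, and that every clustering value is about as long as the example. The function name raw_key_bytes is made up for illustration, and SSTable metadata, per-cell overhead, and compression are ignored.

```python
# Rough, raw-byte estimate only. It ignores SSTable metadata, per-cell
# overhead, and compression, which vary by Cassandra version and table settings.

PARTITION_KEY = "6E219A7E21044B48B8816B931925CCDB"            # from the question
CLUSTERING_KEY = "/child1/29E6E709854D49CFAC72ECD5E1AEBFA3/"  # from the question
ROWS_PER_PARTITION = 100                  # the "100 levels" in the question
IDEAL_PARTITION_BYTES = 10 * 1024 * 1024  # the 10MB guideline from this answer


def raw_key_bytes(partition_key: str, clustering_key: str, rows: int) -> int:
    """Bytes of key text in one partition (hypothetical helper).

    Assumes UTF-8 text keys, the partition key counted once, and every
    row carrying a clustering value of the same length as the sample.
    """
    pk = len(partition_key.encode("utf-8"))
    ck = len(clustering_key.encode("utf-8"))
    return pk + rows * ck


key_bytes = raw_key_bytes(PARTITION_KEY, CLUSTERING_KEY, ROWS_PER_PARTITION)
share = key_bytes / IDEAL_PARTITION_BYTES

print(f"raw key bytes per partition: {key_bytes}")
print(f"share of the 10MB guideline: {share:.4%}")
```

Under these assumptions the key text comes to a few kilobytes per partition, so the size of the other columns in each row matters far more for the partition size than the keys do.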
The only thing you should avoid is having large partitions. In your case, for example, you have 100 rows in a single partition. If the combined size of all those rows is within 10MB, you are doing fine (the ideal size of a Cassandra partition is 10MB or less, with a maximum of 100MB). You can refer to this link for calculating your partition size. If your partition size is large, you have to refine your data model to reduce it. The following are some of the techniques generally applied for reducing the partition size: