Sync in Cassandra data model

195 Views Asked by At

I'm creating a social network app and i've denormalized data creating the timelines as a list of posts containing all fields in order to using only one query. The issue is that if every post has counters (likes, shares,comments), everytime a counter is updated how can I sync all post record in timeline? Or should i have only id in timeline and so i update only one table of posts and timeline query is made up of two query instead of one?

Thanks in regards

2

There are 2 best solutions below

9
On

You may be able to store individual posts and then retrieve all of them using a single range query - this would allow you to easily update individual posts with counter data, but you'd still be able to retrieve all posts in the timeline using one query. See this article for an example using temperature recordings at a weather station.

CREATE TABLE posts (
  post_id text,
  post_time timestamp,
  post text,
  timeline_id set<text>,
  comments_count counter,
  PRIMARY KEY (post_id,post_time)
);

CREATE INDEX timeline_id_index ON posts (timeline_id);

SELECT post, comments_count
FROM posts
WHERE timeline_id CONTAINS ’foo’
AND post_time > ’2013-04-03 07:01:00′
AND post_time < ’2020-04-03 07:04:00′;

UPDATE posts
SET comments_count = comments_count + 1
WHERE post_id='bar';
5
On

I've been thinking about a similar task and here's my idea:

  1. Timeline data must be synced. Storing only an id in a timeline and looking up in posts is a bad idea: you wouldn't want to make 20 lookups to generate a single timeline page. Also, it won't scale: once you add more nodes to your server architecture, different posts can be stored on different nodes and this will make queries slower.

  2. It seems best is to use the counter data type for storing counters. This requires creating a separate counters-only table. See here for more info. As I understand using this type improves scalability (of updates).

  3. If writing to timeline with every new like/comment/share becomes too expensive, an in-memory cache can be employed: counters are stored in cache and looked up when timeline is generated. Timeline stored values updates could then be delayed and serve as a fallback for counters which are not in cache. Looking up 20 times from RAM for a single request is ok I think.

#3 can be useful if you count entry views. Since you're not counting views, updating timeline on every like/comment/share would be alright I think.