Social Activity Feed - Best Approach for Per-User Capped Collections in MongoDB?

I am working on a social activity feed system, very similar to the 10Gen Socialite Project, that has been running in production for a couple of years now. I have a new use case in which I need to store a chronologically ordered list of activities per user. The list of activities should:

  1. only contain the most recently inserted N items
  2. not insert duplicates of semantically-equivalent items
  3. allow for paging through results.

So far, I've come up with two approaches to solving this, but both seem to have troubling limitations.

The first approach (which closely matches my other collections) is to have a single collection containing one document for each activity, indexed by user id. For example:

{
    "owner": {
        "type": "user",
        "id": "1234"
    },
    "activity": {
        "published": "2013-09-27T17:08:26+00:00",
        "actor": {
            "type": "elastic-search-node",
            "id": "2"
        },
        "verb": "recommend",
        "object": {
            "type": "review",
            "id": "1093773"
        },
        "uuid": "6d70eaa4-0766-4949-971d-98740cb9eca1"
    }
}

Each time I receive a new activity for a given user, I insert a document as above with the same 'owner' clause but a different 'activity' clause. However, I'm not sure of the most efficient way to handle my inserts. Given the criteria above, one pseudocode approach would be:

results = collection.update(
  {
    'owner.id':'1234', 
    'activity.verb':'recommend',
    'activity.object.type':'review',
    'activity.object.id':'1093773'
  },
  the_activity,
  upsert:true)

# count documents for owner.id = 1234
# if count > max_documents, delete oldest document

The problem with this approach is that it can take up to three database operations to complete the insert and prune. On the other hand, using 'upsert' takes care of preventing duplicates, and the generated ObjectId can be used for temporal queries and pagination.
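
To make the three round trips and the ObjectId-based paging concrete, here is a minimal Python sketch. It assumes a pymongo-style `collection` object; the helper names (`dedupe_filter`, `page_filter`, `insert_and_prune`, `fetch_page`) and the `MAX_DOCUMENTS` value are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, Optional

MAX_DOCUMENTS = 100  # assumed value for N; the real cap is not specified here


def dedupe_filter(owner_id: str, activity: Dict[str, Any]) -> Dict[str, Any]:
    """Match an existing document holding a semantically equivalent activity."""
    return {
        "owner.id": owner_id,
        "activity.verb": activity["verb"],
        "activity.object.type": activity["object"]["type"],
        "activity.object.id": activity["object"]["id"],
    }


def page_filter(owner_id: str, before_id: Optional[Any] = None) -> Dict[str, Any]:
    """Match one user's activities older than `before_id` (None = first page)."""
    query: Dict[str, Any] = {"owner.id": owner_id}
    if before_id is not None:
        query["_id"] = {"$lt": before_id}
    return query


def insert_and_prune(collection: Any, owner_id: str, activity: Dict[str, Any],
                     max_documents: int = MAX_DOCUMENTS) -> None:
    """Upsert, count, then prune: up to three database operations."""
    document = {"owner": {"type": "user", "id": owner_id}, "activity": activity}
    # 1. Insert, or replace the semantically equivalent document.
    collection.replace_one(dedupe_filter(owner_id, activity), document, upsert=True)
    # 2. Count this owner's documents.
    count = collection.count_documents({"owner.id": owner_id})
    # 3. Over the cap: delete the document with the smallest (oldest) _id.
    if count > max_documents:
        collection.find_one_and_delete({"owner.id": owner_id}, sort=[("_id", 1)])


def fetch_page(collection: Any, owner_id: str, page_size: int,
               before_id: Optional[Any] = None) -> list:
    """Newest-first page; pass the last _id of the previous page as before_id."""
    cursor = collection.find(page_filter(owner_id, before_id))
    return list(cursor.sort("_id", -1).limit(page_size))
```

One thing to keep in mind with this sketch: when the upsert matches an existing document, that document keeps its original _id, so a repeated activity stays at its old position in the ordering rather than moving to the front.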

Another approach I've looked at is similar to the FanoutOnWriteSizedBuckets approach in Socialite. In this case, the list of activities is stored as sub-documents in a max-size array, one document per user, indexed by user id. For example:

{
    "owner" : {"type":"user", "id":"1234"},
    "feed" : [
        {"_id" : ObjectId("...da7"), "activity" : ...},
        {"_id" : ObjectId("...dc1"), "activity" : ...},
        {"_id" : ObjectId("...dd2"), "activity" : ...}
    ]
}

In this case, the queries are fairly straightforward as well, but again, the inserts are problematic. I've looked at various combinations of update operators and query conditions ($push, $addToSet, $ne, $each, etc.), but none seems able to prevent duplicate inserts and prune old items more efficiently than the first approach.
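
To show the shape of update being discussed, here is a sketch of one such combination in Python: a `$not`/`$elemMatch` condition on the array so the update is skipped when an equivalent item is already present, plus `$push` with `$each` and a negative `$slice` to keep only the newest items. The function name `capped_push` and its parameters are hypothetical, and this is an illustration under assumptions, not a verified solution and not taken from the linked thread.

```python
from typing import Any, Dict, Tuple


def capped_push(owner_id: str, item: Dict[str, Any],
                max_items: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (filter, update) for a de-duplicating, size-capped array push."""
    activity = item["activity"]
    query = {
        "owner.id": owner_id,
        # Match only if no feed element is semantically equivalent to `item`.
        "feed": {"$not": {"$elemMatch": {
            "activity.verb": activity["verb"],
            "activity.object.type": activity["object"]["type"],
            "activity.object.id": activity["object"]["id"],
        }}},
    }
    update = {
        # Append the item, then keep only the last `max_items` elements.
        "$push": {"feed": {"$each": [item], "$slice": -max_items}},
    }
    return query, update
```

With pymongo, the pair would be passed as `collection.update_one(query, update)`. The sketch assumes the per-user document already exists: adding `upsert=True` to this particular filter would make a duplicate look like a missing document and trigger an insert of a second document for the same owner.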

Can anyone suggest an approach for solving this use case?

Thanks!

(x-posted to mongodb-user Google Group) SOLVED: https://groups.google.com/forum/#!topic/mongodb-user/K8n7Gf1nv3Q
