MongoDB 2.4.8 capped collection and tailable cursor consuming all memory


We are currently exploring capped collections and tailable cursors in MongoDB to build a queueing system for notifications. However, when running a simple LinqPad test (code below), we noticed that Mongo steadily allocates memory even though we are not inserting any records. The allocation continues until all system RAM is used, at which point Mongo simply stops responding.

As we are new to capped collections and tailable cursors, I wanted to make sure we haven't missed something obvious before submitting a bug.

Note: We tried the code below with journaling on and off with the same results.

  • Platform: Windows Server 2012 64bit
  • MongoDB: Version 2.4.8 64bit
  • Driver: Official C# 10gen v1.8.3.9

LinqPad script

var conn = new MongoClient("mongodb://the.server.url").GetServer().GetDatabase("TestDB");

if(!conn.CollectionExists("Queue")) {

    conn.CreateCollection("Queue", CollectionOptions
        .SetCapped(true)
        .SetMaxSize(100000)
        .SetMaxDocuments(100)
    );

    //Insert an initial document; without one, 'cursor.IsDead' is always true
    var coll = conn.GetCollection("Queue");
    coll.Insert(
        new BsonDocument(new Dictionary<string, object> {
            { "PROCESSED", true },
        }), WriteConcern.Unacknowledged
    );
}

var coll = conn.GetCollection("Queue");
var query = coll.Find(Query.EQ("PROCESSED", false))
    .SetFlags(QueryFlags.AwaitData | QueryFlags.NoCursorTimeout | QueryFlags.TailableCursor);

var cursor = new MongoCursorEnumerator<BsonDocument>(query);

while(true) {
    if(cursor.MoveNext()) {
        string.Format(
            "{0:yyyy-MM-dd HH:mm:ss} - {1}",
            cursor.Current["Date"].ToUniversalTime(),
            cursor.Current["X"].AsString
        ).Dump();

        coll.Update(
            Query.EQ("_id", cursor.Current["_id"]),
            Update.Set("PROCESSED", true),
            WriteConcern.Unacknowledged
        );
    } else if(cursor.IsDead) {
        "DONE".Dump();
        break;
    }
}

2 Answers

Best Answer

It seems I found the solution to the problem!

The issue in the above code revolves around the query:

Query.EQ("PROCESSED", false)

When I removed this and replaced it with a query based on the document's _id, the memory consumption problem disappeared. On further reflection, the "PROCESSED" property really isn't required in the query, as cursor.MoveNext() will always return the next new document (if there is one). Here's the refactored LinqPad script based on the code above:

var conn = new MongoClient("mongodb://the.server.url").GetServer().GetDatabase("TestDB");

if(conn.CollectionExists("Queue")) {
    conn.DropCollection("Queue");
}

conn.CreateCollection("Queue", CollectionOptions
    .SetCapped(true)
    .SetMaxSize(100000)
    .SetMaxDocuments(100)
    .SetAutoIndexId(true)
);

//Insert an initial document; without one, 'cursor.IsDead' is always true
var coll = conn.GetCollection("Queue");
coll.Insert(
    new BsonDocument(new Dictionary<string, object> {
        { "PROCESSED", true },
        { "Date", DateTime.UtcNow },
        { "X", "test" }
    }), WriteConcern.Unacknowledged
);

//Create the query based on the latest document id. If the cursor ever dies
//and has to be re-created, 'lastId' should first be updated to the _id of
//the last document seen before re-issuing the query.
BsonValue lastId = BsonMinKey.Value;
var query = coll.Find(Query.GT("_id", lastId))
    .SetFlags(QueryFlags.AwaitData | QueryFlags.NoCursorTimeout | QueryFlags.TailableCursor);

var cursor = new MongoCursorEnumerator<BsonDocument>(query);

while(true) {
    if(cursor.MoveNext()) {
        string.Format(
            "{0:yyyy-MM-dd HH:mm:ss} - {1}",
            cursor.Current["Date"].ToUniversalTime(),
            cursor.Current["X"].AsString
        ).Dump();
    } else if(cursor.IsDead) {
        "DONE".Dump();
        break;
    }
}
Answer

Same here, even without that additional query.

After some more investigation (in fact VERY MUCH MORE) I found that the problem looks like this:

If the first MoveNext() does not return a record, the problem exists. It doesn't matter what kind of query it is, or how many entries are in the collection.

If you change the query so that the last entry is returned as the first result, everything works fine. You may discard this if you know it already...

The example above succeeds because you initially get ALL records already in the collection.
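Based on that observation, one workaround is to make sure the first MoveNext() returns a document by starting the tailable query at the newest document already in the collection. A minimal sketch against the same 1.8-era C# driver (the `conn` and `Queue` names are carried over from the scripts above; this assumes at least one seed document exists and is untested here):

```csharp
var coll = conn.GetCollection("Queue");

// Find the newest document currently in the capped collection
// ($natural descending walks the collection in reverse insertion order).
var last = coll.FindAll()
    .SetSortOrder(SortBy.Descending("$natural"))
    .SetLimit(1)
    .FirstOrDefault(); // requires System.Linq (available by default in LinqPad)

// Use GTE rather than GT so the last known document is itself the first
// result, which guarantees the initial MoveNext() succeeds.
var startId = last != null ? last["_id"] : (BsonValue)BsonMinKey.Value;
var query = coll.Find(Query.GTE("_id", startId))
    .SetFlags(QueryFlags.AwaitData | QueryFlags.NoCursorTimeout | QueryFlags.TailableCursor);
```

The cursor's first result is then the already-known last document, which can simply be skipped before processing new arrivals.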