Mongo bulk replace with 200k+ operations


For example, I have documents like this in a collection:

{
    "key": "key1",
    "time": 1000,
    "values": [] // this one is optional
}

I need to update the collection from, say, a CSV file by modifying or removing the values field, using key and time as the filter.
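To make the task concrete, here is a minimal sketch of turning CSV rows into per-document filters and updates. The column names (key, time, values) and the ";" separator inside the values cell are assumptions for illustration only; the real file layout is not shown in the question.

```python
import csv
import io


def row_to_update(row):
    """Turn one CSV row into a (filter, update) pair of plain dicts."""
    # The filter matches the unique (key, time) pair.
    flt = {"key": row["key"], "time": int(row["time"])}
    raw = (row.get("values") or "").strip()
    if raw:
        # Non-empty cell: overwrite the optional "values" array.
        # The ";" separator is a hypothetical convention.
        update = {"$set": {"values": raw.split(";")}}
    else:
        # Empty cell: remove the optional field altogether.
        update = {"$unset": {"values": ""}}
    return flt, update


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("key,time,values\nkey1,1000,a;b\nkey2,2000,\n")
pairs = [row_to_update(r) for r in csv.DictReader(sample)]
```

Each pair could then be wrapped in a pymongo UpdateOne(flt, update) and submitted through bulk_write, which avoids resending whole documents when only values changes.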

What I've tried so far:

  • DeleteMany with or(and(key: key1, time: time1), and(key: key2, time: time2), ... 276k more or arguments) followed by InsertMany with 276k documents => ~90 seconds
  • Bulk ReplaceOne with (filter: and(key: key1, time: time2)) => ~40 seconds
  • Splitting the huge bulk into several smaller batches (7,500 per batch seems to be the fastest), although this is not atomic as a single DB operation => ~35 seconds
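The batched variant from the last bullet might look roughly like this with pymongo. This is a sketch, not the code that was actually benchmarked: the helper names chunked and replace_in_batches are made up for illustration, and the collection argument is assumed to be a pymongo Collection.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def replace_in_batches(collection, docs, batch_size=7500):
    """Replace documents matched by (key, time), one unordered bulk per batch.

    Each batch is a separate bulk_write call, so the whole run is not atomic.
    """
    # Imported lazily so that chunked() can be used without pymongo installed.
    from pymongo import ReplaceOne

    matched = 0
    for batch in chunked(docs, batch_size):
        ops = [
            ReplaceOne({"key": d["key"], "time": d["time"]}, d)
            for d in batch
        ]
        # ordered=False lets the server continue past individual failures.
        result = collection.bulk_write(ops, ordered=False)
        matched += result.matched_count
    return matched
```

The batch size of 7,500 is simply the value reported above as fastest; it would need to be re-measured on any other deployment.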

Notes:

  • All tests were run with bulk.ordered = false to improve performance.
  • There is a unique index on { key: 1, time: -1 }.

Is there a way to optimize this kind of request? I know Mongo can burst to ~80k inserts/s, but what about replacements?

There is 1 answer below:

Bulk operations are not atomic as a submitted group; only the individual operations are atomic. Note also that the driver automatically splits a bulk operation into smaller batches if you submit more than a certain number of operations (1,000 when encryption is not used), which is why huge batches tend to perform worse than batches of under one thousand.

To answer your question on performance:

  • Create a test deployment using tmpfs for storage.
  • Find out how many queries/second this deployment can sustain.
  • Find out how many updates/second this deployment can sustain.
  • If the number of updates/second is about half the number of queries/second, you are probably operating at maximum efficiency.
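The measurement steps above can be sketched with a small timing helper. The function names here are hypothetical, and the operations passed in are whatever single query or update you want to measure; nothing in this sketch is specific to a particular driver.

```python
import time


def ops_per_second(operation, count):
    """Call operation(i) for i in range(count) and return the rate achieved."""
    start = time.perf_counter()
    for i in range(count):
        operation(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse timer.
    return count / elapsed if elapsed > 0 else float("inf")


def update_to_query_ratio(queries_per_s, updates_per_s):
    """Ratio used in the rule of thumb above; about 0.5 is the target."""
    return updates_per_s / queries_per_s


# Example wiring, assuming `coll` is a pymongo Collection (not created here):
#   q = ops_per_second(lambda i: coll.find_one({"key": f"key{i}", "time": 1000}), 10_000)
#   u = ops_per_second(lambda i: coll.replace_one(
#           {"key": f"key{i}", "time": 1000},
#           {"key": f"key{i}", "time": 1000, "values": []}), 10_000)
#   print(update_to_query_ratio(q, u))
```

A single-threaded loop like this measures round-trip latency as much as server throughput, so for a fair comparison both numbers should be collected the same way, ideally with the same batching and concurrency as the real workload.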

Naturally, performance will be lower with SSD or magnetic disk backing storage. The point of the in-memory test is to confirm that you are using the database as efficiently as possible.

Especially with a mixed read and write workload, if you are using a magnetic disk, switching to SSD storage should yield a noticeable performance gain.