Using MapDB efficiently (confused about commits)


I'm using MapDB in a project that deals with billions of Objects that need to be mapped/queued. I don't need any kind of persistence after the program finishes (the MapDB databases are all temporary). I want the program to run as fast as possible, but I'm confused about MapDB's commit() function (which I assume is relevant to performance), even after reading the docs. My questions:

  1. What exactly does commit do? My working understanding is that it serializes Objects from the heap to disk, thus freeing heap space. Is this accurate?

  2. What happens to the references to Objects that were just committed? Do they get cleaned up by GC, or do they somehow 'reference' an Object on disk (with MapDB making this transparent?)

Ultimately I want to know how to use MapDB as efficiently as I can, but I can't do that without knowing what commit() is for. I'd appreciate any other advice that you might have for using MapDB efficiently.


There are 2 best solutions below

1
On BEST ANSWER

Commit is a transaction operation, just as you would find in a database system. MapDB implements transactions, so commit effectively means 'make the changes I've made to this DB permanent and visible to other users of it'. The complementary operation is rollback, which discards all of the changes you've made within the current transaction. Commit doesn't (directly) affect what is in memory and what is not. If you're trying to reclaim space, you might want to look at compact() instead, though as far as I know it works on the underlying store rather than on objects you hold on the heap.
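To make the commit/rollback semantics concrete, here is a minimal toy model in plain Java. It is not MapDB code and not how MapDB is implemented; `ToyTxMap` and its two internal maps are hypothetical names used only to illustrate that uncommitted changes can be either made permanent or discarded.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of transaction semantics. NOT MapDB's implementation. */
final class ToyTxMap<K, V> {
    // State that has been made permanent by commit().
    private final Map<K, V> committed = new HashMap<>();
    // Changes made in the current transaction, not yet permanent.
    private final Map<K, V> pending = new HashMap<>();

    void put(K key, V value) {
        pending.put(key, value);
    }

    /** Reads see the current transaction's changes first, then committed state. */
    V get(K key) {
        return pending.containsKey(key) ? pending.get(key) : committed.get(key);
    }

    /** Make all pending changes permanent. */
    void commit() {
        committed.putAll(pending);
        pending.clear();
    }

    /** Discard all pending changes. */
    void rollback() {
        pending.clear();
    }
}
```

Note that neither operation touches objects the caller still references, which matches the point above that commit does not directly decide what stays on the heap.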

For your second question, if you're holding a strong reference to an object then you continue holding that strong reference. MapDB isn't going to delete it for you. You should think of MapDB as a normal Java Map, most of the time. When you call get, MapDB hides whether it's in memory or on disk from you and just returns you a usable reference to the retrieved object. That retrieved object will hang around in memory until it becomes garbage, just like anything else.

0
On

It's a good idea to commit on some sort of schedule rather than after every single change you make to a map, for example:

  • Every N changes
  • Every M seconds
  • After some sort of logical checkpoints in your code.

Doing too many commits will make your application very slow.
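A scheduled commit along those lines could be sketched roughly as follows. `BatchedCommitter` is a hypothetical helper, not part of MapDB; it takes the commit action as a `Runnable`, which for a MapDB `DB` instance would presumably be `db::commit`. The thresholds and the plain `HashMap` stand-in in the demo are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper: runs a commit action every N changes or every M milliseconds. */
final class BatchedCommitter {
    private final Runnable commitAction; // e.g. db::commit for a MapDB DB (assumption)
    private final int maxChanges;        // N: commit after this many changes
    private final long maxMillis;        // M: commit if this much time has passed
    private final LongSupplier clock;    // millisecond clock, injectable for testing
    private int pending = 0;
    private long lastCommit;
    private long commits = 0;

    BatchedCommitter(Runnable commitAction, int maxChanges, long maxMillis, LongSupplier clock) {
        this.commitAction = commitAction;
        this.maxChanges = maxChanges;
        this.maxMillis = maxMillis;
        this.clock = clock;
        this.lastCommit = clock.getAsLong();
    }

    BatchedCommitter(Runnable commitAction, int maxChanges, long maxMillis) {
        this(commitAction, maxChanges, maxMillis, () -> System.nanoTime() / 1_000_000L);
    }

    /** Call once after each put/remove on the map. */
    void recordChange() {
        pending++;
        if (pending >= maxChanges || clock.getAsLong() - lastCommit >= maxMillis) {
            checkpoint();
        }
    }

    /** Call at a logical checkpoint; commits only if there are uncommitted changes. */
    void checkpoint() {
        if (pending > 0) {
            commitAction.run();
            commits++;
        }
        pending = 0;
        lastCommit = clock.getAsLong();
    }

    long commitCount() {
        return commits;
    }
}

final class BatchedCommitterDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>(); // stand-in for a MapDB map
        Runnable commit = () -> { /* db.commit() would go here */ };
        BatchedCommitter committer = new BatchedCommitter(commit, 1000, 5000);
        for (int i = 0; i < 2500; i++) {
            map.put(i, "value" + i);
            committer.recordChange();
        }
        committer.checkpoint(); // flush whatever is left at the end
        System.out.println("commits: " + committer.commitCount());
    }
}
```

The right values for N and M depend on how much uncommitted work you can afford to lose and on how expensive a commit is for your store, so they are worth measuring rather than guessing.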