I need to find the object quickly while using as little memory as possible. What data container should I use?


My program needs to insert millions of records into the data container. I tried HashMap and TreeMap; both give me a heap space exception, even though I allow the JVM to use 2 GB of RAM.

My program frequently looks up specific data in the container, and I think O(log n) lookup time would be acceptable. So what container should I use? Or do I need to implement one myself? How?

More details: the key is a String, like a global ID, e.g. "00011123459". Each key maps to a list of lists, i.e. List<List<String>>. My program reads the file line by line, converts each line into a list, extracts the global ID from the list, then puts the list into the corresponding list of lists. The file has millions of lines, which is why I believe the main cause is that I create too many lists. However, I cannot add more memory to the machine.


There are 4 answers below.

Answer 1:

From the TreeMap javadoc:

This implementation provides guaranteed log(n) time cost for 
the containsKey, get, put and remove operations.

So use a TreeMap and give Java more memory.
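
For example, a minimal sketch of that approach (the class name and sample fields are made up); run it with a larger heap, e.g. java -Xmx4g TreeMapLoad:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class TreeMapLoad {
        public static void main(String[] args) {
            // TreeMap gives O(log n) containsKey/get/put/remove.
            Map<String, List<List<String>>> byId = new TreeMap<>();

            String id = "00011123459";                        // example global ID
            List<String> row = List.of(id, "fieldA", "fieldB");

            // Group each parsed line under its global ID.
            byId.computeIfAbsent(id, k -> new ArrayList<>()).add(row);

            System.out.println(byId.get(id));                 // O(log n) lookup
        }
    }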

Answer 2:

If you have more infrastructure support, try increasing memory to 4 or 5 GB and use either of these maps:

  1. Use TreeMap if you want your entries sorted. Because the map keeps its keys in order, each insertion pays an extra O(log n) cost to maintain that order.

  2. Use HashMap for fast additions/retrievals, as entries are not kept sorted (see the sketch after this list).
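
As a rough illustration of the HashMap option, pre-sizing the map avoids repeated rehashing while loading millions of rows; the expected key count below is an assumption:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class HashMapLoad {
        public static void main(String[] args) {
            // Pre-size past the load factor so the table never rehashes during the load.
            int expectedKeys = 2_000_000;                     // assumed number of distinct IDs
            Map<String, List<List<String>>> byId =
                    new HashMap<>((int) (expectedKeys / 0.75f) + 1);

            String id = "00011123459";
            byId.computeIfAbsent(id, k -> new ArrayList<>())
                .add(List.of(id, "fieldA", "fieldB"));        // O(1) average-time put
        }
    }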

Answer 3:

A HashMap takes less memory than a TreeMap and gives O(1) average-time access.

If your keys are numbers, you can save memory with TLongObjectHashMap from Trove4j.
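
A minimal sketch of that idea, assuming Trove 3 is on the classpath and the global IDs are fixed-width numeric strings:

    import gnu.trove.map.hash.TLongObjectHashMap;
    import java.util.ArrayList;
    import java.util.List;

    public class TroveLoad {
        public static void main(String[] args) {
            // Primitive long keys avoid one boxed object per key.
            TLongObjectHashMap<List<List<String>>> byId = new TLongObjectHashMap<>();

            // "00011123459" parses to 11123459; if the ID width is fixed, the
            // leading zeros can be restored with String.format("%011d", key).
            long key = Long.parseLong("00011123459");

            List<List<String>> rows = byId.get(key);
            if (rows == null) {
                rows = new ArrayList<>();
                byId.put(key, rows);
            }
            rows.add(List.of("00011123459", "fieldA", "fieldB"));
        }
    }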

Another option is to persist your data temporarily on disk with MapDB.

You could also apply caching with Guava's CacheBuilder.
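
A rough sketch of the caching idea; loadFromDisk below is a hypothetical stand-in for whatever on-disk lookup you use:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.List;

    public class RecordCache {
        public static void main(String[] args) throws Exception {
            // Keep only the hottest entries on the heap; reload the rest on demand.
            LoadingCache<String, List<List<String>>> cache = CacheBuilder.newBuilder()
                    .maximumSize(100_000)                     // bound on in-heap entries
                    .build(new CacheLoader<String, List<List<String>>>() {
                        @Override
                        public List<List<String>> load(String id) {
                            return loadFromDisk(id);          // hypothetical disk lookup
                        }
                    });

            System.out.println(cache.get("00011123459"));     // loads on first access
        }

        private static List<List<String>> loadFromDisk(String id) {
            return List.of(List.of(id, "fieldA", "fieldB"));  // stub for illustration
        }
    }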

Answer 4:

Presuming that the vast majority of memory usage is due to the record data itself, it may be that no choice of container will solve your problem (as a test, try loading all of your data into a plain array; if you run out of memory there, you'll need another solution). And even if you squeeze under the limit now, you will still have issues if you encounter a larger number of records in the future.

Outside of adding more RAM, there are many other approaches you can take, but the general idea is to store more on disk and less in memory. Here are a few possible alternatives:

  • Store your records in a proper database (many options here; SQLite may be the most convenient for you, with access options ranging from straight java.sql.* to Hibernate).
  • Use something like MapDB, as Andrey Chaschev mentioned.
  • If your program frequently accesses a small subset of the data, or accesses the same data consecutively, consider leaving the records on disk, finding them when needed, and caching them once found (only searching the disk if the record of interest is not in the cache).
  • Instead of storing entire records in your map, store just enough information to find them on disk quickly, and lazily load records as needed: e.g. keep the file offset of each record in your map, then on lookup seek to that offset and read the actual record data, adding caching if desired (a sketch follows this list).
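
A minimal sketch of that last idea; OffsetIndex is a made-up name, and it assumes comma-separated lines with the global ID as the first field:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // The map holds one long per line instead of the parsed record itself.
    public class OffsetIndex {
        private final Map<String, List<Long>> index = new HashMap<>();
        private final RandomAccessFile file;

        public OffsetIndex(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
            long offset = file.getFilePointer();
            String line;
            while ((line = file.readLine()) != null) {
                String id = line.split(",")[0];   // assumes the global ID is the first field
                index.computeIfAbsent(id, k -> new ArrayList<>()).add(offset);
                offset = file.getFilePointer();
            }
        }

        // Re-reads the matching lines from disk on demand.
        public List<String> lookup(String id) throws IOException {
            List<String> lines = new ArrayList<>();
            for (long off : index.getOrDefault(id, List.of())) {
                file.seek(off);
                lines.add(file.readLine());
            }
            return lines;
        }
    }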

Personally, I'd go for the first option (be sure to create an index on the keys you typically use to find records) as it is very straightforward to set up and use, and SQLite (for example) is self-contained and requires no server. At the cost of added development complexity, you can still cache data if you find that your performance requirements are not being met, or something like Hibernate will do it for you.
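
For completeness, a rough sketch of the SQLite route over plain JDBC, assuming the Xerial sqlite-jdbc driver is on the classpath (table and column names are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SqliteStore {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:records.db")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS records (global_id TEXT, line TEXT)");
                    // The index is what makes lookups by key fast.
                    st.execute("CREATE INDEX IF NOT EXISTS idx_global_id ON records (global_id)");
                }

                try (PreparedStatement ins = conn.prepareStatement(
                        "INSERT INTO records (global_id, line) VALUES (?, ?)")) {
                    ins.setString(1, "00011123459");
                    ins.setString(2, "00011123459,fieldA,fieldB");
                    ins.executeUpdate();
                }

                // Lookup happens on disk; the heap only holds the rows you ask for.
                try (PreparedStatement q = conn.prepareStatement(
                        "SELECT line FROM records WHERE global_id = ?")) {
                    q.setString(1, "00011123459");
                    try (ResultSet rs = q.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString(1));
                        }
                    }
                }
            }
        }
    }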