Collection to store primitive ints that allows for faster contains() & ordered iteration

424 Views Asked by At

I need a space efficient collection to store a large list of primitive int(s)(with around 800,000 ints), that allows for fast operations for contains() & allows for iteration in a defined order.

Faster contains() operations to check whether an int is there in the list or not, is main priority as that is done very frequently.


I'm open to using widely used & popular 3rd party libraries like Trove, Guava & such others.

I have looked at TIntSet from Trove but I believe that would not let me define the order of iteration anyhow.

Edit:

The size of collection would be around 800,000 ints. The range of values in the collection will be from 0 to Integer.Max_VALUE. The order of iteration should be actually based on the order in which I add the value to collection or may be I just provide an ordered int[] & it should iterate in the same order.

5

There are 5 best solutions below

3
On

Just use a ArrayList<Integer>.

0
On

You can use 2 Sets one set is set based on hash (e.g. TIntSet) for fast contains operations. Another is set based on tree structure like TreeSet to iterate in speicific order.
And when you need to add int, you update both sets at the same time.

0
On

It sounds like LinkedHashSet might be what you're looking for. Internally, it maintains two structures--a HashSet and a LinkedList, allowing for both fast 'contains()' (from the former) and defined iteration order (from the latter).

1
On

As this reeks of database, but you require a more direct approach, use a memory mapped file of java.nio. Especially a self-defined ordering of 800_000 ints will not do otherwise. The contains could be realized with a BitSet in memory though, parallel to the ordering in the file.

1
On

As data structure I would choose an array of longs (which I logically treat as two ints). The high-int part (bits 63 - 32) represent the int value you add to the collection. The low-int part (bits 31 - 0) represents the index of the successor when iterating. In case of your 800.000 unique integers you need to create a long array of size 800.000.

Now you organize the array as a binary balanced tree ordered by your values. To the left the smaller values and to the right the higher values. You need two more tracking values: one int to point to the first index to start iterating at and one int to point to the index of the value inserted last.

Whenever you add a new value, reorganize your binary balanced tree and update the pointer from the last value added pointing to the currently added value (as indexes).

Wrap this values (the array and both int values) as the collection of your choice.

With this data structure you get a search performance of O(log(n)) and a memory usage of two times the size of values.