Fast Inserts Into Ordered Arrays


I am planning to implement binary search over an array - however I also need to support dynamic updates / inserts into the array. The best approaches I've heard of so far are in O(N) territory, because all elements to the right of the insertion point need to be shifted right by one.

I pondered this idea: I redefine / intercept (or "overload", depending on your language) the insertion method. If an insertion at index 10 is detected, for example, I "delay" it and keep the new element in a second list; nothing is inserted into the real array, but the algorithm remembers the insertion point. The same goes for further insertions. Then, if I detect an access to element 15, which I also intercept, I offset the index by 1 to the right and return element 16, because there was an insertion at a location to the left of i=15.

Once in a while I would empty the second list by truly inserting its elements into the real array, possibly via something like insertion sort.
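To make the question concrete, here is one minimal sketch of what such a scheme could look like (all names are made up for illustration; this is one possible reading of the idea, not a vetted design). Pending inserts are kept as (insert-point, value) pairs; a read offsets its index by the pending inserts that land to its left, and a flush applies the pending inserts from right to left so earlier positions stay valid:

```python
import bisect

class DelayedInsertArray:
    """Hypothetical sketch of the pending-list idea from the question."""

    def __init__(self, items, flush_at=32):
        self.main = sorted(items)   # the "real" sorted array
        self.pending = []           # sorted list of (main_index, value)
        self.flush_at = flush_at

    def insert(self, value):
        pos = bisect.bisect_left(self.main, value)
        # Note: this insort is itself O(len(pending)) element shifts.
        bisect.insort(self.pending, (pos, value))
        if len(self.pending) >= self.flush_at:
            self.flush()

    def get(self, i):
        # The k-th pending element (sorted by insert point) sits at
        # logical index pos_k + k; everything else comes from main,
        # shifted left by the number of pending elements before i.
        shift = 0
        for k, (pos, _val) in enumerate(self.pending):
            logical = pos + k
            if logical == i:
                return _val
            if logical < i:
                shift += 1
            else:
                break
        return self.main[i - shift]

    def flush(self):
        # Apply pending inserts right-to-left so that the recorded
        # positions (which refer to the old main array) stay valid.
        for pos, val in reversed(self.pending):
            self.main.insert(pos, val)
        self.pending = []
```

For example, starting from `[10, 20, 30]` and delaying inserts of 15 and 5, `get(2)` returns 15 and `get(4)` returns 30, matching the merged view `[5, 10, 15, 20, 30]`. The comment inside `insert` already hints at the objection raised in the answers below: the pending list has the same ordered-insertion problem as the main array.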

This sounds to me like it could work fast.

Would this be workable?

Thanks,

There are 3 answers below.

BEST ANSWER

I'm pretty sure there's nothing really good in the direction you're looking. I've done things like that in the past, but only when there was a known access pattern that made it efficient.

If you want to keep some of the efficiencies of array storage and binary search, while still allowing dynamic inserts, a B+ tree is a good tradeoff:

https://en.wikipedia.org/wiki/B%2B_tree

In a B+ tree, each node holds an array of keys. You choose the maximum node size -- typically anywhere from 16 to 4096. The array storage limits space overhead because you don't need a link per element, helps cache locality during searches, and provides a high fanout that limits the depth of the tree.

ANSWER 2

"Would this be workable?": maybe, but it is unlikely.

For every insertion, you will have to look up the list of pending insertions and determine the correction to the insertion point. And you will have to insert the new key into the pending list. You are back to the original problem of insertion into an ordered array, just with a smaller array.

If you count the insertions, assuming pending lists of maximum size L, you will perform O(N) of them, for a total cost of O(N·L). So you need L to be small, ideally close to O(1).

If you count the merges, you will perform N/L of them, each with cost O(N + L), for a total of (N/L)·(N + L) = O(N²/L + N). So you need L to be Ω(N).

These two requirements do not seem compatible.
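As a back-of-envelope check of this tension (my own arithmetic, following the cost model in this answer), the combined cost N·L + N²/L is minimized at L = √N, which still gives a total of about 2·N^1.5 — better than the naive O(N²), but well short of the O(N log N) a balanced tree achieves:

```python
import math

def total_cost(n, l):
    """Cost model from this answer: N insertions into pending lists of
    size <= L cost ~N*L; N/L merges of cost ~(N + L) each cost ~N*N/L."""
    return n * l + n * n / l

n = 1_000_000
candidates = (1, 100, int(math.sqrt(n)), 10_000, n)
costs = {l: total_cost(n, l) for l in candidates}
best = min(costs, key=costs.get)
# Calculus on N*L + N**2/L puts the minimum at L = sqrt(N),
# for a total of 2 * N**1.5 -- here 2e9 operations, vs ~2e7
# for N * log2(N) with a balanced tree.
```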


Rest assured that the problem of insertion into a sorted array was thoroughly investigated before you (probably before you were born), and no suitable solution based on a pending list has been adopted.

On the contrary, balanced search trees are universally used for maintaining dynamically sorted data, and they are known to be optimal, with a bound of O(log N) for search, insertion and deletion operations.
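Since Python (the language used in the sketches here) has no balanced search tree in its standard library, here is a minimal sketch of one — a treap, i.e. a randomized BST whose random priorities keep it balanced in expectation, chosen here only because it is the shortest to write; red-black and AVL trees give the same O(log N) bounds deterministically:

```python
import random

class Node:
    __slots__ = ("key", "prio", "left", "right")
    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # random priority keeps the tree shallow
        self.left = self.right = None

def insert(root, key):
    """Insert key into a treap rooted at root; expected O(log N)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.prio < root.prio:   # restore heap order: rotate right
            l = root.left
            root.left, l.right = l.right, root
            return l
    else:
        root.right = insert(root.right, key)
        if root.right.prio < root.prio:  # restore heap order: rotate left
            r = root.right
            root.right, r.left = r.left, root
            return r
    return root

def search(root, key):
    """Standard BST descent, O(depth) = expected O(log N)."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None
```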

ANSWER 3

You did not clarify some critical details of your idea, such as how your algorithm remembers the insertion points of the newly inserted values, how binary search handles the new (i.e. delayed) insertions, and how you handle the case where there are N such insertions.

In any case, you would either have to merge the delayed inserts with the original array at some point, which takes O(N) time, or run a binary search that takes into account both the original array and the waiting list of new inserts. And since dynamic inserts can place an element between two elements that are themselves in that secondary list, I am not sure how you could even do that systematically. You are likely to compromise the access cost to an element of A, which is O(1) in a regular array, and may in turn end up with a binary search that is less efficient than O(log N). So, at a high level, the incomplete definition of your idea seems unlikely to achieve the desired results in terms of complexity.

By definition, binary search works on an ordered container, and an array (which stores its elements consecutively) makes it impossible to insert at an arbitrary position and restore the order in less than O(N) time in the worst case.

Although this is not a formal proof, you can intuitively consider this case: whenever you insert into an array A an element smaller than all elements already present, then for a future binary search to work on A, you must insert it at the beginning in order to keep A ordered. And inserting such a small element at the first position of A means shifting all existing elements of A one position to the right, which is O(N) if the size of A is N. (See the similar procedure in the insertion step of the well-known insertion sort algorithm.)

So, unless you change your preference about using an array, you are stuck with that O(N) time bound for insertion. However, if you consider using a balanced binary search tree (e.g. a red-black tree or an AVL tree), you can achieve faster insertion without sacrificing the efficiency of searching for an element. Granted, in that case the algorithm you run to find an element is more of a tree traversal than a binary search. Still, if more efficient search, insertion and removal operations matter more to you than the specific algorithms and data structures you use, then making this change in your data structure and algorithm preferences should be tolerable.