What is the difference between [start/2 + mid/2] and [(start + mid)/2] in binary search?

222 Views Asked by At

In the binary search algorithm, we set the mid as:

mid = (start + end)/2 which is same as

mid = start/2 + end/2 and also equal to

mid = start + (end - start)/2

but all of the three give different results being the same arithmetic expression. How do these vary during the computation process?

This was the code to find the last occurrence of an element in a vector array using binary search:

int lastOccurrence(vector<int> arr, int size, int key){
    int start = 0, end = size - 1;
    int last = -1;
    // int mid = start/2 + end/2;
    int mid;
    while(start <= end){
        // mid = start + (end - start)/2;
        mid = (start + end)/2;
        if(key == arr[mid]){
            last = mid;
            start = mid + 1;
        }
        else if(key > arr[mid]){
            start = mid + 1;
        }
        else{
            end = mid - 1;
        }
        cout << "start: " << start << "\tend: " << end << "\tmid: " << mid << endl;
    }
    return last;
}

The values being passed to the function are:

int main(){
    vector<int> arr = {1,2,3,4,4,4,4,5,6,7,11};
    int size = arr.size();
    int key = 4;
    cout << "First occurrence of " << key << " is at index " << firstOccurrence(arr, size, key) << endl;

    cout << "Last occurrence of " << key << " is at index " << lastOccurrence(arr, size, key) << endl;

    return 0;
}

If the mid element equals the required "key" element then the mid index is stored in a variable and start is updated to mid + 1 so that it can search the right part of the array for any other occurrence of the "key". If the "key" is found to be less than the mid element it implies that the element is not present beyond the mid element and the end is updated to mid - 1 to search in the left part of the array, and similarly to search the right part if "key" is found to be greater than the mid element.

It gave different results when mid = start/2 + end/2 was used and mid = (start + end)/2 was used. How is this affected during the computation process?

4

There are 4 best solutions below

3
463035818_is_not_an_ai On BEST ANSWER

You need to consider that integer arithmetic cuts off any fractional parts, hence depending on the last bit of start and stop you get different results.

Suppose they are

 start = M*2 + a;
 end = N*2 + b;

Where M and N are integers and a and b are either 1 or 0, then you get

mid_0 = (start + end)/2 = M+N + (a+b) / 2
mid_1 = start/2 + end/2 = M+N
mid = start + (end - start)/2 = M*2 + a + (N-M) + (b-a)/2 = M+N + a + (b-a)/2 

Only the second expression does not depend on whether start or end are even or odd. I didn't actually bother to work out (via a 2x2 table) whether a + (b-a)/2 yields the same result as (a+b)/2. However, when dealing with integer arithmetic you better do not rely on intuition, because it's too easy to be off by one (or more). Moreover, I did not yet consider integer overflow. When start+end overflows then (start/2) + (end/2) does not.

0
Ofek Shilon On

As mentioned in the previous answer, (start + end)/2 handles residuals better. However, start/2 + end/2 is not susceptible to overflow while (start + end)/2 is.

So if you're handling arrays with potentially >2G elements, it is advised to either make start/end/mid 64b ints or prefer the start/2 + end/2 form.

0
Vlad from Moscow On

For starters the function is invalid. When size is equal to 1 then you have due to this statement

int start = 0, end = size - 1;

that end is equal to 0.

In this case the while loop

while(start < end){

will be skipped. And the function will return the value of last equal to -1

int last = -1;

// ...

return last;

though arr[0] can be equal to key.

As for your question then when start and end are both odd values then the value of mid will be one less for the expression

start/2 + end/2

then for other two expressions.

As for this expression (start + end)/2 then it is unsafe due to a possible overflow of the sum start + end.

Pay attention to that in C++20 there is function std::midpoint declared in header <numeric> that can be and should be used instead of manually written expressions.

As for the function as whole then there is already standard algorithm std::upper_bound declared in header <algorithm> that can be adapted for using instead of the function.

11
user1095108 On

All of these are attempts to prevent overflow when calculating the midpoint:

(a + b) / 2

The best way (supposedly) is this:

a + b = (a ^ b) + (a & b) << 1
(a + b) / 2 = (a ^ b) / 2 + (a & b)

The identity comes from Don Knuth's book, The Art of Computer Programming, Vol. 4.

This is because Don's formula is supposedly bulletproof. It should work equally well on negative and positive indices. Note that shifting right (including arithmetically) is not always the same as dividing by 2.