Binary search on array with duplicate

661 Views Asked by At

First time posting here, so apologies in advance if I am not following best practices. My algorithm is supposed to do the following in a sorted array with possible duplicates.

  1. Return -1 if the element does not exist in the array
  2. Return the smallest index where the element is present.

I have written a binary search algorithm for an array without duplicate. This returns a position of the element or -1. Based on blackbox testing, I know that the non-duplicate version of the binary search works. I have then recursively called that function via another function to search from 0 to position-1 to find the first incidence of the element, if any.

I am currently failing a black box test. I am getting a wrong answer error and not a time out error. I have tried most of the corner cases that I could think of and also ran a brute force test with the naive search algorithm and could not find an issue.

I am looking for some guidance on what might be wrong in the implementation rather than an alternate solution.

The format is as follow: Input:

5 #array size

3 4 7 7 8 #array elements need to be sorted

5 #search query array size

3 7 2 8 4 #query elements

Output

0 2 -1 4 1

My code is shown below:

class BinarySearch:
 
 def __init__(self,input_list,query):
        
    self.array=input_list
    self.length=len(input_list)
    self.query=query
    return
 
 def binary_search(self,low,high):
    '''
    Implementing the binary search algorithm with distinct numbers on a 
    sorted input.
    '''
    
    #trivial case
    if (self.query<self.array[low]) or (self.query>self.array[high-1]):
        
        return -1
        
    elif (low>=high-1) and self.array[low]!=self.query:
        
        return -1
    else:
        m=low+int(np.floor((high-low)/2))
        if self.array[low]==self.query:
            return low
        elif (self.array[m-1]>=self.query):
            return self.binary_search(low,m)
        elif self.array[high-1]==self.query:
            return high-1
        else:    
            return self.binary_search(m,high)
    return
        
class DuplicateBinarySearch(BinarySearch):
     
     def __init__(self,input_list,query):
       
        BinarySearch.__init__(self,input_list,query)
     
     def handle_duplicate(self,position):
        '''
        Function handles the duplicate number problem.
        Input: position where query is identified.
        Output: updated earlier position if it exists else return 
        original position.
        '''
        
        if position==-1:
            return -1
        elif position==0:
            return 0
        elif self.array[position-1]!=self.query:
            return position
        else:
            new_position=self.binary_search(0,position)
            if new_position==-1 or new_position>=position:
                return position
            else:
                return self.handle_duplicate(new_position)
     
     def naive_duplicate(self,position):
        
        old_position=position
        if position==-1:
            return -1
        else:
            while position>=0 and self.array[position]==self.query:
                position-=1
            if position==-1:
                return old_position
                
            else:
                return position+1
     


 if __name__ == '__main__':
    num_keys = int(input())
    input_keys = list(map(int, input().split()))
    assert len(input_keys) == num_keys
    

    num_queries = int(input())
    input_queries = list(map(int, input().split()))
    assert len(input_queries) == num_queries
   

    for q in input_queries:
        item=DuplicateBinarySearch(input_keys,q)
        #res=item.handle_duplicate(item.binary_search(0,item.length))
        #res=item.naive_duplicate(item.binary_search(0,item.length))
        #assert res_check==res
        print(item.handle_duplicate(item.binary_search(0,item.length)), end=' ')
        #print(item.naive_duplicate(item.binary_search(0,item.length)), end=' ')

When I run a naive duplicate algorithm, I get a time out error:

Failed case #56/57: time limit exceeded (Time used: 10.00/5.00, memory used: 42201088/536870912.)

When I run the binary search with duplicate algorithm, I get a wrong answer error on a different test case:

Failed case #24/57: Wrong answer

(Time used: 0.11/5.00, memory used: 42106880/536870912.)

The problem statement is as follows:

Problem Statement

Update:

I could make the code work by making the following change but I have not been able to create a test case to see why the code would fail in the first case.

Original binary search function that works with no duplicates but fails an unknown edge case when a handle_duplicate function calls it recursively. I changed the binary search function to the following:

 def binary_search(self,low,high):
        '''
        Implementing the binary search algorithm with distinct numbers on a sorted input.
        '''
        
        #trivial case
        if (low>=high-1) and self.array[low]!=self.query:
            return -1
        
        elif (self.query<self.array[low]) or (self.query>self.array[high-1]):
            return -1
            
        
        else:
            m=low+(high-low)//2
            if self.array[low]==self.query:
                return low
            elif (self.array[m-1]>=self.query):
                return self.binary_search(low,m)
            elif self.array[m]<=self.query:   
                return self.binary_search(m,high)
            elif self.array[high-1]==self.query:
                return high-1
            else: 
                return -1
1

There are 1 best solutions below

0
On

Since you are going to implement binary search with recursive, i would suggest you add a variable 'result' which act as returning value and hold intermediate index which equal to target value.

Here is an example:

def binarySearchRecursive(nums, left, right, target, result):

    """
    This is your exit point. 
    If the target is not found, result will be -1 since it won't change from initial value.
    If the target is found, result will be the index of the first occurrence of the target.
    """
    if left > right:
        return result 

    # Overflow prevention
    mid = left + (right - left) // 2

    if nums[mid] == target:
        # We are not sure if this is the first occurrence of the target.
        # So we will store the index to the result now, and keep checking.
        result = mid 
        # Since we are looking for "first occurrence", we discard right half.
        return binarySearchRecursive(nums, left, mid - 1, target, result) 
    elif target < nums[mid]:
        return binarySearchRecursive(nums, left, mid - 1, target, result)
    else:
        return binarySearchRecursive(nums, mid + 1, right, target, result)

if __name__ == '__main__':

    nums = [2,4,4,4,7,7,9]
    target = 4 

    (left, right) = (0, len(nums)-1)
    result = -1 # Initial value
    index = binarySearchRecursive(nums, left, right, target, result)

    if index != -1:
        print(index)
    else:
        print('Not found') 

From your updated version, I still feel the exit point of your function is a little unintuitive.(Your "trivial case" section)

Since the only condition that your searching should stop, is that you have searched all possible section of the list. That is when the range of searching area is 0, there is no element left to be search and check. In implementation, that is when left < right, or high < low, is true.

The 'result' variable, is initialized as -1 when the function first been called from main. And won't change if there is no match find. And after each successful matching, since we can not be sure if it is the first occurrence, we will just store this index into the result. If there are more 'left matching', then the value will be update. If there is not, then the value will be eventually returned. If the target is not in the list, the return will be -1, as its original initialized value.