Longest Substring with no Y Divide and Conquer

512 Views Asked by At

Before I carry on to the problem, I should note that I know there are much easier ways to solve this problem without using divide and conquer; however, the point of solving this problem under this restriction is that I actually want to learn how to tackle problems with divide and conquer. I am good at recognizing correct solutions, but implementing my own D&C strategy is not a skill I currently have.

The problem is this: given a string, find the longest substring that does not contain the letter 'y'. For example, longestNoY("abydefyhi") should return "def".

My first approach to tackle this problem was to determine the base cases. If we had a string of length 2, we would want to return the non-y components (or empty string if both characters were 'y'). If we had a string of length 1, we would return it if it is not a 'y'.

So the first part should look like this:

def longestNoY(string, start, end):
     #Conquer
     if start == end:
          if string == 'y': return ''
          return string
     if start + 1 == end:
          if string == "yy": return ''
          if string[0] == 'y': return string[1]
          return string[0]
     ....

Next, I knew that I would need to recursively call the function for each half of the parent string. I also knew that I wanted the function to return the longer of the two children, except if the sum of the lengths of the two children was equal to the length of the parent, then the function should return the parent because there were no 'y's in the children.

    #Divide and Partial Implementation of Rejoin
    ....
    middle = (start + end) // 2
    leftString = longestNoY(string, start, middle)
    rightString = longestNoY(string, middle, end)
    if len(leftString) + len(rightString) == len(string): return string
    ....

The part I am having trouble with now would best be explained with an example:

0 1 2     3 4   5 6   7 8 
a b   y   d e | f y   h i
a b   y | d e | f y | h i
a b | y | d e | f y | h i 

The longest substring in the left side is either "ab" or "de", but we know that "de" is adjacent to an 'f' which would make "def" the longest. I don't know exactly how to carry on with this problem. Please do not give me a program to solve this problem.

4

There are 4 best solutions below

0
On BEST ANSWER

I think the best way to put the problem is to find the positions just before and just after y, not being y. This way you will find left and right ends of intervals. I do not give you the code, since you specifically asked as not to solve the problem for you, just point to the right direction, so:

  • In trivial cases (length of interval is 0) determine whether the item you have is a valid left end or right and of an interval
  • In non-trivial cases always halve the set to left and right (no problem if the number of items is odd, just put the middle somewhere) and issue the divide and conquer for them as well
  • In non-trivial cases always consider the best interval the left and right sub-problem gives you
  • In non-trivial cases make sure that if an interval happens to start in the left and end in the right, you take that into account
  • from such intervals, the one which has a greater length is better

These are the ideas you need to employ in order to implement the divide and conquer you desire. Happy coding!

2
On

It is possible. But then you each time need to return four values: the longest subsequence that starts at the left end of the "slice" (this can be zero), the longest subsequence "in the middle", the longest subsequence that ends at the right end of the "slice" (this can be zero as well), and if the string is just a sequence of non-Y characters (a boolean). The fourth element can in fact just be derived by checking if one of the elements in the first three is equal to the length, but this is probably easier to implement.

Why is this important? Because a sequence of non-ys can pass "through" a divide. For example:

abcdeYfghi   jklYmnopqr

here if we split it in the middle (or any other way that is not "constant" and "rest").

So here recursively we have several cases:

  1. the empty string returns (0, 0, 0, True),
  2. the non-empty string other than Y, we return (1, 1, 1, True);
  3. for the singleton string Y we return (0, 0, 0, False);
  4. the recursive case that divides the string in two, and applies "merge" logic afterwards on the results.

The "merge logic" is rather complex, especially since it is possible that both "subslices" only contain non-Y strings. After slicing we thus obtain two triples (a0, a1, a2, a3) and (b0, b1, b2, b3), and we produce a 3-tuple (c0, c1, c2, c3).

If a3 = True and b3 = True, then of course that means that the current slice contains no Y's as well. So we can derive that:

c3 = a3 and b3

given a3 holds, then it holds that c0 = a0 + b0 since then a0 has no Y's and thus the left "sequence" is the same as the entire length of the subsequence plus the left subsequence of the right part. If a3 does not hold, c0 is just a0.

Given b3 holds, then it holds that c2 = a2 + b2 for the same reasoning as the one above, if not, then a2 = b2.

Now the element in the middle is the maximum of three elements:

  1. the element in the middle of the left slice a1;
  2. the element in the middle of the right slice b1; and
  3. the sum of a2 and b0 since there can be overlap and then this is the sum of the two.

We thus return the maximum of the tree.

So in Python, this looks like:

def longestNoY(string, start, end):
    if start == end:
        return (0, 0, 0, True)
    elif start+1 == end:
        if string[start] == 'y':
            return (0, 0, 0, False)
        else:
            return (1, 1, 1, True)
    else:
        mid = (start + end)//2
        a0, a1, a2, a3 = longestNoY(string, start, mid)
        b0, b1, b2, b3 = longestNoY(string, mid, end)
        c3 = a3 and b3
        c0 = a0 + a3 * b0
        c2 = b2 + b3 * a2
        c1 = max(a1, b1, a2 + b0)
        return (c0, c1, c2, c3)

The final result is the maximum of the first three items in the tuple.

For the given sample string, we thus obtain:

             (1, 1, 1, True) a
             (1, 1, 1, True) b
         (2, 2, 2, True) ab
             (0, 0, 0, False) y
             (1, 1, 1, True) d
         (0, 1, 1, False) yd
     (2, 2, 1, False) abyd
             (1, 1, 1, True) e
             (1, 1, 1, True) f
         (2, 2, 2, True) ef
             (0, 0, 0, False) y
                 (1, 1, 1, True) h
                 (1, 1, 1, True) i
             (2, 2, 2, True) hi
         (0, 2, 2, False) yhi
     (2, 2, 2, False) efyhi
 (2, 3, 2, False) abydefyhi
(2, 3, 2, False)

but that being said, it looks to me as an unnecessary complicated procedure to construct something that, in terms of time complexity, is the same as traversal, but typically more expensive (function calls, constructing new objects, etc.). Especially since linear traversal is just:

def longestNoY(string):
    mx = 0
    cur = 0
    for c in string:
        if c == 'y':
            mx = max(mx, cur)
            cur = 0
        else:
            cur += 1
    return mx

There is however an advantage here is that the above described algorithm can be used for parallelization. If for example the string is huge, the above can be used such that every core can count this. In that case it is however likely beneficial to use an iterative level on the "core" level, and only use the above to "distribute" work and "collect" results.

5
On

This can be easily solved by just traversing through the string. But I know you want to learn Divide Conquer.

To me, this is not a good problem to solve using Divide Conquer.

What @WillemVanOnsem suggested by recursion has essentially the same effect as when you traverse linearly.

But if you do want to do it in Divide & Conquer fashion, you need to consider the substring that crosses the mid point i.e. start <= i <= mid < j <= end - but that would be overkill.

0
On

Now that I actually had time to study, I decided to come back to this problem and came up with a very readable solution. Here it is:

def across(string, start, end, middle):
    startL = middle
    bestL = ''
    while(startL >= start and string[startL] != 'y'):
        bestL = string[startL] + bestL
        startL -= 1
    startR = middle + 1
    bestR = ''
    while(startR <= end and string[startR] != 'y'):
        bestR = bestR + string[startR]
        startR += 1
    return bestL + bestR

def longestNoY(string, start, end):
    if(start > end):
        return ''
    if(start == end):
        if(string[start] == 'y'):
            return ''
        return string[start]
    middle = (start + end) // 2
    leftString = longestNoY(string, start, middle)
    rightString = longestNoY(string, middle + 1, end)
    acrossString = across(string, start, end, middle)
    return max(leftString, rightString, acrossString, key=len)