Google interview question - check if all subarrays of an array have at least one unique element

154 Views Asked by At

I came across a problem that I learned was a Google interview question. The problem is:

An array is good if every subarray contains at least one element with frequency 1.

Design an algorithm to verify if an array is good in O(nlg^2(n)) time complexity. I assume the algorithm should be deterministic, not randomized.

I thought of using a divide-and-conquer approach: We split the array into two halves and recursively check if they are good. The base case is when we have an array of size 1, which is always a good subarray. The challenge is how to combine the results of the two subarrays and determine if their concatenation is also good in time o(n^2). This is because we have O(n^2) subarrays to examine in this step. Can we do the combine step in O(nlgn)? If yes, we get the recurrence: T(n) = 2T(n/2)+O(nlgn) Then, by applying the Master's Theorem, we can conclude that T(n)=O(nlg^2n).

3

There are 3 best solutions below

4
trincot On

Divide and conquer seems a good idea, but don't divide in halves. Instead look for a unique element in the array, and use it as division point, while excluding that unique element from both partitions. If multiple unique elements are found, then divide into more partitions (to save some recursion depth), using all unique elements as division points.

2
Matt Timmermans On

This is a tricky one, but you can do it in O(n log n) time.

Given an array A to test, let's first consider whether or not all the subarrays ending at position i, i.e., the suffixes of A[0...i] all contain a unique element.

If the last occurrence of any element in A[0...i] occurs at position y, then A[y] is obviously a unique element in A[y...i]. It is also a unique element in every longer suffix up to any previous occurrence of that element.

For every element that occurs in A[0...i], then, there is an interval of starting positions [x,y], where A[y] is the last occurrence of that element, and A[x-1] is the preceding occurrence of that element, or x=0. A[y] is a unique element in every subarray A[u...i] where x <= u <= y.

A[0...n-1] is a good subarray if and only if, for every i < n, all of the indexes 0 through i are covered by these intervals for elements in A[0...i].

Reduction to interval coverage

As we increase i from 0 to n-1, we can use a hash map to keep track of the last 2 occurrences of each element, and the set of relevant intervals. For each new A[i], only two intervals change -- the previous interval for that element is removed, and the new one is added. Keeping track of these intervals takes linear time all together.

In order to meet the complexity target of O(n log n), for every i, we need to be able to update 2 intervals and check to see of all 0...i are covered by intervals in O(log n) time. If we always extend the last interval all the way out to A[n-1] then we can simplify this check -- for every i, we need to see if all indexes 0 through n-1 are covered by intervals.

Reducing interval coverage to minimum prefix sum

Instead of keeping track of the number of intervals covering each element, and looking for zeros, we can track just the differences in this number between each element and the previous one. Let's call this array D. The value at every D[j] is just the number of intervals that start at position j, minus the number of intervals that end at j-1.

The number of intervals covering an index j is then a prefix sum sum(D[0...j]). If A is a good array, then at every position i, all of the prefix sums of D will be > 0.

Minimum prefix sum in O(log n) time

Instead of maintaining D in a simple array, we could use a Fenwick tree. This allows us to do updates and prefix sum queries in O(log n) time. To find the minimum prefix sum, however, we need to augment each node in the Fenwick tree with the minimum prefix sum of elements in its range. Since a node in a Fenwick tree can have O(log n) children, however, it can take O(log n) time to recalculate the minimum prefix sum for a node when one of its children changes. Since a Fenwick tree update can touch O(log n) nodes, this increases our update time to O(log2 n). That meets your original requirement of O(n log2 n) time, but not what I promised.

To do the interval updates and minimum-prefix-sum checks in O(log n) time, you can use an alternative structure described here: https://stackoverflow.com/a/77217105/5483526 , an "Implicit in-order binary tree". Like a Fenwick tree, it assigns a range of elements in D to each node, and you still need to augment each node with the minimum prefix sum in its range, in addition to the total sum. Nodes in this tree have only 2 children, however, so you can do each node update in constant time.

0
גלעד ברקן On

For an array to be "good," the leftmost position for which all subarrays are covered must be 0 for every element we traverse that we consider rightmost in the subarray. Each element is associated with an interval for which it is unique. We can store this for each element in O(n).

0, 1, 2, 3, 4, 5, 6
a, b, e, a, d, e, d
^-----> a
<--^--------------> b
<-----^-----> e
   <-----^--------> a
<-----------^--> d
         <-----^--> e
               <--^ d

Now traverse and add and remove intervals to a segment tree as we go. Since we want to use the intervals as we look left from any element, we can only use the part of the interval extending left. For example, to use b (position 1) to validate a subarray from d (position 6), we can only use it for subarrays extending all the way left at least to position 1, since shorter subarrays from d will not contain b:

0, 1, 2, 3, 4, 5, 6
a, b, e, a, d, e, d
<--^--------------> b
---- (allowed)

Let's traverse:

Position 0, we add the allowed section, A [0,0], incrementing its range in the segment tree:

0, 1, 2, 3, 4, 5, 6
a, b, e, a, d, e, d
^----->
- A [0,0]

For each element we traverse, we query the tree for the minimum in the range [0, element position]. If the minimum is 0, meaning some interval is not covered, we know the array is not "good." We remove an element's allowed section from the tree when we reach the rightmost position of its full interval.

Position 1, we add B, incrementing its range in the segment tree:

0, 1, 2, 3, 4, 5, 6
a, b, e, a, d, e, d
<--^-------------->
---- B [0,1]

Position 2, still valid, we remove A [0,0] since we've reached the rightmost point we can use it to validate subarrays going forward (the next a's interval will be inserted next):

0, 1, 2, 3, 4, 5, 6
a, b, e, a, d, e, d
^----->
- A [0,0]

...

Once we reach position 6, we have the following set of intervals covering the range. Each one means we could use its associated element as unique in the subarrays extending from d (position 6) left to that range:

0, 1, 2, 3, 4, 5, 6
a, b, e, a, d, e, d
--- b
   ------- a
         ------- e
               ---- d