Finding k elements of length-n list that sum to less than t in O(nlogk) time

420 Views Asked by At

This is from Programming Pearls ed. 2, Column 2, Problem 8:

Given a set of n real numbers, a real number t, and an integer k, how quickly can you determine whether there exists a k-element subset of the set that sums to at most t?

One easy solution is to sort and sum the first k elements, which is our best hope to find such a sum. However, in the solutions section Bentley alludes to a solution that takes nlog(k) time, though he gives no hints for how to find it. I've been struggling with this; one thought I had was to go through the list and add all the elements less than t/k (in O(n) time); say there are m1 < k such elements, and they sum to s1 < t. Then we are left needing k - m1 elements, so we can scan through the list again in O(n) time looking for all elements less than (t - s1)/(k - m1). Add in again, to get s2 and m2, then again if m2 < k, look for all elements less than (t - s2)/(k - m2). So:

def kSubsetSumUnderT(inList, k, t):
    outList = []
    s = 0
    m = 0
    while len(outList) < k:
        toJoin = [i for i in inList where i < (t - s)/(k - m)]
        if len(toJoin):
            if len(toJoin) >= k - m:
                toJoin.sort()
                if(s + sum(toJoin[0:(k - m - 1)]) < t:
                    return True
                return False
            outList = outList + toJoin
            s += sum(toJoin)
            m += len(toJoin)
        else:
            return False

My intuition is that this might be the O(nlog(k)) algorithm, but I am having a hard time proving it to myself. Thoughts?

2

There are 2 best solutions below

0
On

Probably the natural algorithm with running time Theta(n log k) is to initialize a max-heap with k infinities, then iterate through the array, pushing the new element and popping the max to leave the k least in the heap at the end. (As Bentley mentions, selection is asymptotically faster, at Theta(n) time. The best way to select in practice may be to heapify and pop the min k times, which is Theta(n + k log n) = Theta(n) when k = O(n/log n).)

1
On

Consider the example where t>0 and all([x>t for x in inList]). toJoin will always be empty, and your algorithm doesn't even finish, let alone in O(nlog(k)).

The hint you are probably missing is http://en.wikipedia.org/wiki/Heap_(data_structure)