canceling arrays by number of items that I am ready to lose

148 Views Asked by At

We are writing c# program that will help us to remove some of unnecessary data repeaters and already found some repeaters to remove with help of this Finding overlapping data in arrays. Now we are going to check maybe we can to cancel some repeaters by other term. The question is:

We have arrays of numbers

{1, 2, 3, 4, 5, 6, 7, ...}, {4, 5, 10, 100}, {100, 1, 20, 50}

some numbers can be repeated in other arrays, some numbers can be unique and to belong only to specific array. We want to remove some arrays when we are ready to lose up to N numbers from the arrays.

Explanation:

  1. {1, 2}

  2. {2, 3, 4, 5}

  3. {2, 7}

We are ready to lose up to 3 numbers from these arrays it means that we can remove array 1 cause we will lose only number "1" it's only unique number. Also we can remove array 1 and 3 cause we will lose numbers "1", "7" or array 3 cause we will lose number "7" only and it less than 3 numbers.

In our output we want to give maximum amount of arrays that can be removed with condition that we going to lose less then N where N is number of items we are ready to lose.

2

There are 2 best solutions below

3
On

You need to first get a number for each array which tells you hwo many numbers are unique to that particular array.
An easy way to do this is O(n²) since for each element, you need to check through all arrays if it's unique.
You can do this much more efficiently by having sorted arrays, sorting first or using a heap-like data structure.

After that, you only have to find a sum so that the numbers for a certain amount of arrays sum up to N.
That's similar to the subset sum problem, but much less complex because N > 0 and all your numbers are > 0.
So you simply have to sort these numbers from smallest to greatest and then iterate over the sorted array and take the numbers as long as the sum < N.
Finally, you can remove every array that corresponds to a number which you were able to fit into N.

0
On

This problem is equivalent to the Set Cover problem (e.g.: take N=0) and thus efficient, exact solutions that work in general are unlikely. However, in practice, heuristics and approximations are often good enough. Given the similarity of your problem with Set Cover, the greedy heuristic is a natural starting point. Instead of stopping when you've covered all elements, stop when you've covered all but N elements.