We are writing c# program that will help us to remove some of unnecessary data repeaters and already found some repeaters to remove with help of this Finding overlapping data in arrays. Now we are going to check maybe we can to cancel some repeaters by other term. The question is:
We have arrays of numbers
{1, 2, 3, 4, 5, 6, 7, ...}, {4, 5, 10, 100}, {100, 1, 20, 50}
some numbers can be repeated in other arrays, some numbers can be unique and to belong only to specific array. We want to remove some arrays when we are ready to lose up to N numbers from the arrays.
Explanation:
{1, 2}
{2, 3, 4, 5}
{2, 7}
We are ready to lose up to 3 numbers from these arrays it means that we can remove array 1 cause we will lose only number "1" it's only unique number. Also we can remove array 1 and 3 cause we will lose numbers "1", "7" or array 3 cause we will lose number "7" only and it less than 3 numbers.
In our output we want to give maximum amount of arrays that can be removed with condition that we going to lose less then N where N is number of items we are ready to lose.
You need to first get a number for each array which tells you hwo many numbers are unique to that particular array.
An easy way to do this is
O(n²)
since for each element, you need to check through all arrays if it's unique.You can do this much more efficiently by having sorted arrays, sorting first or using a heap-like data structure.
After that, you only have to find a sum so that the numbers for a certain amount of arrays sum up to
N
.That's similar to the subset sum problem, but much less complex because
N > 0
and all your numbers are> 0
.So you simply have to sort these numbers from smallest to greatest and then iterate over the sorted array and take the numbers as long as the sum
< N
.Finally, you can remove every array that corresponds to a number which you were able to fit into
N
.