I want to check if the list values have some level of "closeness". Is there a good algorithm to do this? Bonus points for the most pythonic way.
Valid
[1,7,8,9]
[3,4,100,101,102,103,104,105]
Not Valid
[1,8,9]
[1,10]
[100,200,300,400,500]
Here is a simple linear-time algorithm for an array a that is already sorted (as in the examples; otherwise it needs to be sorted beforehand at an extra O(n log n) cost). The idea is to construct and test each maximal qualifying subsequence that starts at a given position low.
low = middle = high = 1
while low <= length(a):
    advance middle to the largest i such that a[i] * 0.8 <= a[low]
    advance high to the largest i such that a[i] <= a[middle] * 1.2
    if (high - low + 1) / length(a) >= 0.7:
        return true
    low = low + 1
return false
Since low, middle, and high only ever increase from 1 through length(a), the running time is always linear in length(a).

If the matching subsequence of a is desired, one can output a[low]...a[high] instead of true.
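A direct Python translation of the pseudocode above, assuming the criterion implied by the examples (at least 70% of the values lie within a 20% band); the function name is illustrative:

```python
def is_valid(values, spread=0.2, fraction=0.7):
    """True if at least `fraction` of the sorted values fit inside a band
    where the lowest is within `spread` below some anchor value `middle`
    and the highest is within `spread` above it."""
    a = sorted(values)
    n = len(a)
    middle = high = 0
    for low in range(n):
        # advance middle to the largest index with a[middle] * 0.8 <= a[low]
        while middle + 1 < n and a[middle + 1] * (1 - spread) <= a[low]:
            middle += 1
        middle = max(middle, low)
        # advance high to the largest index with a[high] <= a[middle] * 1.2
        high = max(high, middle)
        while high + 1 < n and a[high + 1] <= a[middle] * (1 + spread):
            high += 1
        if (high - low + 1) / n >= fraction:
            return True
    return False
```

Because all three indices only move forward, the loop body does O(n) total pointer work over the whole run, matching the linear bound above.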
Here is an algorithm that takes O(n log n) time.

sort the array
for i in range(len(array)):
    begin = binary search for the smallest index such that array[begin] >= array[i] * 0.8
    end   = binary search for the largest index such that array[end] <= array[i] * 1.2
    if (end - begin + 1) >= len(array) * 0.7:
        # 70% of the values are within 20% of array[i],
        # i.e. all elements between begin and end are within 20% of array[i]
        return true
Several optimizations, including changing the iteration order, are possible.
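The binary searches map naturally onto the standard library's bisect module; a sketch, with an illustrative function name, assuming the 70%-within-20%-of-some-value criterion:

```python
import bisect

def is_valid_nlogn(values, spread=0.2, fraction=0.7):
    """O(n log n): sort once, then for each value count via binary search
    how many elements fall within `spread` (20%) of it."""
    a = sorted(values)
    n = len(a)
    for x in a:
        # smallest index with a[begin] >= x * 0.8
        begin = bisect.bisect_left(a, x * (1 - spread))
        # one past the largest index with a[end] <= x * 1.2
        end = bisect.bisect_right(a, x * (1 + spread))
        if end - begin >= fraction * n:
            return True
    return False
```

Note this anchors the 20% band on each value in turn, which is a slightly different (symmetric) criterion than the linear algorithm's; both agree on the examples in the question.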
For small lists a brute-force O(n^2) check will suffice.
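Such a brute-force check might look like the following (a sketch, assuming the same 70%-within-20% criterion; the function name is mine). It needs no sorting, which keeps it simple for short inputs:

```python
def is_valid_quadratic(values, spread=0.2, fraction=0.7):
    """O(n^2) brute force: for each value, linearly count how many
    values fall within `spread` (20%) of it."""
    n = len(values)
    for x in values:
        close = sum(1 for y in values
                    if x * (1 - spread) <= y <= x * (1 + spread))
        if close >= fraction * n:
            return True
    return False
```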