Why is the time complexity of Shell sort n log n on my data?


environment: python3.6, Anaconda 5.1, Jupyter notebook, numba.

I used a random array generated by Python to measure the running time of Shell sort, but found that its growth is more in line with n log n. I understood the time complexity of Shell sort to be O(n^2), so I am confused.

Shell sort code:

def shell_sort(arr):
    """In-place Shell sort using the gap-halving sequence n//2, n//4, ..., 1."""
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # gapped insertion sort: each slice arr[k::gap] becomes sorted
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2
    return arr
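Rather than timing wall-clock runs (which are noisy, especially with numba JIT warm-up), one can count element comparisons directly and see which curve the counts track. A sketch, not from the original post; `shell_sort_counting` is a made-up name, using the same gap-halving scheme as the question's code:

```python
import math
import random

def shell_sort_counting(arr):
    """Shell sort with gap halving; returns (sorted copy, comparison count)."""
    arr = arr[:]  # work on a copy
    n = len(arr)
    count = 0
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap:
                count += 1  # one evaluation of arr[j - gap] > temp
                if arr[j - gap] > temp:
                    arr[j] = arr[j - gap]
                    j -= gap
                else:
                    break
            arr[j] = temp
        gap //= 2
    return arr, count

if __name__ == "__main__":
    random.seed(0)
    for n in (1000, 2000, 4000, 8000):
        data = [random.random() for _ in range(n)]
        _, c = shell_sort_counting(data)
        # divide by n*log2(n) and by n**2 to see which ratio stays roughly flat
        print(n, c, round(c / (n * math.log2(n)), 2), round(c / n**2, 5))
```

If the counts were quadratic, doubling n would roughly quadruple them; on random input with gap halving the growth is visibly slower than that, which is consistent with what the question observed.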

shell sort time complexity analysis




O(n^2) is only the worst-case time complexity, so the algorithm can run faster than that on a random input, and even on average (or even on almost all of its inputs).

Also, the complexity of Shellsort depends on the gap sequence you select.

Certain gap sequences give a worst case smaller than O(n^2): for example, O(n^1.5) for the sequence 1, 4, 13, 40, 121, … (Knuth's sequence), or even O(n log^2 n) for 1, 2, 3, 4, 6, 8, 9, 12, … (Pratt, 1971). In other words, trying a single input is not significant at all, and the O(n^2) claim may not apply, depending on the exact gap sequence your implementation uses.


There are many open problems concerning Shellsort's complexity, and it is suspected that with an appropriate choice of gap sequence, and for some inputs, its complexity could be O(n log n).


I have studied shellsort for small n, and I can state unequivocally that the gap sequence producing the best average number of comparisons for n = 10 is {9, 6, 1}. My software tested it on all n! orderings (the entire input set) and against all 2^(n-2) possible gap sequences for n = 10.

The average is clearly O(n * log(n)) as is the worst case.

The best case (already-ordered data) is similar to insertion sort's best case of n - 1 comparisons, i.e. O(n) complexity, not least because it can be calculated the same way. For n = 10 and gaps {9, 6, 1}:

(n-9) + (n-6) + (n-1) = (n * # of gaps) - sum(gaps) = 30 - 16 = 14.
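The 14-comparison figure is easy to check mechanically: on already-sorted input every inner test fails immediately, so each gap g costs exactly n - g comparisons. A self-contained check (my own helper, counting each evaluation of the `arr[j - gap] > temp` test as one comparison):

```python
def count_comparisons(arr, gaps):
    """Run Shell sort with the given descending gaps; return the comparison count."""
    arr = arr[:]
    n = len(arr)
    count = 0
    for gap in gaps:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap:
                count += 1
                if arr[j - gap] > temp:
                    arr[j] = arr[j - gap]
                    j -= gap
                else:
                    break
            arr[j] = temp
    return count

# Best case for n = 10 with gaps {9, 6, 1}: (10-9) + (10-6) + (10-1)
print(count_comparisons(list(range(10)), [9, 6, 1]))  # → 14
```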

I know that most would say this is O(n * log(n)) complexity. Personally, I think that is unwarranted, because it does not need to be estimated: for any n and any gap sequence, the best case is easily and exactly determined.

I'll be stubborn and drive this home: assume, for the sake of argument, that the best case for any shellsort is (n * # of gaps). For n = 10 and 3 gaps, the best case would be 10 * 3 = 30 comparisons. Would that be O(n) complexity? I can see why it could be. Yet shellsort's best case is significantly less than (n * # of gaps), so why call it O(n * log(n))?

It is possible (though extremely unlikely) that the OP managed to pinpoint the best (or close to the best) gap sequence for his n, resulting in O(n * log(n)) behavior. But determining that requires more analysis than gut feeling can provide.