Heap sort running time

I posted a similar question recently when I compared the running times of two different implementations of insertion sort, and I have a similar problem now. I know that heap sort's complexity is O(n log n), the same as quicksort's average case. But here are my results when I sort an array of 10,000 randomly generated integers.

Quick sort: Time taken for execution: 0.005288

Heap sort: Time taken for execution: 0.234245

As you can see, my heap sort is taking much more time than it should.

Here is my code for the max_heapify, build_max_heap and heapsort functions:

void max_heapify(int *a,int n,int i)
{
    int largest = 0;

    if((2*i)<=n && a[2*i] > a[i]) largest = 2*i;
    else largest = i;

    if((2*i+1)<=n && a[2*i+1] > a[largest]) largest = 2*i+1;

    if(largest != i) 
    {
        swap(a[i],a[largest]);
        max_heapify(a,n,largest);
    }
}

void build_max_heap(int *a,int n)
{
    for(int i=1;i<=n/2;i++)
    {
        max_heapify(a,n,i);
    }
}

void heapsort(int *a,int n)
{   

    while(n>0)
    {
        build_max_heap(a,n);
        swap(a[1],a[n--]);
    }
}

Can anybody tell me the flaw in the implementation? The code works (I tested it on smaller sample sizes), but it's not efficient.

Any help is appreciated. Thanks!

There are 3 answers below.

Answer 1:
  1. You are trying to make a determination based on a single random sample (regardless of how many elements are in the sample), which is unreliable in the extreme.

  2. The difference may be due to code outside of the algorithms (possibly the first algorithm run took time to initialize things that did not need to be initialized again for the second). This can also be addressed by running multiple samples (say, by alternating between algorithms), as sketched below.
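
For example, a harness along these lines averages many runs and times both sorts on the same input in the same process. This is only a sketch of the idea: std::sort stands in for the quicksort, the trial count and array size are placeholders, and it assumes the question's heapsort() is compiled into the same program.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

void heapsort(int *a, int n);   // the implementation from the question

static double seconds_to_sort(void (*sort_fn)(std::vector<int>&), std::vector<int> data)
{
    // 'data' is taken by value, so every timing run gets its own unsorted copy
    auto start = std::chrono::steady_clock::now();
    sort_fn(data);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

static void run_quick(std::vector<int> &v) { std::sort(v.begin(), v.end()); }
static void run_heap(std::vector<int> &v)  { heapsort(v.data(), (int)v.size() - 1); } // question's heapsort indexes a[1..n]

int main()
{
    const int n = 10000, trials = 50;          // placeholder values
    double quick_total = 0, heap_total = 0;
    for (int t = 0; t < trials; t++) {
        std::vector<int> data(n);
        for (int &x : data) x = rand();        // fresh random input each trial
        quick_total += seconds_to_sort(run_quick, data);   // both sorts see the
        heap_total  += seconds_to_sort(run_heap,  data);   // same data in a trial
    }
    std::printf("quick avg %.6f s, heap avg %.6f s\n",
                quick_total / trials, heap_total / trials);
}

Running the two sorts back to back in the same loop also shares any one-time startup cost between them, which addresses the second point above.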

Answer 2:

FWIW, I found that sorting 30000 elements produced several elements out of order.

void check(const int *a, int n)
{
    for (int i = 1; i < n; i++)
    {
        if (a[i] < a[i-1])
        {
            printf("out of order a[%d]=%d a[%d]=%d\n", i, a[i], i-1, a[i-1]);
        }
    }
}

Produced this:

out of order a[1]=0 a[0]=1171034357
out of order a[29989]=2147184723 a[29988]=2147452680
out of order a[29991]=2146884982 a[29990]=2147187654
out of order a[29993]=2145044438 a[29992]=2147179272
out of order a[29995]=2145040183 a[29994]=2146044388
out of order a[29996]=2130746386 a[29995]=2145040183
out of order a[29997]=2062733752 a[29996]=2130746386
out of order a[29998]=2026139713 a[29997]=2062733752
out of order a[29999]=1957918169 a[29998]=2026139713
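
The kind of driver I have in mind looks roughly like this. It is a sketch only: the array size matches the test above, but the seeding and the exact call are my assumptions, and note that the question's heapsort() works on a[1..n], so a[0] is never touched by the sort.

#include <cstdlib>
#include <ctime>

void heapsort(int *a, int n);     // from the question
void check(const int *a, int n);  // defined above

int main()
{
    const int N = 30000;                 // sample size used above
    int *a = new int[N];
    srand((unsigned)time(nullptr));
    for (int i = 0; i < N; i++)
        a[i] = rand();                   // random integers, as in the question
    heapsort(a, N - 1);                  // the question's heapsort sorts a[1..n]
    check(a, N);                         // report adjacent pairs that are out of order
    delete[] a;
}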

Profiling:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.57      1.06     1.06                             _mcount_private
 28.11      1.76     0.70 225000000     0.00     0.00  max_heapify
 19.68      2.25     0.49                             __fentry__
  9.64      2.49     0.24    30000     0.01     0.03  build_max_heap
  0.00      2.49     0.00   427498     0.00     0.00  swap
  0.00      2.49     0.00        1     0.00     0.00  check
  0.00      2.49     0.00        1     0.00   940.00  heapsort

Ignore _mcount_private; it is an artifact of the profiling instrumentation.

Consider:

1. Be sure your code works correctly.
2. Then worry about speed of execution.
3. @ScottHunter (+1) is correct: your analysis is flawed, and in the same sense so is the profile above.

Second profile on a different set of 30000 random numbers:

$ ./heapsort
out of order a[1]=0 a[0]=1451707585
out of order a[29989]=2147418839 a[29988]=2147447414
out of order a[29990]=2147336854 a[29989]=2147418839
out of order a[29991]=2147314086 a[29990]=2147336854
out of order a[29992]=2147202205 a[29991]=2147314086
out of order a[29993]=2147177981 a[29992]=2147202205
out of order a[29994]=2133236301 a[29993]=2147177981
out of order a[29996]=2100055600 a[29995]=2137349904
out of order a[29998]=2044161181 a[29997]=2108306471
out of order a[29999]=1580699248 a[29998]=2044161181

Owner@Owner-PC ~
$ gprof heapsort | more
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 45.20      1.13     1.13                             _mcount_private
 30.00      1.88     0.75 225000000     0.00     0.00  max_heapify
 13.20      2.21     0.33                             __fentry__
 11.60      2.50     0.29    30000     0.00     0.00  build_max_heap
  0.00      2.50     0.00   427775     0.00     0.00  swap
  0.00      2.50     0.00        1     0.00     0.00  check
  0.00      2.50     0.00        1     0.00     1.04  heapsort

Answer 3:

I suspect the repeated calls to build_max_heap() are the flaw. Heapsort builds the heap once, then repeatedly removes the largest element from the root (a corrected sketch follows at the end of this answer).
The GIF animation on Wikipedia is a good illustration.
I evaluated 3 algorithms on random strings, N = 100,000:

qsort(3)      usec = 54011  call = 0        compare = 1536365  copy = 0
qsort_trad()  usec = 67603  call = 99999    compare = 2368481  copy = 1344918
heap_sort()   usec = 88814  call = 1624546  compare = 3019351  copy = 1963682

qsort(3) is the merge sort in the GNU C library (sorting by index). qsort_trad() is a conventional median-of-3 quicksort. heap_sort() is my implementation; its interface is the same as qsort(3).
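
For reference, here is a sketch of the usual structure, keeping the question's 1-based indexing: the heap is built once, bottom-up (i from n/2 down to 1), and after each swap only the new root is sifted down over the shrinking prefix.

#include <utility>   // std::swap

void max_heapify(int *a, int n, int i)
{
    int largest = i;
    int l = 2 * i, r = 2 * i + 1;
    if (l <= n && a[l] > a[largest]) largest = l;
    if (r <= n && a[r] > a[largest]) largest = r;
    if (largest != i) {
        std::swap(a[i], a[largest]);
        max_heapify(a, n, largest);   // keep sifting the displaced value down
    }
}

void build_max_heap(int *a, int n)
{
    // Bottom-up: each node's children are already heaps when it is visited.
    for (int i = n / 2; i >= 1; i--)
        max_heapify(a, n, i);
}

void heapsort(int *a, int n)
{
    build_max_heap(a, n);             // done exactly once, O(n)
    for (int i = n; i >= 2; i--) {
        std::swap(a[1], a[i]);        // current maximum goes to its final slot
        max_heapify(a, i - 1, 1);     // restore the heap over a[1..i-1], O(log n)
    }
}

This keeps the whole sort at O(n log n), whereas the version in the question rebuilds the heap on every iteration, doing O(n) or more work per extracted element instead of O(log n). The top-down build loop (i from 1 up to n/2) also does not guarantee that the maximum reaches the root, which would explain the out-of-order elements reported in the previous answer.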