C how to use quicksort with comparator function

638 Views Asked by At

I am trying to understand what is going on in the first few lines of this comparator function stringAsInt(const void *pLeft, const void *pRight)

  1. So the parameters are constant pointers for something. Then in the next two lines we are casting the void to (const char**) Why is it being cast to a pointer of a pointer? Also, what exactly is going on in those two lines?
  2. When calling qsort() in the main() function, why are there no parameters being passed to stringAsInt()? How does stringAsInt() know what pLeft and pRight are?
  3. Why is a being setup as a pointer of a pointer? Wouldn't a standard array suffice?

-

int stringAsInt(const void *pLeft, const void *pRight) {
    const char *left = *(const char**)pLeft;
    const char *right = *(const char**)pRight;
    int leftLen = (int)strlen(left);
    int rightLen = (int)strlen(right);
    if (leftLen != rightLen) {
        return leftLen - rightLen;
    } else {
        return strcmp(left, right);
    }
}

int main() {
    int n;
    scanf("%d", &n);
    char buffer[1000000 + 1];
    char **a = malloc(sizeof(char*) * (size_t)n);
    for (int i = 0; i < n; i++) {
        scanf("%1000000s", buffer);
        a[i] = malloc(sizeof(char) * (strlen(buffer) + 1));
        strcpy(a[i], buffer);
    }
    qsort(a, (size_t)n, sizeof(a[0]), stringAsInt);
    for (int i = 0; i < n; i++) {
        printf("%s\n", a[i]);
        free(a[i]);
    }
    free(a);
    return 0;
} 
3

There are 3 best solutions below

0
On

So the parameters are constant pointers for something. Then in the next two lines we are casting the void to (const char**) Why is it being cast to a pointer of a pointer? Also, what exactly is going on in those two lines?

It's type erasure in C. qsort does not know what type do array elements have, it only knows their sizes and passes individual elements to the comparator as type-erased pointers to void, expecting comparator to cast the pointers to the needed type. That is exactly what happens in these two lines. Array of strings is simply organized as array of pointers to char (a traditional C idiom for character string). Each element of array is a pointer to character (actually, to the first one in a contigous sequence of characters). qsort passes type-erased pointers to those, comparator downcasts them back to the concrete type.

When calling qsort() in the main() function, why are there no parameters being passed to stringAsInt()? How does stringAsInt() know what pLeft and pRight are?

It is a pointer to function. qsort's last argument is a function pointer. It then calls this function to compare individual pairs of elements, each time providing values for pLeft and pRight.

Why is a being setup as a pointer of a pointer? Wouldn't a standard array suffice?

Probably, if you know for sure that your file is not large. If it consists of milliards of strings, the program would be likely to crash due to the 2nd level domain of this site's URI, so they decided to put the whole array on heap.

0
On

Code breakdown in the comments.

//
//  main.c
//

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

//Takes a pointer to any type.. (void *)
int stringAsInt(const void *pLeft, const void *pRight) {
    //qsort calls this function with a pointer to each element. Since each element is a char* then pLeft is actually a char**.
    //For example, if sorting an array of ints (each element is of type: int), pLeft would be int*.
    //If sorting an array of strings (each element is of type: char*), pLeft would be char**.
    //So on and so forth.


    //Thus we cast the pLeft to its correct type (const char**) for pointer to string.. Then we dereference it to get the string itself (const char*).
    const char *left = *(const char**)pLeft;
    const char *right = *(const char**)pRight;

    int leftLen = (int)strlen(left);
    int rightLen = (int)strlen(right);

    //Compare the lengths of the strings.. If left is < right, we return negative. If they are the same, 0.. else positive..
    if (leftLen != rightLen) {
        return leftLen - rightLen;
    } else {
        return strcmp(left, right);  //Lengths are equal.. compare their contents..
    }
}

int main() {
    int n;

    //First the code takes an array count.. This is the amount of arrays to sort..
    printf("Enter number of strings: ");
    scanf("%d", &n);

    //It allocates a large buffer on the stack to hold entire sentences to sort..
    char buffer[1000000 + 1];

    //Allocates an array of strings..
    char **a = malloc(sizeof(char*) * (size_t)n);

    for (int i = 0; i < n; i++) {
        printf("Enter a string: ");
        scanf("%1000000s", buffer);

        //Store each string in the array..
        a[i] = malloc(sizeof(char) * (strlen(buffer) + 1));
        strcpy(a[i], buffer);
    }

    //Sort the array using stringAsInt comparator..
    //a is the array to sort.
    //n is the amount of elements in the array.
    //sizeof(a[0]) is the size of each element in the array. sizeof(char*).
    //stringAsInt is the comparator function (pointer to function)..

    qsort(a, (size_t)n, sizeof(a[0]), stringAsInt);

    //Print the sorted array and cleanup each element.
    for (int i = 0; i < n; i++) {
        printf("%s\n", a[i]);
        free(a[i]);
    }

    //Cleanup the array itself.
    free(a);
    return 0;
}
0
On
  1. The casting dance in stringAsInt is because qsort() calls the comparator function with const void * parameters. The comparator, however, needs to reference the actual structures they point to in order to do the comparison, so it casts to const pointers of the correct type.

In this case, the correct type is a pointer to a pointer to a char, because a string is a char *, and the address of that string is a char **. The qsort is going to sort an array of strings.

C is confusing that way. Consider the following memory addresses holding the following characters:

0x0100 'f'
0x0101 'o'
0x0102 'o'
0x0103 '\0'

0x3127 'b'
0x3128 'a'
0x3129 'r'
0x312a '\0'

We have an array of pointers to these strings (variable a in the above program):

0x2522: 0x0100
0x2524: 0x3127

so a == 0x2522, a[0] = 0x0100 ("foo"), a[1] = 0x3127 ("bar")

The qsort() is going to switch the contents of a[0] and a[1] (because "bar" is lexicographically before "foo"); the only change after its called will be:

0x2522: 0x3127
0x2524: 0x0100

The strings themselves will not be changed.

  1. The qsort routine knows where its first element is, how many elements it must sort, and how big each element is--everything it needs to know the addresses of each element (if the array starts at 0x1000, has 10 elements, and each element is 16 bytes long, it knows that element 0 is at 0x1000, element 1 at 0x1010, etc). The qsort routine will call stringAsInt() as many times as it needs, with the elements it has calculated, and will swap the elements that must be swapped to sort the array.

  2. Because of the strings. If it were sorting ints (for example) it would just be a standard array. But strings are char *'s.