Can I sort an array of pointers in order to remove duplicates?

152 Views Asked by At

Let's say I defined a structure and created a few instances of it. I have an array of pointers to these instances but some pointers point to the same instances of my structure. I want to remove duplicates in the array so I sort it by qsort() function. Here's my comparison function:

int cmp(const void *a, const void *b)
{
  struct foo **s1 = (struct foo**)a;
  struct foo **s2 = (struct foo**)b;
  return *s1 - *s2;
}

The question is, can I really sort using such a comparison? I know that subtructing pointers is undefined in this case but I care only about the integer value of pointers, because I simply want pointers pointing to the same instance of foo to be next to each other.

Maybe I should use function like this:

int cmp(const void *a, const void *b)
{
  struct foo **s1 = (struct foo**)a;
  struct foo **s2 = (struct foo**)b;
  if (*s1 > *s2)
    return 1;
  else if (*s1 == *s2)
    return 0;
  else 
    return -1;
}

Is there any difference in using them?

2

There are 2 best solutions below

13
On BEST ANSWER

If you have intptr_t (which is an integer type), then you can cast your void* pointers to that, and then you can compare arbitrary values. intptr_t is not a required type, but it is pretty common in practice.

Subtracting one intptr_t from another might still overflow, so it is not a strictly portable solution. But comparison is fine. If you used uintptr_t to avoid overflow, the difference will never be negative. That's a general problem with using a - b to implement qsort-style compare functions.

Subtracting or comparing pointers which don't point into the same object is undefined behaviour. So neither of the proposed solutions in the question are valid:

§6.5.6 para 9 (Additive operators):

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object

§6.5.8 para 5 provides a list of possible valid comparisons, which is a bit more generous than the restriction on subtraction, since you can compare pointers to two members of the same struct or union, provided the members themselves are of the same type. But unrelated objects do not fall into any of this list. It terminates with the sentence: "In all other cases, the behavior is undefined."

If you want a truly portable solution which doesn't depend on intptr_t, then you could memcmp the pointers. But, really, I think that is only of theoretical interest.

0
On

If you want to remove duplicates, then you can use a nested for loop to go through each pointer, like so:

int duplicates=0;
for (int i=0;i<count;++i;)
    for (int j=i+1;j<count;++j)
        if (data [i]==data [j])
            ++duplicates;

Then allocate enough memory for count-duplicates pointers, and then loop through the original data set and only copy over ones that you haven't copied yet. It's inefficient, but it will work.