C++ Stl Set Behavior

364 Views Asked by At

I was trying to run the below code. What I am finding is that there is difference in the output. I understand that there is an issue with ordering mechanism used in the Comparator functionality. What I am basically looking for is: 1) How does Set internally stores data. 2) How can I resolve this issue or the best way to copy the data to a different Set. 3) How exactly the ordering is creating this issue.

#include <iostream>
#include <set>
using namespace std;
struct Comparator {
  bool operator()( const int& a, const int& b ) {
    if( a <= b ) 
      return true;
    else
      return false;
  }
};
int main()
{
  set< int, Comparator > customSet;
  for( unsigned k = 0, index = 2; k < 10; ++k ) {
    customSet.insert( index );
  }

  set< int, Comparator >::iterator iter = customSet.begin();
  for(; iter != customSet.end(); ++iter ) {
    cout<<*iter<<endl;
  }

  cout<<"---------------------------------"<<endl;
  set< int, Comparator > tempCustomSet ;//= customSet;
  tempCustomSet.insert( customSet.begin(), customSet.end() );

  iter = tempCustomSet.begin();
  for(; iter != tempCustomSet.end(); ++iter ) {
    cout<<*iter<<endl;
  }

  return 0;
}
4

There are 4 best solutions below

0
On

You are getting different outputs in two cases because you are inserting in different ways. In case 1 you are inserting element 2 ten times. In this case, when you insert integer 2 after first time, your Comparator() function will be called to decide where to insert. In the other case, you are inserting a range. Here, called function takes first argument, i,e customSet.begin() and checks it with the other argument i,e customSet.end(), if these two are not equal then only an element is inserted otherwise element will not be inserted.

6
On

See this reference for more details on std::set. The implementation should not concern you (they can differ from platform to platform), since the interface and complexity guarantees is all that matters to the Standard. Typical implementations are using red-black-trees.

You need to make your Comparator use operator< and not operator<=. The reason is that std::set will consider elements equivalent if !Comparator(a, b) && !Comparator(b, a) evaluates to true (i.e. neither is strictly less than the other).

But with <=, you have a <= a equal to true, so that !(a<=a) && !(a<=a) gives false for equal elements. Whereas with < you have a < a equal to false so !(a<a) && !(a<a) gives true.

The right to thing to do is:

struct Comparator 
{
    bool operator()(int const& lhs, int const& rhs) const 
    {
        return lhs < rhs; 
    }
};

This will guarantee that equal elements are considered equivalent. Note that this is discussed at length in Effective STL, "Item 19. Understand the difference between equality and equivalence."

0
On

The problem is most likely because your comparison does not implement a strict weak ordering. The internal ordering mechanism on the set relies on this. You can get SWO by changing your comparison to a less-than:

struct Comparator {
  bool operator()( const int& a, const int& b ) const {
    return ( a < b ); 
  }
};

On the other hand, std::set will use this comparison criteria by default so you need not specify it.

There is some related information in my answer to this question ( and zillions of other SO questions).

4
On

1) How does Set internally stores data

The only requirements are that the elements are:

  • ordered according to the comparator, so that if Comp(a,b), then a appears before b when iterating the set;
  • unique, so there are no distinct elements for which both Comp(a,b) and Comp(b,a) hold.

and that the operations meet certain complexity requirements.

In practice, they are typically stored in a binary search tree; but that doesn't matter to a user.

2) How can I resolve this issue or the best way to copy the data to a different Set

In order to meet the requirements, the comparator must be a strict weak ordering, like <, so that Comp(a,a) is always false, rather than a non-strict ordering like <=. Since < is the default, that means you don't need a custom comparator at all.

3) How exactly the ordering is creating this issue

Note that your first loop is inserting the value 2 ten times; I'm not sure whether that's the intention or not.

Given the required strict ordering, insert(b) might look for an insertion point by finding the first element a such that Comp(a,b) is false; i.e. the first element that b should not come after. It will then check for uniqueness by checking Comp(b,a). If both are false, then that indicates that the two values are equivalent, so b will not be inserted.

Since your comparison is not strict, this uniqueness test might fail; so you might end up with a duplicate entry. Or something else might happen - the behaviour is undefined.