Erasing elements from a vector, if they are also in another vector

5.6k Views Asked by At

Suppose I've a vector a = {"the", "of"} and a vector b = {"oranges", "the", "of", "apples"}.

I want to compare both vectors and remove the elements from a which are also in b. This is what I came up with:

for (int i = 0; i < a.size(); i++) {
    for (int j =0; j < b.size(); j++) {
       if (a[i] == b[j]) {
          a.erase(a.begin() + i);
       }
    }
}

But this loop is not removing the last element in a. Weird!

5

There are 5 best solutions below

2
On BEST ANSWER

The problem is that when you remove the first element of a the index gets incremented from 0 to 1. On the next iteration of the loop the size of the vector is 1 which meets the condition of the outer loop causing it to terminate. You can avoid any trickery that may be necessary to fix this by simply using std::remove_if, std::find, and a lambda.

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

int main()
{
    std::vector<std::string> a{ "the", "of" };
    std::vector<std::string> b{ "oranges", "the", "of", "apples" };

    auto pred = [&b](const std::string& key) ->bool
    {
        return std::find(b.begin(), b.end(), key) != b.end();
    };

    a.erase(std::remove_if(a.begin(), a.end(), pred), a.end());

    std::cout << a.size() << "\n";
}

A better test would be to switch the contents of a and b. This will remove "the" and "of" leaving you with "oranges" and "apples".

0
On

The issue you're having is due to the fact that you're removing elements from a as you're iterating over it, but not compensating for that. This is a common problem when trying to write a loop with erases in it.

If it doesn't matter what order the contents of your vectors are in, and you're okay with storing the result in another vector, one of the best approaches is to sort both vectors and call std::set_difference.

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> a = { "the", "of" };
    std::vector<std::string> b = { "oranges", "the", "of", "apples" };
    std::vector<std::string> res;

    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
        std::back_inserter(res));
}

res will contain all elements of a that weren't in b, which will be empty in this case.

If the order matters, or if it must be done in place, you can use the erase-remove idiom. It's worth nothing that this is likely to be slower for larger vectors, as it is inevitably an O(n^2) algorithm.

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

struct Pred
{
    const std::vector<std::string>& filter;
    Pred(const std::vector<std::string>& x)
        :filter(x){}

    bool operator()(const std::string& str) const
    {
        return std::find(filter.begin(), filter.end(), str) != filter.end();
    }
};

int main()
{
    std::vector<std::string> a = { "the", "of" };
    std::vector<std::string> b = { "oranges", "the", "of", "apples" };

    Pred pred(b);

    a.erase(std::remove_if(a.begin(), a.end(), pred), a.end());
}

If you don't happen to have access to a C++11 conformant compiler, the Pred structure should be a fairly good stand-in for a lambda. Otherwise, this lambda will do the job:

auto pred = [&b](const std::string& str)
    {
        return std::find(b.begin(), b.end(), str) != b.end();
    };
0
On

In general, instead of walking the vector's content "manually" and selectively erase its items, I'd suggest using STL's already-built algorithms, combining them properly.

Using the Erase-Remove Idiom

In particular, to erase items satisfying some property from a std::vector, you can consider using the erase-remove idiom.
This Q&A on Stackoverflow discusses some options to erase items from STL containers, including the std::vector case.

You can find commented compilable code below, live here:

#include <algorithm>    // for std::remove_if()
#include <iostream>     // for std::cout, std::endl
#include <string>       // for std::string
#include <vector>       // for std::vector
using namespace std;

void print(const char* name, const vector<string>& v);

int main() 
{
    // Input vectors
    vector<string> a = {"the", "of"};
    vector<string> b = {"oranges", "the", "of", "apples"};

    print("a", a);
    print("b", b);

    // Use the erase-remove idiom
    a.erase(
        remove_if(
            a.begin(), 
            a.end(), 

            // This lambda returns true if current string 's'
            // (from vector 'a') is in vector 'b'. 
            [&b](const string& s) 
            {
                auto it = find(b.begin(), b.end(), s);
                return (it != b.end());
            }
        ), 

        a.end()
    );

    cout << "\nAfter removing:\n";
    print("a", a);
}


void print(const char* name, const vector<string>& v) 
{
    cout << name << " = {";
    bool first = true;
    for (const auto& s : v) 
    {
        if (first) 
        {
            first = false;
            cout << s;
        } 
        else 
        {
            cout << ", " << s;
        }
    }
    cout << "}" << endl;
}

Output:

a = {the, of}
b = {oranges, the, of, apples}

After removing:
a = {}

PS
Note also this very similar question on Stackoverflow.


Using std::set_difference()

An alternative approach can be to use std::set_difference(), e.g. something like the following code, live here.
(Note that in this case, as per set_difference() prerequisite, the input vectors must be already sorted.)

#include <algorithm>    // for std::set_difference(), std::sort()
#include <iostream>     // for std::cout, std::endl
#include <iterator>     // for std::inserter
#include <string>       // for std::string
#include <vector>       // for std::vector
using namespace std;

void print(const char* name, const vector<string>& v);

int main() 
{
    // Input vectors
    vector<string> a = {"the", "of"};
    vector<string> b = {"oranges", "the", "of", "apples"};

    print("a", a);
    print("b", b);

    // Sort the vectors before calling std::set_difference().
    sort(a.begin(), a.end());
    sort(b.begin(), b.end());

    // Resulting difference vector
    vector<string> c;
    set_difference(a.begin(), a.end(),
                   b.begin(), b.end(),
                   inserter(c, c.begin()));

    print("difference(a,b)", c);
}


void print(const char* name, const vector<string>& v) 
{
    cout << name << " = {";
    bool first = true;
    for (const auto& s : v) 
    {
        if (first) 
        {
            first = false;
            cout << s;
        } 
        else 
        {
            cout << ", " << s;
        }
    }
    cout << "}" << endl;
}
2
On

this is the correct syntax of erasing things form vector:

myvector.erase (myvector.begin()+5);

Secondly, after you erased it, your index of this vector would be invalid.

So I recommend you do a two-round scan. First round ,you mark the elements you want to remove. IN second round, you can erase them.

BTW your algorithm is O(n^2) time complexity. If you can, I recommend you sort your vector firstly. Then you can use much faster algorithm to deal with it.

1
On

Try the following

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cassert>

int main()
{
    std::vector<std::string> a = { "the", "of" };
    std::vector<std::string> b = { "oranges", "the", "of", "apples" };

    for ( auto it = a.begin(); it != a.end(); )
    {
        if ( std::find( b.begin(), b.end(), *it ) != b.end() )
        {
            it = a.erase( it ); 
        }
        else
        {
            ++it;
        }
    }

    assert( a.empty() );
}

Of course it would be better if the vectors would be ordered.