keys = {" /> keys = {" /> keys = {"/>

how to create custom intersected container from 2 distinct container types

75 Views Asked by At

Using the following containers:

std::vector<std::pair<std::string, int>> keyVals = {
    {"A", 1}, {"B", 2},  {"C", 3}
};

std::vector<std::string> keys = {
    "A", "B", "C"
};

I need to construct a sorted output container

std::set<std::tuple<std::string, int, int>>

using the std::set_intersection between keyVals and keys.

These types use a common field (in my case a string) this is also used for pre-sorting these vectors via the default std::less - however no need in the above example as they are already sorted.

I need help to create a sorted container C that is not just a simple copy of elements from A to the output over the intersection range.

Instead I need to be able to construct each output element C - in my case a std::tuple<std::string, int, int> (where the first entry in the tuple is the common link string field and the other 2 int are some global fields.

To illustrate the problem I created a coliru live demo where I commented out the broken code - I don't know how to invoke a custom constructor for each iteration - please help.

#include <vector>
#include <variant>
#include <memory>
#include <iostream>
#include <algorithm>

// Helper to get Lambda with multiple signatures in place 
// template deduction guide is a C++17 feature! 
template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
// this line is necessary for c++17 - not required for c++20
template<class... Ts> overload(Ts...) -> overload<Ts...>; 

template<typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {
    for (auto& el : vec) {
        os << el.first << ' ';
    }
    return os;
}

// if c++17 is not available, you have to write a functor/function
// with both signatures 
int main()
{
    std::vector<std::pair<std::string, int>> keyVals = {
        {"A", 1}, {"B", 2},  {"C", 3}
    };
    
    std::vector<std::string> keys = {
        "A", "B", "C"
    };

    // std::set_intersection results can only be of the first iterator type
    std::vector<std::pair<std::string, int>> intersection;

    // this works
    std::set_intersection(
        keyVals.begin(), keyVals.end(),
        keys.begin(), keys.end(),
        std::back_inserter(intersection),
        overload {
            [](const std::pair<std::string, int>& lhs, const std::string& rhs) {
                return lhs.first < rhs;
            },
            [](const std::string& lhs, const std::pair<std::string, int>& rhs) {
                return lhs < rhs.first;
            }
        }
    );
    
    std::cout << intersection << std::endl;    
    
#if 0
    // std::set_intersection results can only be of the first iterator type
    std::vector<std::tuple<std::string, int, int>> broken_intersection;

    // this does not work as no match for 'operator=' *__result = *__first1; 
    std::set_intersection(
        keyVals.begin(), keyVals.end(),
        keys.begin(), keys.end(),
        std::back_inserter(broken_intersection),  // help here - I need to be able to 
        overload {
            [](const std::pair<std::string, int>& lhs, const std::string& rhs) {
                return lhs.first < rhs;
            },
            [](const std::string& lhs, const std::pair<std::string, int>& rhs) {
                return lhs < rhs.first;
            }
        }
    );
    
    std::cout << broken_intersection << std::endl;        
#endif
}
1

There are 1 best solutions below

0
A M On

The problem is, as already noted, the different types of the data.

All input and output containers habe a different type.

And with that, 2 main operations in std::set_difference need to be expressed.

  1. The comparison of the dereferenced input iterators of different type
  2. The assignment of a rvalue from the dereferenced input to the dereferenced output iterator.

Problem 1 can be easily solved with a Functor, for which we will create 2 function call operators with the 2 needed signatures. This will allow for comparison in both directions.

Starting with C++14 you may also use a generic Lambda:

auto cmp = [](auto lhs, auto rhs) { return lhs.mCommonField < rhs.mCommonField; };

But then you need some common fields in both vectors.

The same approach would be possible with a Functor having a templated call operator:

struct Comparator
{
    template<typename T, typename U>
    bool operator()(T const& lhs, U const& rhs) const {
        return lhs.commonField < rhs.commonField;
    }
};

Also here you need a common field. And they must have the same name. Maybe this will not fit here.

Or, we could create a wrapper for a Common Field:

struct Common {
    std::string const& commonField;
    Common(StructA const& sa) : commonField{sa.commonField} {};
    Common(StructB const& sb) : commonField{sb.commonField} {};
};

Here the common field must just be of the same type.

Many possibilities


For problem 2 we need a small wrapper. Either for the first input iterator or the output iterator. The wrapper needs to have iterator functionality and a dereferencing operator that returns a Tuple, so that we can assign this Tuple to the resulting vector of Tuples.

So we will create a very small custom iterator, with only the minimumm functions needed by std::set_intersection

  • Construction
  • Dereferencing
  • Pre Incrementing
  • Comparison for not equal

Within the iterator we will use a simple pointer to a Pair as the real iterator. The constructor can get a pointer to the Pairs in the first vector with the std::vector data function. That is very convenient.

The dereferencing operator does the trick. It will not return a Pair, but construct a Tuple and return that. So, this is the conversion from the Pair to a Tuple.

Putting this altogether may lead to one of many potential solutions:

#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
#include <tuple>
#include <utility>

using Key = std::string;
using Value = int;
using Keys = std::vector<Key>;
using Pair = std::pair<Key, Value>;
using KeyVals = std::vector<Pair>;
using Tuple = std::tuple<Key, Value, Value>;
using Tuples = std::vector<Tuple>;

std::ostream& operator<<(std::ostream& os, const Tuples& t) {
    for (auto& el : t)  os << std::get<0>(el) << ' ';
    return os;
}

struct Iterator {
    using iterator_category = std::output_iterator_tag;
    using difference_type = std::ptrdiff_t;
    using value_type = Tuple;
    using pointer = Tuple*;
    using reference = Tuple&;

    Pair* iter; // The internal iterator. Just a pointer
    Iterator(Pair* kv) { iter = kv; }

    Tuple operator *() const { return Tuple{ iter->first,0,0}; }
    Iterator operator ++() { ++iter; return *this; }
    bool operator !=(const Iterator& other) { return iter != other.iter; }
};

struct Comp {
    bool operator ()(const Tuple& lhs, const std::string& rhs) { return std::get<0>(lhs) < rhs; }
    bool operator ()(const std::string& lhs, const Tuple& rhs) { return lhs < std::get<0>(rhs) ; }
};
int main()
{
    KeyVals keyVals = { {"A", 1}, {"B", 2},  {"C", 3} };
    Keys keys = { "A", "B", "C" };

    // Resulting vector
    Tuples tupleResult{};

    // Create Iterators to KeyVals
    Iterator begin(keyVals.data());
    Iterator end(keyVals.data() + keyVals.size());

    // Do the intersection
    std::set_intersection(begin, end, keys.begin(), keys.end(), std::back_inserter(tupleResult), Comp());

    // Show result
    std::cout << tupleResult << '\n';
}

Looks a little bit clumsy

On a 2nd thought, you could convert both input vectors to the needed output type using std::transform. But this will cost additional space and time.

Anyway, please see the next potential solution:

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <utility>
#include <tuple>

using Key = std::string;
using Value = int;
using Keys = std::vector<Key>;
using Pair = std::pair<Key, Value>;
using KeyVals = std::vector<Pair>;
using Tuple = std::tuple<Key, Value, Value>;
using Tuples = std::vector<Tuple>;

std::ostream& operator<<(std::ostream& os, const Tuples& t) {
    for (auto& el : t)  os << std::get<0>(el) << ' ';
    return os;
}

int main()
{
    KeyVals keyVals = { {"A", 1}, {"B", 2},  {"C", 3} };

    Keys keys = { "A", "B", "C" };

    // Convert to same type
    Tuples tupleKeyVals{}, tupleKeys{};

    std::transform(keyVals.begin(), keyVals.end(), std::back_inserter(tupleKeyVals), [](const Pair& p) { return Tuple{ p.first, p.second, 0 }; });
    std::transform(keys.begin(), keys.end(), std::back_inserter(tupleKeys), [](const std::string& s) { return Tuple{ s, 0, 0 }; });

    // Resulting vector
    Tuples tupleResult{};

    // Build intersection based on first tuple element
    std::set_intersection(tupleKeyVals.begin(), tupleKeyVals.end(), tupleKeys.begin(), tupleKeys.end(), std::back_inserter(tupleResult),
        [](const Tuple& t1, const Tuple& t2) { return std::get<0>(t1) < std::get<0>(t2); });

    std::cout << tupleResult << '\n';
}

This looks a little bit cleaner. But disadvantage is much more memory consumption and longer operation time.


Last but not least, we could implement an own set_intersetcion function, which is quite simple.

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <utility>
#include <tuple>

using Key = std::string;
using Value = int;
using Keys = std::vector<Key>;
using Pair = std::pair<Key, Value>;
using KeyVals = std::vector<Pair>;
using Tuple = std::tuple<Key, Value, Value>;
using Tuples = std::vector<Tuple>;

std::ostream& operator<<(std::ostream& os, const Tuples& t) {
    for (auto& el : t)  os << std::get<0>(el) << ' ';
    return os;
}

// Specialized intersection function
Tuples set_intersection(KeyVals& kv, Keys& k) {
    Tuples result{};
    KeyVals::iterator iterKeyVal = kv.begin();
    Keys::iterator iterKey = k.begin();

    while (iterKeyVal != kv.end() and iterKey != k.end()) {
        if (iterKeyVal->first < *iterKey)
            ++iterKeyVal;
        else {
            if (not(*iterKey < iterKeyVal->first))
                result.emplace_back(Tuple{ (iterKeyVal++)->first ,0,0 });
            ++iterKey;
        }
    }
    return result;
}

int main()
{
    // Input
    KeyVals keyVals = { {"A", 1}, {"B", 2},  {"C", 3} };
    Keys keys = { "A", "B", "C" };

    // Resulting vector
    Tuples tupleResult = set_intersection(keyVals, keys);

    // Show result
    std::cout << tupleResult << '\n';
}

But in the end you need to decide based on other requirements or constraints.