NetworKit bipartite graph: connected components only when nodes share 2 or more common neighbors


I'm new to graphs, but I'm trying to find my way through. The idea is simple: we have "transactions" with multiple "features", and we need to assign the same Id to transactions that have 2 or more features in common (same values). There are about 5,500,000 transactions.
For example:

Transaction  A  B  C  D
0            1  1  1  2
1            2  1  1  7
2            3  1  2  9
3            4  1  3  8
4            5  2  3  4
Here only transactions 0 and 1 have 2 common features (B = 1 and C = 1), so they should be combined under the same Id:
Transaction  Id
0            1
1            1
2            2
3            3
4            4
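
To make the rule concrete, here is a minimal brute-force sketch on the five sample rows. It assumes features are compared column by column (A with A, B with B, and so on); the names `rows`, `shared_features` and `linked_pairs` are only illustrative. It checks every pair, so it is useful as a reference on small samples, not on the full data.

```python
from itertools import combinations

# Sample rows from the example above: transaction -> (A, B, C, D)
rows = {
    0: (1, 1, 1, 2),
    1: (2, 1, 1, 7),
    2: (3, 1, 2, 9),
    3: (4, 1, 3, 8),
    4: (5, 2, 3, 4),
}


def shared_features(x, y):
    """Count the columns in which two transactions hold the same value."""
    return sum(a == b for a, b in zip(x, y))


# Brute force over all pairs: O(n^2), only practical for small samples.
linked_pairs = [
    (i, j)
    for i, j in combinations(sorted(rows), 2)
    if shared_features(rows[i], rows[j]) >= 2
]
print(linked_pairs)  # [(0, 1)]
```

Transactions 3 and 4 share only C = 3, so they stay separate, which matches the expected Ids.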

My first approach was to create a graph with all transactions as nodes, then filter the dataframe for pairs of transactions that match in 2 or more features and create edges for those pairs. The problem is that a dataframe this large cannot be processed in a reasonable amount of time, even with multiprocessing.
My second approach was to create a bipartite graph where the source nodes are transactions and the target nodes are features. I was able to extract connected components, but the resulting groups were far too large, because transactions with even a single common feature were grouped under the same Id.
Now I'm struggling with how to get connected source nodes that have 2 or more target features in common.
I'd appreciate any help.
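
One possible direction, sketched under assumptions rather than as a definitive solution: two transactions share 2 or more features exactly when they agree on at least one pair of columns. So instead of comparing transactions pairwise, each pair of columns can be scanned once, linking every transaction to the first transaction seen with the same pair of values. The sketch below uses a plain union-find in pure Python; `assign_group_ids` is a hypothetical helper name. It assumes features are compared column by column, that there are no missing values, and that groups are meant to be transitive (if 0 links to 1 and 1 links to 2, all three get one Id).

```python
from itertools import combinations


def assign_group_ids(rows):
    """rows: list of equal-length tuples. Returns one group Id per row."""
    n = len(rows)
    parent = list(range(n))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[max(rx, ry)] = min(rx, ry)

    n_cols = len(rows[0]) if rows else 0
    # One pass over the data per pair of columns (6 pairs for A, B, C, D).
    for c1, c2 in combinations(range(n_cols), 2):
        first_seen = {}
        for idx, row in enumerate(rows):
            key = (row[c1], row[c2])
            if key in first_seen:
                # Same values in both columns: at least 2 common features.
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    # Relabel roots as consecutive Ids starting at 1, in order of appearance.
    labels, ids = {}, []
    for idx in range(n):
        root = find(idx)
        labels.setdefault(root, len(labels) + 1)
        ids.append(labels[root])
    return ids


sample = [(1, 1, 1, 2), (2, 1, 1, 7), (3, 1, 2, 9), (4, 1, 3, 8), (5, 2, 3, 4)]
print(assign_group_ids(sample))  # [1, 1, 2, 3, 4]
```

The work grows with the number of rows times the number of column pairs, not with the number of transaction pairs. The same "link to first seen" edges could instead be added to a transaction-only graph and passed to a connected-components routine such as the one in NetworKit, which avoids the bipartite graph and its single-feature merges.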
