networkit bipartite graph connected components only when 2 or more common edges

64 Views Asked by Alex_Y At 16 June 2023 at 08:03

I'm new to graphs and still finding my way. The idea is simple: we have "transactions" with multiple "features", and we need to assign the same Id to transactions that have 2 or more features in common (same values). There are about 5,500,000 transactions.
For example:

Transaction	A	B	C	D
0	1	1	1	2
1	2	1	1	7
2	3	1	2	9
3	4	1	3	8
4	5	2	3	4

Here only transactions 0 and 1 have 2 common features (B and C), so they should be assigned the same Id.

Transaction	Id
0	1
1	1
2	2
3	3
4	4

My first approach was to create a graph with all transactions as nodes, then filter the dataframe for pairs of transactions that match in 2 or more features and create an edge for each such pair. The problem is that a dataframe this large cannot be processed pairwise in a reasonable amount of time, even with multiprocessing.
So the second approach was to create a bipartite graph where the source nodes are transactions and the target nodes are features. I was able to extract connected components from it, but the resulting groups were far too large, because transactions with even a single common feature ended up with the same Id.
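To illustrate the over-merging on the sample table, here is a minimal sketch that builds the same transaction/feature bipartite structure with a plain union-find instead of networkit. The names `rows`, `find` and `bipartite_components` are made up for this illustration, and it assumes a feature node is a (column, value) pair.

```python
from collections import defaultdict

# Sample data from the question: transaction -> values of features A, B, C, D.
rows = {
    0: (1, 1, 1, 2),
    1: (2, 1, 1, 7),
    2: (3, 1, 2, 9),
    3: (4, 1, 3, 8),
    4: (5, 2, 3, 4),
}
columns = ("A", "B", "C", "D")


def find(parent, x):
    """Return the representative of x's set (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def bipartite_components(rows, columns):
    """Connected components of the transaction/feature bipartite graph."""
    parent = {}
    for txn, values in rows.items():
        t_node = ("txn", txn)
        parent.setdefault(t_node, t_node)
        for col, val in zip(columns, values):
            f_node = (col, val)  # assumed feature node, e.g. ("B", 1)
            parent.setdefault(f_node, f_node)
            # Edge transaction -- feature: merge the two sets.
            parent[find(parent, t_node)] = find(parent, f_node)
    groups = defaultdict(set)
    for txn in rows:
        groups[find(parent, ("txn", txn))].add(txn)
    return sorted(sorted(g) for g in groups.values())


components = bipartite_components(rows, columns)
print(components)  # [[0, 1, 2, 3, 4]]
```

Transactions 0 to 3 all share B = 1, and transaction 4 shares C = 3 with transaction 3, so a single shared value is enough to pull all five rows into one component.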
Now I'm struggling with how to get connected source nodes (transactions) that have 2 or more target nodes (features) in common.
I'd appreciate any help.
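One possible direction, sketched here under stated assumptions rather than as a definitive answer: if "common feature" means the same value in the same column, then two transactions share 2 or more features exactly when they agree on at least one pair of columns. Grouping rows by each column pair finds those matches without comparing transactions pairwise. The helper names (`find`, `group_by_shared_features`) are hypothetical, the sketch uses a plain union-find rather than networkit, and it merges transitively, as connected components would.

```python
from itertools import combinations

# Sample data from the question: transaction -> values of features A, B, C, D.
rows = {
    0: (1, 1, 1, 2),
    1: (2, 1, 1, 7),
    2: (3, 1, 2, 9),
    3: (4, 1, 3, 8),
    4: (5, 2, 3, 4),
}


def find(parent, x):
    """Return the representative of x's set (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_by_shared_features(rows, min_shared=2):
    """Map each transaction to a group Id; rows that agree on at least
    `min_shared` columns (same column, same value) are merged."""
    parent = {txn: txn for txn in rows}
    n_cols = len(next(iter(rows.values())))
    # One pass over the data per combination of `min_shared` columns.
    for cols in combinations(range(n_cols), min_shared):
        first_seen = {}  # key values -> first transaction seen with them
        for txn, values in rows.items():
            key = tuple(values[c] for c in cols)
            if key in first_seen:
                # Same values in all chosen columns: link the two rows.
                parent[find(parent, txn)] = find(parent, first_seen[key])
            else:
                first_seen[key] = txn
    # Number the groups in order of first appearance, starting at 1.
    group_ids, result = {}, {}
    for txn in rows:
        root = find(parent, txn)
        group_ids.setdefault(root, len(group_ids) + 1)
        result[txn] = group_ids[root]
    return result


ids = group_by_shared_features(rows)
print(ids)  # {0: 1, 1: 1, 2: 2, 3: 3, 4: 4}
```

With 4 feature columns this is 6 linear passes over the data, so the cost grows with the number of column pairs rather than with the number of transaction pairs. The same links could instead be added as edges between transaction nodes and handed to a graph library's connected-components routine. Missing values would need special handling, since this sketch treats two empty cells as a match.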
