I have a networkx graph with events spanning several months. I wanted to see how a node's centrality score changes over time.
I am planning on using several different centrality measures, so I have created a function that selects a specific sender (I don't have many unique senders) and a specific date, builds a networkx graph, calculates the degree centrality, and adds everything to a dataframe.
But my code seems to be a bit convoluted and I'm not sure it's working correctly, since my output:
  feature  degree        date
0       A     1.0  2017-01-02
1      35     1.0  2017-01-02
0       A     1.0  2017-01-20
1      18     1.0  2017-01-20
contains nodes 35 and 18, but I only want A. Is there a better way of doing this?
import numpy as np
import pandas as pd
from datetime import datetime
import networkx as nx
df = pd.DataFrame({'feature':['A','B','A','B','A','B','A','B','A','B'],
                   'feature2':['18','78','35','14','57','68','57','17','18','78'],
                   'timestamp':['2017-01-20T11','2017-01-01T13',
                                '2017-01-02T12','2017-02-01T13',
                                '2017-03-01T14','2017-05-01T15',
                                '2017-04-01T16','2017-04-01T17',
                                '2017-12-01T17','2017-12-01T19']})
df['timestamp'] = pd.to_datetime(pd.Series(df['timestamp']))
df['date'], df['time']= df.timestamp.dt.date, df.timestamp.dt.time
def test(feature,date,name,col_name,nx_measure):
    feature = df[df['feature']== feature]
    feature['date_str'] = feature['date'].astype(str)
    one_day = feature[feature['date_str']==date]
    oneDay_graph =nx.from_pandas_edgelist(one_day, source = 'feature', target = 'feature2', create_using=nx.DiGraph)
    name = pd.DataFrame()
    name['feature']= nx_measure(oneDay_graph).keys()
    name[col_name]= nx_measure(oneDay_graph).values()
    name['date'] = date
    return name
a =test('A','2017-01-02','degree','degree',nx.degree_centrality)
b = test('A','2017-01-20','degree','degree',nx.degree_centrality)
pd.concat([a, b])  # DataFrame.append was removed in pandas 2.0
Desired output:
  feature  degree        date
0       A     1.0  2017-01-02
0       A     1.0  2017-01-20
When you set
name['feature']= nx_measure(oneDay_graph).keys()
you get a row for every node in the graph, which in this case is both 'A' and the target node, 35 or 18. What you should be doing instead is keeping only the entry for the node you asked about.
Here's a more thorough refactoring of your approach:
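A minimal sketch of such a refactoring, assuming the sample `df` from the question. The helper name `centrality_on_date`, the explicit `format=` string, and the NaN fallback are my own choices for illustration, not something the question prescribes. The idea is to compute the measure once and look up only the requested node:

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question. The explicit format string is an
# assumption, added so the hour-only timestamps parse unambiguously.
df = pd.DataFrame({
    'feature': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'feature2': ['18', '78', '35', '14', '57', '68', '57', '17', '18', '78'],
    'timestamp': ['2017-01-20T11', '2017-01-01T13',
                  '2017-01-02T12', '2017-02-01T13',
                  '2017-03-01T14', '2017-05-01T15',
                  '2017-04-01T16', '2017-04-01T17',
                  '2017-12-01T17', '2017-12-01T19'],
})
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%dT%H')
df['date'] = df['timestamp'].dt.date


def centrality_on_date(df, node, date, col_name, nx_measure):
    """Hypothetical helper: one-row frame with `node`'s score on `date`."""
    # Keep only the edges sent by `node` on the requested day.
    mask = (df['feature'] == node) & (df['date'].astype(str) == date)
    graph = nx.from_pandas_edgelist(df[mask], source='feature',
                                    target='feature2',
                                    create_using=nx.DiGraph)
    scores = nx_measure(graph)  # dict of node -> score, computed once
    # Look up the requested node only; NaN if it has no edges that day.
    return pd.DataFrame({'feature': [node],
                         col_name: [scores.get(node, float('nan'))],
                         'date': [date]})


dates = ['2017-01-02', '2017-01-20']
result = pd.concat(
    [centrality_on_date(df, 'A', d, 'degree', nx.degree_centrality)
     for d in dates]
)
print(result)
```

Because the measure is passed in as `nx_measure`, other dictionary-returning networkx functions can be swapped in without touching the helper.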
Result:
In fact, I suspect that this approach produces the wrong answers, since you only compute node centrality on the subgraph of edges sent by 'A' and ignore the edges sent by 'B'. I suspect that the following is a better approach:
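One way this could look, again as a sketch under assumptions: the graph for each day is built from all senders' edges, and only afterwards is the score for the node of interest picked out. The helper name `daily_centrality` is hypothetical, and dates are kept as strings here purely to make grouping simple.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question; the format string is an assumption.
df = pd.DataFrame({
    'feature': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'feature2': ['18', '78', '35', '14', '57', '68', '57', '17', '18', '78'],
    'timestamp': ['2017-01-20T11', '2017-01-01T13',
                  '2017-01-02T12', '2017-02-01T13',
                  '2017-03-01T14', '2017-05-01T15',
                  '2017-04-01T16', '2017-04-01T17',
                  '2017-12-01T17', '2017-12-01T19'],
})
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%dT%H')
df['date'] = df['timestamp'].dt.strftime('%Y-%m-%d')  # dates as strings


def daily_centrality(df, node, col_name, nx_measure):
    """Hypothetical helper: `node`'s score on every day it appears."""
    rows = []
    for date, day_edges in df.groupby('date'):
        # Build the day's graph from ALL senders, not just `node`.
        graph = nx.from_pandas_edgelist(day_edges, source='feature',
                                        target='feature2',
                                        create_using=nx.DiGraph)
        if node not in graph:
            continue  # node sent nothing on this day
        rows.append({'feature': node,
                     col_name: nx_measure(graph)[node],
                     'date': date})
    return pd.DataFrame(rows, columns=['feature', col_name, 'date'])


result = daily_centrality(df, 'A', 'degree', nx.degree_centrality)
print(result)
```

On days where 'B' also sends something (2017-04-01 and 2017-12-01 in the sample data), the day's graph has four nodes, so the degree centrality of 'A' is 1/3 rather than the 1.0 you get from the 'A'-only subgraph.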
Result: