I am working with the Enron email dataset and want to remove the email addresses that do not contain "@enron.com" (i.e. I would like to keep Enron addresses only). When I tried to delete the addresses without @enron.com, some of them were skipped for some reason. A small graph is shown below, where the vertices are email addresses. It is in GML format:
Creator "igraph version 0.7 Sun Mar 29 20:15:45 2015"
Version 1
graph
[
directed 1
node
[
id 0
label "[email protected]"
]
node
[
id 1
label "[email protected]"
]
node
[
id 2
label "[email protected]"
]
node
[
id 3
label "[email protected]"
]
node
[
id 4
label "[email protected]"
]
node
[
id 5
label "[email protected]"
]
node
[
id 6
label "[email protected]"
]
node
[
id 7
label "[email protected]"
]
node
[
id 8
label "[email protected]"
]
node
[
id 9
label "[email protected]"
]
edge
[
source 5
target 5
weight 1
]
]
My code is:
import igraph as ig

G = ig.read("enron_email_filtered.gml")
for v in G.vs:
    print(v['label'])
    if '@enron.com' not in v['label']:
        G.delete_vertices(v.index)
        print('Deleted')
In this dataset, 7 addresses should be deleted. However, the code above removes only 5.
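The skipped addresses are consistent with deleting from a sequence while iterating over it: each deletion shifts the remaining items down by one index, so the item right after a deleted one is never visited. The sketch below reproduces the effect with a plain Python list instead of an igraph graph. The addresses are hypothetical placeholders, not taken from the dataset, and igraph's vertex sequence is assumed to renumber vertices in the same way after a deletion.

```python
# Hypothetical labels; two non-Enron addresses sit next to each other.
labels = ["a@enron.com", "b@example.com", "c@example.com",
          "d@enron.com", "e@example.org"]


def delete_in_loop(items):
    """Mimic the loop in the question: delete while iterating."""
    items = list(items)  # work on a copy
    for item in items:
        if "@enron.com" not in item:
            items.remove(item)  # shifts later items one position left
    return items


def delete_after_loop(items):
    """Decide what to keep first, then build the result in one step."""
    return [item for item in items if "@enron.com" in item]


buggy = delete_in_loop(labels)
fixed = delete_after_loop(labels)
print(buggy)  # "c@example.com" survives because the loop stepped over it
print(fixed)  # only the Enron addresses remain
```

The second function never modifies the sequence it is reading from, which is why no item is skipped.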
From the tutorial here, you can access all the vertices with a specific property, and then delete them in bulk as follows:
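A minimal sketch of such a bulk deletion, assuming G is the graph loaded in the question and that every vertex has a "label" attribute. The helper name delete_non_enron is made up for illustration; G.vs and G.delete_vertices are the python-igraph calls already used in the question, and delete_vertices is assumed to accept a list of vertex indices.

```python
def delete_non_enron(graph, domain="@enron.com"):
    """Remove every vertex whose label lacks `domain`; return how many were removed."""
    # Collect the indices first, so nothing is renumbered while we are looking.
    to_delete = [v.index for v in graph.vs if domain not in v["label"]]
    # A single call removes all of them at once.
    graph.delete_vertices(to_delete)
    return len(to_delete)
```

With the graph from the question, this would be called as delete_non_enron(G); the return value is the number of vertices removed.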
Here is the output I got: