Importing large XML file into neo4j using py2neo


I have a very large XML file (2 GB) that I am trying to upload to neo4j with Python. I first read it into a dictionary:

from lxml import etree  # assumed; xml.etree.ElementTree behaves the same here

mydata = etree.parse("myfile.xml")
data = mydata.getroot()

data_list = {}  # record index -> {tag: text} for one publication

for j, pub in enumerate(data):
    # iterate over the children directly; getchildren() is deprecated
    data_list[j] = {elt.tag: elt.text for elt in pub}

This doesn't take very long, considering the large amount of data.
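For a file of this size, the parsing step could also be written as a stream, so the whole tree never has to sit in memory at once. The following is only a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`; the helper name `iter_records` is hypothetical, and it assumes every direct child of the root is one publication, as in the loop above.

```python
import xml.etree.ElementTree as ET


def iter_records(source):
    """Yield one {tag: text} dict per direct child of the root element.

    `source` is a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    depth = 0                # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:   # a direct child of the root just closed
                yield {child.tag: child.text for child in elem}
                root.clear()  # drop processed children to free memory
```

Something like `for record in iter_records("myfile.xml"): ...` would then replace the `data_list` dictionary. As in the original loop, a repeated tag inside one publication (for example, several authors) keeps only the last value.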

Then I try to upload to neo4j:

from py2neo import Graph, Node, Relationship

graph = Graph("http://localhost:7474/db/data/")
graph.delete_all()  # start from an empty database

for record in data_list.values():
    authnode = Node("Person", author=record["author"])
    pubnode = Node(record["type"], title=record["title"])

    # three separate merge calls per record
    graph.merge(authnode)
    graph.merge(pubnode)

    graph.merge(Relationship(authnode, "wrote", pubnode))

This part takes days, if not weeks.
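One direction that might reduce the time is to send many records per request instead of three merge calls per record. The sketch below only illustrates the batching idea and is not tested against a real database: `upload`, `QUERY`, and the `run` callable are hypothetical names, and the `$rows` parameter syntax assumes a Neo4j version that supports it (older servers use `{rows}`). Labels cannot be passed as Cypher parameters, so rows are buffered per publication type.

```python
# Cypher template. The label is formatted in because Cypher does not
# allow labels as query parameters; doubled braces survive str.format.
QUERY = (
    "UNWIND $rows AS row "
    "MERGE (a:Person {{author: row.author}}) "
    "MERGE (p:`{label}` {{title: row.title}}) "
    "MERGE (a)-[:wrote]->(p)"
)


def upload(records, run, size=10000):
    """Call run(query, rows=[...]) once per batch; return the call count.

    `records` is any iterable of dicts with "type", "author", "title".
    `run` is a hypothetical callable standing in for the database call.
    """
    pending = {}  # label -> rows buffered for that label
    calls = 0

    def flush(label, rows):
        # strip backticks so the label cannot break out of its quoting
        run(QUERY.format(label=label.replace("`", "")), rows=rows)

    for rec in records:
        rows = pending.setdefault(rec["type"], [])
        rows.append({"author": rec["author"], "title": rec["title"]})
        if len(rows) >= size:
            flush(rec["type"], rows)
            pending[rec["type"]] = []
            calls += 1

    for label, rows in pending.items():  # send whatever is left over
        if rows:
            flush(label, rows)
            calls += 1
    return calls
```

With py2neo v3 or later, `run` could presumably be `graph.run`, which accepts query parameters as keyword arguments; that should be checked against the installed version. An index or uniqueness constraint on the merged properties generally helps `MERGE` as well.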

The most similar question I found is "Importing a large xml file to Neo4j with Py2neo". However, the suggestion there is to convert the XML file to CSV, which would itself take at least a whole week, so CSV is not an option.

Someone else suggested using the Geoff format, but the load2neo driver hasn't been updated in a while and I cannot get it to install.

Any suggestions?
