I have a very large XML file (2 GB) and I am trying to upload it to Neo4j with Python. I am creating a dictionary:
```python
from lxml import etree

mydata = etree.parse("myfile.xml")
data = mydata.getroot()
data_list = {}
temp_elt = {}  # will hold each element
j = 0
for pub in data:
    for elt in pub:  # getchildren() is deprecated; iterate directly
        temp_elt[elt.tag] = elt.text
    data_list[j] = temp_elt
    j = j + 1
    temp_elt = {}
```
This doesn't take very long (considering the large amount of data).
Then I try to upload to Neo4j:

```python
from py2neo import Graph, Node, Relationship

graph = Graph("http://localhost:7474/db/data/")
graph.delete_all()
for element in data_list:
    authnode = Node("Person", author=data_list[element]["author"])
    pubnode = Node(data_list[element]["type"], title=data_list[element]["title"])
    graph.merge(authnode)
    graph.merge(pubnode)
    graph.merge(Relationship(authnode, "wrote", pubnode))
```
This part takes days, if not weeks.
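Part of the cost is that the loop above makes three separate `MERGE` round trips per record. A common workaround is to send records in batches through a single `UNWIND` query. A rough sketch, assuming a py2neo version that exposes `graph.run` and using a fixed `:Publication` label (Cypher cannot parameterize labels, so the per-record `type` is dropped here); `upload` is hypothetical and not invoked:

```python
from itertools import islice

# MERGEs one whole batch of records in a single round trip.
# Assumes the same keys as above: author and title.
BATCH_CYPHER = """
UNWIND $rows AS row
MERGE (a:Person {author: row.author})
MERGE (p:Publication {title: row.title})
MERGE (a)-[:wrote]->(p)
"""

def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def upload(graph, rows, size=1000):
    # One query per batch instead of three merges per record.
    for batch in chunked(rows, size):
        graph.run(BATCH_CYPHER, rows=batch)
```

With uniqueness constraints on `Person(author)` and `Publication(title)` the `MERGE` lookups become index hits rather than scans, which matters at this scale.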
The most similar question I found was this: Importing a large xml file to Neo4j with Py2neo. However, there it is suggested to transform the XML file into a CSV file, which would itself take at least a whole week, so CSV is not an option.
Someone else suggested using the Geoff format, but the load2neo driver hasn't been updated in a while and I can't seem to install it.
Any suggestions?