Can't read and query graphml file with lxml


I have an XML (GraphML) file which is loosely defined as follows:

<?xml version="1.0" encoding="utf-8"?>
<graphml xmlns:x="http://www.yworks.com/xml/yfiles-common/markup/3.0" xmlns:y="http://www.yworks.com/xml/yfiles-common/3.0" xmlns:sys="http://www.yworks.com/xml/yfiles-common/markup/primitives/2.0" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://www.yworks.com/xml/schema/graphml.wpf/3.0/ygraphml.xsd " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://graphml.graphdrawing.org/xmlns">
    <key .... />
    <key .... />
    <key .... />
    <key .... />
    <data .. />
    <data .. />
    <graph id="G">
        <node id=".."></node>
        <node id=".."></node>
        <node id=".."></node>
        <node id=".."></node>
        <edge id=".."></edge>
        <edge id=".."></edge>
        <edge id=".."></edge>
    </graph>
</graphml>

I want to collect all the node elements into a Python list for further processing. Here's what I have tried so far:

from lxml import etree

tree = etree.parse("test1.graphml")
nodes = tree.findall('//graphml/node')

print("done")

However, this didn't work and I'm not sure why. What am I doing wrong here?


There is 1 solution below


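In this case it's important to be aware of namespaces. The root element declares a default namespace (xmlns="http://graphml.graphdrawing.org/xmlns"), so every unprefixed element, including node, belongs to that namespace and has to be queried with a namespace-qualified name. Note also that the node elements are children of graph, not of graphml, so a descendant search (.//) is used: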

from lxml import etree

tree = etree.parse("test1.graphml")
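root = tree.getroot()
# nsmap maps prefixes to namespace URIs; the default (unprefixed) namespace is stored under the key None
namespaces = root.nsmap

# Bind the default namespace to an explicit prefix ('ns' is an arbitrary choice) and use it in the path
nodes = root.findall('.//ns:node', namespaces={'ns': namespaces[None]})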

print("done")
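
To see the effect of the namespace without the original file, here is a minimal, self-contained sketch. The inline document, the node and edge ids, and the names GRAPHML_NS and SAMPLE are made up for illustration. Only the namespace URI is taken from the question. The sketch imports lxml and, as an assumption for convenience, falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since both accept the calls used here.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Default namespace URI, copied from the xmlns attribute of the <graphml> root.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical miniature document standing in for test1.graphml.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <graph id="G">
        <node id="n0"></node>
        <node id="n1"></node>
        <edge id="e0" source="n0" target="n1"></edge>
    </graph>
</graphml>"""

root = etree.fromstring(SAMPLE)

# Unqualified name: matches nothing, because every element lives in GRAPHML_NS.
unqualified = root.findall(".//node")

# Prefix form: "g" is an arbitrary local prefix mapped to the namespace URI.
by_prefix = root.findall(".//g:node", namespaces={"g": GRAPHML_NS})

# Clark notation: the URI is written in braces, so no prefix map is needed.
by_clark = root.findall(".//{%s}node" % GRAPHML_NS)

node_ids = [node.get("id") for node in by_prefix]
print(len(unqualified), node_ids)
```

The prefix form and the Clark-notation form select the same elements. Which one to use is mostly a matter of taste: the prefix map reads better when a path has several steps, while Clark notation avoids carrying a dictionary around.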