How to retrieve specific values from an RDF/XML file in Python


I have an RDF/XML file that is formatted like so (truncated to only show the necessary data):

<rdf:RDF xml:base="http://www.gutenberg.org/">
    <pgterms:ebook rdf:about="ebooks/48666">
        <pgterms:downloads rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">34</pgterms:downloads>
        <dcterms:creator>
            <pgterms:agent rdf:about="2009/agents/36363">
                <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1926</pgterms:deathdate>
                <pgterms:webpage rdf:resource="http://en.wikipedia.org/wiki/Edmund_Candler"/>
                <pgterms:alias>Chandler, Edmund</pgterms:alias>
                <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1874</pgterms:birthdate>
                <pgterms:name>Candler, Edmund</pgterms:name>
            </pgterms:agent>
        </dcterms:creator>
        <dcterms:title>The Sepoy</dcterms:title>
        <dcterms:subject>
            <rdf:Description rdf:nodeID="Nd62b88adeb1347d9b99ba9d763e74269">
                <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
                <rdf:value>Soldiers -- India -- Conduct of life</rdf:value>
            </rdf:Description>
        </dcterms:subject>
    </pgterms:ebook>
</rdf:RDF>

I would like to retrieve certain properties from this file such as:

  • title: The Sepoy
  • creator - name: Candler, Edmund
  • downloads: 34
  • subject - value: Soldiers -- India -- Conduct of life

I gather that SPARQL is probably the right tool for this kind of job, but I have no experience with RDF and find the structure of this data confusing. How can I parse this file in Python to retrieve the values listed above?
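For reference, here is a minimal sketch of one possible approach that treats the file as plain XML and uses only the standard library's `xml.etree.ElementTree`. It relies on assumptions: the truncated sample above omits the `xmlns` declarations, so the namespace URIs in `NS` are guesses and should be checked against the header of the real file. The `SAMPLE` string and the helper name `extract_ebook_fields` are made up for illustration. RDF/XML can serialize the same graph in more than one layout, so this only works while the nesting matches the sample. An RDF library with SPARQL support, such as rdflib, would be the more general route.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: these namespace URIs are not shown in the truncated sample;
# verify them against the xmlns declarations at the top of the real file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Cut-down copy of the sample, with the assumed xmlns declarations added
# so that the snippet parses on its own.
SAMPLE = """<rdf:RDF xml:base="http://www.gutenberg.org/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
    <pgterms:ebook rdf:about="ebooks/48666">
        <pgterms:downloads rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">34</pgterms:downloads>
        <dcterms:creator>
            <pgterms:agent rdf:about="2009/agents/36363">
                <pgterms:name>Candler, Edmund</pgterms:name>
            </pgterms:agent>
        </dcterms:creator>
        <dcterms:title>The Sepoy</dcterms:title>
        <dcterms:subject>
            <rdf:Description rdf:nodeID="Nd62b88adeb1347d9b99ba9d763e74269">
                <rdf:value>Soldiers -- India -- Conduct of life</rdf:value>
            </rdf:Description>
        </dcterms:subject>
    </pgterms:ebook>
</rdf:RDF>"""


def extract_ebook_fields(root):
    """Collect title, creator names, downloads and subject values.

    Hypothetical helper: assumes exactly the element nesting of the sample.
    """
    ebook = root.find("pgterms:ebook", NS)
    return {
        "title": ebook.findtext("dcterms:title", namespaces=NS),
        # A book may have several creators or subjects, so return lists.
        "creators": [
            el.text
            for el in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
        ],
        "downloads": int(ebook.findtext("pgterms:downloads", namespaces=NS)),
        "subjects": [
            el.text
            for el in ebook.findall("dcterms:subject/rdf:Description/rdf:value", NS)
        ],
    }


root = ET.fromstring(SAMPLE)
fields = extract_ebook_fields(root)
print(fields)
```

To read from disk instead of a string, `ET.parse(path).getroot()` returns the same kind of root element, where `path` is the location of the RDF file.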
