I have an RDF/XML file that is formatted like so (truncated to only show the necessary data):
<rdf:RDF xml:base="http://www.gutenberg.org/">
  <pgterms:ebook rdf:about="ebooks/48666">
    <pgterms:downloads rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">34</pgterms:downloads>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/36363">
        <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1926</pgterms:deathdate>
        <pgterms:webpage rdf:resource="http://en.wikipedia.org/wiki/Edmund_Candler"/>
        <pgterms:alias>Chandler, Edmund</pgterms:alias>
        <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1874</pgterms:birthdate>
        <pgterms:name>Candler, Edmund</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:title>The Sepoy</dcterms:title>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="Nd62b88adeb1347d9b99ba9d763e74269">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Soldiers -- India -- Conduct of life</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </pgterms:ebook>
</rdf:RDF>
I would like to retrieve certain properties from this file, such as:
- title: The Sepoy
- creator - name: Candler, Edmund
- downloads: 34
- subject - value: Soldiers -- India -- Conduct of life
SPARQL seems to be the most likely tool for this kind of job, but I have no experience with RDF and am quite confused by how this data is structured. How can I parse this file in Python to retrieve the desired information?
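As a starting point, here is a minimal sketch of the extraction described above. It does not use SPARQL; it swaps in plain XML parsing with the standard library's `xml.etree.ElementTree`. Several things are assumptions rather than facts from the file: the namespace URIs (the truncated snippet omits its `xmlns` declarations, so the ones below are the commonly used URIs for these prefixes), the helper name `extract_ebook`, and the nesting layout, which is taken to match the snippet exactly. RDF/XML can serialize the same graph in more than one shape, so an RDF-aware library such as rdflib would likely be more robust for the general case.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the truncated snippet does not show its xmlns
# declarations, so these must be checked against the real file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# The snippet from the question, shortened, with the assumed declarations added.
SAMPLE = """<rdf:RDF xml:base="http://www.gutenberg.org/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dcam="http://purl.org/dc/dcam/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/48666">
    <pgterms:downloads rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">34</pgterms:downloads>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/36363">
        <pgterms:alias>Chandler, Edmund</pgterms:alias>
        <pgterms:name>Candler, Edmund</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:title>The Sepoy</dcterms:title>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="Nd62b88adeb1347d9b99ba9d763e74269">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Soldiers -- India -- Conduct of life</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </pgterms:ebook>
</rdf:RDF>"""


def extract_ebook(xml_text):
    """Hypothetical helper: pull the four requested fields from one ebook record."""
    root = ET.fromstring(xml_text)
    ebook = root.find("pgterms:ebook", NS)
    # A book may have several creators or subjects, so those are lists.
    creator_path = "dcterms:creator/pgterms:agent/pgterms:name"
    subject_path = "dcterms:subject/rdf:Description/rdf:value"
    return {
        "title": ebook.findtext("dcterms:title", namespaces=NS),
        "creators": [el.text for el in ebook.findall(creator_path, NS)],
        "downloads": int(ebook.findtext("pgterms:downloads", namespaces=NS)),
        "subjects": [el.text for el in ebook.findall(subject_path, NS)],
    }


record = extract_ebook(SAMPLE)
print(record)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`. Creators and subjects are returned as lists because a record may carry more than one of each.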