We are converting our ancient FrameMaker docs to XML. My job is to convert this:
<?FM MARKER [Index] foo, bar ?>
to this:
<indexterm>
<primary>foo, bar</primary>
</indexterm>
I'm not worried about that part (yet); what is stumping me is that the ProcessingInstructions are all over the documents and could potentially be under any element, so I need to be able to search the entire tree, find them, and then process them. I cannot figure out how to iterate over an entire XML tree using minidom. Am I missing some secret method/iterator? This is what I've looked at thus far:
ElementTree has the excellent Element.iter() method, which is a depth-first search, but it doesn't process ProcessingInstructions.
ProcessingInstructions don't have tag names, so I cannot search for them using minidom's getElementsByTagName.
xml.sax's ContentHandler.processingInstruction looks like it's only used to create ProcessingInstructions.
Short of creating my own depth-first search algorithm, is there a way to generate a list of ProcessingInstructions in an XML file, or identify their parents?
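Use the XPath API of the lxml module. XPath has a processing-instruction() node test, so a single query can select every ProcessingInstruction in the tree regardless of which element it sits under, and each result can report its parent through getparent().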
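The code that originally accompanied this answer did not survive, so the following is only a minimal sketch of the approach, not the original snippet. It assumes lxml is installed; the sample document and the helper name find_fm_markers are made up for illustration. It selects every FM processing instruction with XPath and pairs each one with the tag of its parent element.

```python
from lxml import etree

# Hypothetical sample input: FM markers nested at different depths.
SAMPLE = b"""<doc>
  <section>
    <para>Some text<?FM MARKER [Index] foo, bar ?> more text</para>
  </section>
  <?FM MARKER [Index] baz ?>
</doc>"""


def find_fm_markers(root):
    """Return (parent_tag, pi_text) pairs for every <?FM ...?> in the tree.

    find_fm_markers is an illustrative helper, not part of lxml.
    """
    results = []
    # processing-instruction('FM') matches only PIs whose target is FM;
    # the leading // searches the whole document, in document order.
    for pi in root.xpath("//processing-instruction('FM')"):
        parent = pi.getparent()
        # A PI outside the root element has no parent element.
        parent_tag = parent.tag if parent is not None else None
        results.append((parent_tag, pi.text.strip()))
    return results


if __name__ == "__main__":
    root = etree.fromstring(SAMPLE)
    for parent_tag, text in find_fm_markers(root):
        print(parent_tag, "->", text)
```

Dropping the 'FM' argument, i.e. //processing-instruction(), would match processing instructions with any target. Converting each match into an indexterm element is left out here, since the question sets that part aside.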