I have to handle rather large XML files, and I want to use the streaming API of xml-conduit to go through them and extract the information I need.
Streaming with xml-conduit is especially appealing in my case because I need only a little data from these files and only simple aggregations over it, so conduits are a perfect fit.
Now, I don't always know the exact structure of a file. The files are generated by different versions of (sometimes buggy) software around the world, so I can't impose a schema.
I do know, however, which elements I am interested in and what their shapes are. But, as I said, these elements can appear in any order, interleaved with other elements.
What I need, I guess, is to skip all the elements I am not interested in and consider only the ones I want.
I initially wanted to write something like this:
tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)
but it wouldn't compile because ignoreTag returns Maybe ().
What would be the way to skip all the "unknown" tags when using the xml-conduit streaming API?
As proposed here