Parse escaped HTML into a node in XQilla


I'm trying to get text from an RSS 2.0 feed (the description tag) using XQilla. The address is here. Getting the text works, but the tag contains escaped HTML such as

&lt;a href="some_address"&gt;...

It would be useful to have this HTML as a node so I can work with it further, but I am at a loss here. I can get the tag contents with

let $desc := $item/*[name()='description']

but I do not know how to unescape it. I tried parse-html, which only seems to strip the tags from the text and return a string, like the data() function. Searching the web suggests that extension functions exist for this, but only in other parsers. Is there a way to do it in XQilla? By the way, the code I am working on is a JAWS ResearchIt lookup source.

Jens Erat On BEST ANSWER

XQilla has, like many other XQuery implementations, proprietary functions to load XML and HTML from a string (the documentation page has no anchor tags, so you need to scroll through the document, I'm sorry).

xqilla:parse-xml($xml as xs:string?) as document-node()?
xqilla:parse-html($html as xs:string?) as document-node()?

Given $desc contains the unparsed HTML, xqilla:parse-html($desc) will return the parse result.
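
As a minimal sketch of how this could be put together: the query below parses each description and pulls the links out of the resulting document. The feed location "feed.xml" and the <link> output element are hypothetical, chosen only for illustration, and the sketch assumes the xqilla prefix is already bound, as it is when the query runs in XQilla itself. Because I am not certain which namespace the HTML parser assigns to the elements it produces, the path matches on local-name() instead of a fixed element name.

```xquery
(: Sketch only. "feed.xml" is a hypothetical location for the RSS feed. :)
for $item in doc("feed.xml")//item
(: Take the string value of the description, i.e. the unescaped HTML text. :)
let $desc := string($item/*[name()='description'])
(: Parse that string into a document node using XQilla's extension function. :)
let $html := xqilla:parse-html($desc)
(: local-name() avoids depending on the namespace of the parsed elements. :)
for $a in $html//*[local-name()='a']
(: <link> is a hypothetical output element, not part of RSS or XQilla. :)
return <link href="{$a/@href}">{string($a)}</link>
```

The important part is that the result of xqilla:parse-html is navigated with path expressions; taking its string value or calling data() on it would give back only the text with the tags removed.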