I am working on extracting information from an HTML page in Racket.
To do so, I use the html-parsing and sxml packages.
I want to select an element in the page with a particular id. I currently use a plain XPath expression to do so:
(require sxml)

(define expression
  '(test (div (@ (id "foo")
                 (other-attr "bar"))
              first-div-content)
         (div (@ (id "baz")
                 (other-attr "quux"))
              second-div-content)))

(define wanted-result
  '((div (@ (id "foo") (other-attr "bar"))
         first-div-content)))

(equal? ((sxpath "//div[@id='foo']") expression)
        wanted-result)
; ==> #t
However, as the documentation says:
The txpath function accepts the standard XPath syntax, whereas the sxpath function is structured in a more idiomatic (for Racket) way.
So, I would like to express the same path as an s-expression. And while it is easy from the examples given to turn "//div" into '(// div), I did not find how to query specific attributes. I gather that I would have to use an sxml-converter or sxml-converter-as-predicate, but don’t know how to do so.
I know that the XPath version works very well and maybe I should not bother using s-expressions, but I want to understand how it works and then decide for myself which version to use.
I think that particular XPath is written as
(sxpath '(// (div (@ id (equal? "foo")))))
The sxpath element (div (@ id (equal? "foo"))) has the form (sxpath-or-symbol reducer-path ...), which means: select the elements matching sxpath-or-symbol that also have non-empty matches for each reducer-path. The rewriting rules in the documentation of sxpath are intended to convey this idea, but they could use more examples.
Attributes are generally treated as if they were child elements of a @ node.
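As a quick check, here is a minimal sketch that runs an s-expression path against the data from the question. It assumes the sxml package is installed; the names doc, by-id, and id-values are introduced here only for illustration. The second query is there to show attributes being reached as if they were children of the @ node.

```racket
#lang racket/base
(require sxml)

;; Same document as `expression` in the question.
(define doc
  '(test (div (@ (id "foo") (other-attr "bar")) first-div-content)
         (div (@ (id "baz") (other-attr "quux")) second-div-content)))

;; (div reducer-path): keep only the div elements for which the reducer
;; path @ -> id -> (equal? "foo") selects at least one node.
(define by-id (sxpath '(// (div (@ id (equal? "foo"))))))

;; Attributes behave like children of the @ node:
;; div -> @ -> id -> the text inside the id attribute.
(define id-values (sxpath '(// div @ id *text*)))

(by-id doc)
(id-values doc)
```

If this reading of the rewriting rules is right, the first query should return the same list as wanted-result in the question, and the second should list the id strings of both div elements.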