Racket’s SXpath: from an XPath string to the “idiomatic way”


I am working on extracting information from an HTML page in Racket.

To do so, I use the html-parsing and sxml packages.

I want to select an element in the page with a particular id. I currently use a plain XPath expression to do so:

(require sxml)

(define expression
  '(test (div (@ (id "foo") 
                 (other-attr "bar"))
              first-div-content) 
         (div (@ (id "baz") 
                 (other-attr "quux"))
              second-div-content)))

(define wanted-result
  '((div (@ (id "foo") (other-attr "bar"))
         first-div-content)))

(equal? ((sxpath "//div[@id='foo']") expression)
        wanted-result)

; ==> #t

However, as the documentation says:

The txpath function accepts the standard XPath syntax, whereas the sxpath function is structured in a more idiomatic (for Racket) way.

So, I would like to express the same path as an s-expression. And while it is easy from the examples given to turn "//div" into '(// div), I did not find how to query specific attributes. I gather that I would have to use an sxml-converter or sxml-converter-as-predicate, but don’t know how to do so.

I know that the XPath string version works very well, and maybe I should not bother with s-expressions, but I want to understand how they work and then decide for myself which version to use.

Best answer

I think that particular XPath is written as

(sxpath '(// (div (@ id (equal? "foo")))))

The sxpath element (div (@ id (equal? "foo"))) has the form (sxpath-or-symbol reducer-path ...), which means: select the elements matching sxpath-or-symbol, then keep only those for which every reducer-path yields a non-empty result. Here the reducer path @ id (equal? "foo") is non-empty only for a div whose id attribute contains the text "foo". The rewriting rules in the documentation of sxpath are intended to convey this idea, but they could use more examples.

Attributes are generally treated as if they were child elements of a @ node.
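
As a rough sketch of both points, the snippet below reuses the data from the question. The names doc, all-ids and foo-divs are made up for illustration, and the results described in the comments are what I would expect from the rewriting rules rather than something taken from the documentation.

```racket
#lang racket
(require sxml)

;; Same sample data as in the question (the name `doc` is illustrative).
(define doc
  '(test (div (@ (id "foo") (other-attr "bar")) first-div-content)
         (div (@ (id "baz") (other-attr "quux")) second-div-content)))

;; Attributes are stepped into like children of the @ node:
;; this should collect the id attribute node of every div.
(define all-ids ((sxpath '(// div @ id)) doc))

;; Reducer form: keep only the divs for which the path
;; @ -> id -> text equal to "foo" selects something.
(define foo-divs ((sxpath '(// (div (@ id (equal? "foo"))))) doc))
```

If this behaves as expected, foo-divs is equal? to the wanted-result from the question, so the s-expression path and the string "//div[@id='foo']" can be used interchangeably here.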