Print an XML element only if an unrelated element has a given value

I have a bunch of Yandex.XML files with search results (response format: http://api.yandex.com/xml/doc/dg/concepts/response.xml).

I want to find out the queries (//yandexsearch/request/query) for all such XML files where the first URL ((//yandexsearch/response/results/grouping/group/doc/url)[1]) equals a certain value (say, http://www.example.org/).

Drawing an analogy with grep, I'd first use the -l flag to list the matching documents and then pipe that list to xargs xmllint to extract the original query. Perhaps xmllint (or another OS X tool) has a better way; besides, I haven't found an xmllint flag similar to -l for the initial matching in the first place.

There is 1 best solution below

Search for yandexsearch elements whose response element contains the URL you're looking for, then select the query.

/yandexsearch[
  contains(
    (response/results/grouping/group/doc/url)[1],
    "http://www.example.org"
  )]/request/query

For the example XML given on that page and the search string http://www.yandex.ru, it returns the following element:

<query>yandex</query>

If your search string is always a prefix of the URL, you might want to use starts-with(...) instead of contains(...).
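
To run the same check over many files without xmllint, here is a minimal Python sketch of the starts-with variant using only the standard library. ElementTree's path syntax has no contains() or starts-with(), so the comparison is done in Python instead of in XPath. The function name `query_if_first_url_matches` and the script name in the usage comment are made up for illustration, and the sketch assumes the files use no XML namespaces and have `yandexsearch` as the root element.

```python
import sys
import xml.etree.ElementTree as ET

# Child-only path from the <yandexsearch> root to the result URLs.
FIRST_URL_PATH = "response/results/grouping/group/doc/url"


def query_if_first_url_matches(xml_data, prefix):
    """Return the query text if the first result URL starts with prefix, else None."""
    root = ET.fromstring(xml_data)
    if root.tag != "yandexsearch":
        return None
    # find() returns the first matching element in document order, or None.
    first_url = root.find(FIRST_URL_PATH)
    if first_url is None:
        return None
    if not (first_url.text or "").strip().startswith(prefix):
        return None
    query = root.find("request/query")
    if query is None:
        return None
    return (query.text or "").strip()


# Hypothetical usage: python first_url_query.py http://www.example.org/ *.xml
# Runs only when a prefix and at least one file name are given.
if __name__ == "__main__" and len(sys.argv) > 2:
    prefix = sys.argv[1]
    for path in sys.argv[2:]:
        with open(path, "rb") as fh:
            result = query_if_first_url_matches(fh.read(), prefix)
        if result is not None:
            # Print the file name next to the query, similar to grep -l plus the match.
            print(f"{path}\t{result}")
```

For an exact match, as in the question, replace `startswith(prefix)` with an `== prefix` comparison on the stripped URL text.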