How to print values from different fields combined?


I'm using xidel and playing with web scraping (without templates for now). I would like to get the title and price of each book so that they can be printed on one line per entry:

title --> price

Based on an answer from this forum I can write:

./xidel -e 'doc("https://books.toscrape.com")//*[self::p[@class="price_color"] or self::h3]'

But how do I write each title and its price on one line?

Thank you

Martin Honnen On

Try

./xidel -e 'doc("https://books.toscrape.com")//article[@class = "product_pod"]!(.//h3 || "-->" || .//p[@class="price_color"])'
SebastianM On

I followed Martin's advice and checked the HTML structure, and indeed there is an <article> element in the code that should be used. Martin's solution works, and the one I arrived at, probably at the same time, is:

./xidel -e 'doc("https://books.toscrape.com")//article ! string-join((.//p[@class="price_color"], .//h3), ";")'

Need to remember: Check the HTML structure first!
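
To see why the structure matters, here is a minimal sketch in Python (standard library only), not xidel. The markup in SAMPLE is a hypothetical, simplified stand-in for the listing, and the helper names text_of, flat_items and per_article_lines are made up for illustration. It contrasts selecting every h3 and price node in one flat pass with visiting each article and combining its own title and price.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the listing markup (not the real page).
SAMPLE = """
<section>
  <article class="product_pod">
    <h3><a title="Book A">Book A</a></h3>
    <div class="product_price"><p class="price_color">10.00</p></div>
  </article>
  <article class="product_pod">
    <h3><a title="Book B">Book B</a></h3>
    <div class="product_price"><p class="price_color">12.50</p></div>
  </article>
</section>
"""


def text_of(node):
    """Concatenate all text inside a node, similar to an XPath string value."""
    return "".join(node.itertext()).strip()


def flat_items(root):
    """Select h3 and price nodes in document order: one separate item per node."""
    return [
        text_of(n)
        for n in root.iter()
        if n.tag == "h3" or (n.tag == "p" and n.get("class") == "price_color")
    ]


def per_article_lines(root, sep=" --> "):
    """Visit each article and combine its own title and price into one line."""
    lines = []
    for article in root.iter("article"):
        title = text_of(article.find(".//h3"))
        price = text_of(article.find(".//p[@class='price_color']"))
        lines.append(title + sep + price)
    return lines


root = ET.fromstring(SAMPLE)
print(flat_items(root))
print(per_article_lines(root))
```

The flat pass returns titles and prices as separate items, while the per-article loop keeps each pair together, which is the role the article step plays in the xidel expressions.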

Issue solved

Reino On

Need to remember: Check the HTML structure first!

If an HTML source is minified, or unreadably prettified (with lots of illogical indentations for instance), then for a better overview of all the element-nodes I'd recommend either of the following 2 commands:

$ xidel -s "https://books.toscrape.com" -e . --output-node-format=xml --output-node-indent
$ xidel -se 'serialize(doc("https://books.toscrape.com"),{"indent":true()})'

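Then you'll quickly notice that the element-nodes you're after sit at fixed positions under the <article>-element-node: <h3> is a direct child and <p class="price_color"> is a child of a <div> that is itself a direct child, so the descendant axis (.//) isn't necessary. And since it's all nodes you're dealing with, a plain path step (/) is enough and you don't really need the ! (simple map operator):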

$ xidel -s "https://books.toscrape.com" -e '
  //article/join((div/p[@class="price_color"],h3),";")
'
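
As a rough cross-check outside xidel, the following Python sketch (standard library only) compares explicit child steps with a descendant search on one entry. The ARTICLE markup is a hypothetical, simplified sample rather than the real page source, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical single entry, simplified; not copied from the real page.
ARTICLE = ET.fromstring("""
<article class="product_pod">
  <h3><a title="Book A">Book A</a></h3>
  <div class="product_price"><p class="price_color">10.00</p></div>
</article>
""")

# Explicit child steps, in the spirit of div/p[@class="price_color"] and h3.
price_by_child_path = ARTICLE.find("div/p[@class='price_color']")
title_by_child_path = ARTICLE.find("h3")

# Descendant search, in the spirit of .//p[@class="price_color"] and .//h3.
price_by_descendant = ARTICLE.find(".//p[@class='price_color']")
title_by_descendant = ARTICLE.find(".//h3")

# Both routes should land on the very same element objects.
same_price = price_by_child_path is price_by_descendant
same_title = title_by_child_path is title_by_descendant

# Join price and title with ";" in the same order as the xidel command.
line = ";".join([
    "".join(price_by_child_path.itertext()).strip(),
    "".join(title_by_child_path.itertext()).strip(),
])
print(same_price, same_title, line)
```

When the position of an element is known, the explicit path says exactly where it is expected, whereas the descendant search would also match the same element at any other depth.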

And personally, I only use x:join() / string-join() for combining 3 items or more. For 2 items I always do a simple string-concatenation:

$ xidel -s "https://books.toscrape.com" -e '
  //article/(div/p[@class="price_color"]||";"||h3)
'
$ xidel -s "https://books.toscrape.com" -e '
  //article/concat(div/p[@class="price_color"],";",h3)
'
$ xidel -s "https://books.toscrape.com" -e '
  //article/x"{div/p[@class="price_color"]};{h3}"
'

The last one is Xidel's own extended-string-syntax.