Middle Selectors Ignored in hxselect

I'm attempting to extract some text from a webpage using hxselect from html-xml-utils 7.4. According to the man page, hxselect accepts a comma-delimited list of CSS selectors. I have three selectors:

/usr/local/bin/hxnormalize -x -i 0 -l 5000 https://domain.tld | /usr/local/bin/hxselect -s'\n' 'div#searchfieldouter, div#searchbutton, input.searchfield' > ~/result.html

The command works correctly with any one or two of the selectors. When I use more than two, only the first and last have any effect. Regardless of which selectors I use or how many, the middle ones seem to be ignored.

Is the bug in me or hxselect?

There are 2 best solutions below


One option is to add -c to hxselect, which prints only the content of each matched element, without its enclosing tags:

hxnormalize -x -i 0 -l 5000 https://domain.tld |
   hxselect -s'\n' -c 'div#searchfieldouter, div#searchbutton, input.searchfield' > ~/result.html

I found a footnote in the book Efficient Linux at the Command Line that describes exactly the problem you faced:

This example uses three CSS selectors, some old versions of hxselect can handle only two. If your version of hxselect is afflicted by this shortcoming, download the latest version from https://www.w3.org/Tools/HTML-XML-utils and build it with the command configure && make.

So there is definitely a bug in older versions of hxselect. Version 7.7, which I had installed, has the same problem. I downloaded the latest version, 8.6, compiled and installed it, and that solved the problem.
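
If upgrading is not possible, one workaround might be to call hxselect once per selector, since the question reports that affected versions handle a single selector correctly. The following is only a sketch under that assumption: select_each, the HXSELECT variable, and page.xhtml are hypothetical names made up for illustration, not part of html-xml-utils. Note that the output is grouped by selector rather than appearing in document order, so it is not an exact substitute for a comma-separated list.

```shell
#!/bin/sh
# Hypothetical helper (not part of html-xml-utils): run hxselect once per
# selector so that no single invocation receives a comma-separated list.
# HXSELECT is an assumed override hook for the path to the binary.
HXSELECT="${HXSELECT:-hxselect}"

select_each() {
    # $1 is a file holding the normalized page; the rest are CSS selectors.
    input_file=$1
    shift
    for selector in "$@"; do
        # Matches are emitted selector by selector, not in document order.
        "$HXSELECT" -s '\n' "$selector" < "$input_file"
    done
}

# Usage sketch (page.xhtml is an assumed temporary file name):
#   hxnormalize -x -i 0 -l 5000 https://domain.tld > page.xhtml
#   select_each page.xhtml 'div#searchfieldouter' 'div#searchbutton' \
#       'input.searchfield' > ~/result.html
```

The normalized page is saved to a file first because each hxselect call consumes its standard input, so the same stream cannot be piped into several calls.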