I need to find all places in a bunch of HTML files that match the following structure (as a CSS selector):
div.a ul.b
or XPath:
//div[@class="a"]//ul[@class="b"]
grep doesn't help me here. Is there a command-line tool that returns all files (and optionally all locations therein) that match this criterion, i.e. that returns file names if the file matches a certain HTML or XML structure?
Try this:
```
# Debian/Ubuntu
aptitude install html-xml-utils
# macOS
brew install html-xml-utils
```

```
hxnormalize -l 240 -x filename.html | hxselect -s '\n' -c "label.black"
```

where `label.black` is the CSS selector that uniquely identifies the HTML element you are looking for. You can also wrap this pipeline in a helper script named `cssgrep` that takes the selector and the file name as arguments, and then run it directly.
This will output the content of all HTML `label` elements of class `black`.

The `-l 240` argument is important to avoid line breaks in the middle of matches. For example, if `<label class="black">Text to \nextract</label>` is the input, then `-l 240` reformats the HTML to `<label class="black">Text to extract</label>`, inserting line breaks only at column 240, which simplifies parsing. Extending the limit to 1024 or beyond is also possible.

See also:
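Applied to the structure from the original question, the same pipeline can drive a loop that prints only the matching file names. A sketch, again assuming html-xml-utils is installed (the function name `matchfiles` is mine, not part of the package):

```shell
#!/bin/bash
# Print the names of HTML files whose structure matches a CSS selector.
# Sketch; assumes html-xml-utils is installed.
# Usage: matchfiles SELECTOR FILE...
matchfiles() {
  sel="$1"
  shift
  for f in "$@"; do
    # hxselect -c prints nothing when the selector matches no element,
    # so a non-empty result means the file matches
    if hxnormalize -l 240 -x "$f" | hxselect -c "$sel" | grep -q .; then
      echo "$f"
    fi
  done
}
```

For the question's structure you would call `matchfiles "div.a ul.b" *.html`.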