Is there something like a "CSS selector" or XPath grep?

I need to find all places in a bunch of HTML files that match the following structure, expressed as a CSS selector:

div.a ul.b

or XPath:

//div[@class="a"]//ul[@class="b"]

grep doesn't help me here. Is there a command-line tool that returns all files (and optionally all places therein) that match this criterion? I.e., one that returns file names if the file matches a certain HTML or XML structure.
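If you'd rather avoid installing anything, the check can be sketched with Python's standard library alone. This is an illustrative sketch, not one of the tools from the answers below: the class name SelectorGrep, the helper file_matches, and the hard-coded div.a ul.b selector are all made up for this example, and html.parser will not repair badly malformed markup the way dedicated tools do:

```python
from html.parser import HTMLParser

class SelectorGrep(HTMLParser):
    """Detects whether a ul.b occurs anywhere inside a div.a."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: does it have class "a"?
        self.matched = False

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if tag == "div":
            self.div_stack.append("a" in classes)
        elif tag == "ul" and "b" in classes and any(self.div_stack):
            # A <ul class="b"> while at least one <div class="a"> is open.
            self.matched = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

def file_matches(path):
    """Return True if the HTML file at `path` contains div.a ul.b."""
    parser = SelectorGrep()
    with open(path, encoding="utf-8", errors="replace") as fh:
        parser.feed(fh.read())
    return parser.matched
```

Looping file_matches over a glob of *.html and printing the names that return True gives the grep -l style listing asked for.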

There are 4 solutions below

BEST ANSWER

Try this:

  1. Install HTML-XML-utils (http://www.w3.org/Tools/HTML-XML-utils/).
    • Ubuntu: aptitude install html-xml-utils
    • macOS: brew install html-xml-utils
  2. Save a web page (call it filename.html).
  3. Run: hxnormalize -l 240 -x filename.html | hxselect -s '\n' -c "label.black"

Here "label.black" is the CSS selector that identifies the elements you want to extract. To make this reusable, write a helper script named cssgrep:

#!/bin/bash

# Ignore errors, write the results to standard output.
hxnormalize -l 240 -x "$1" 2>/dev/null | hxselect -s '\n' -c "$2"

You can then run:

cssgrep filename.html "label.black"

This will generate the content for all HTML label elements of the class black.

The -l 240 argument is important: it tells hxnormalize to wrap output lines only at column 240, so that elements are not split across line breaks. For example, if <label class="black">Text to \nextract</label> is the input, then -l 240 reformats it to <label class="black">Text to extract</label> on a single line, which simplifies parsing. Extending the value to 1024 or beyond is also possible.


I have built a command-line tool with Node.js which does just this. You enter a CSS selector, and it will search through all of the HTML files in the directory and tell you which files have matches for that selector.

You will need to install Element Finder, cd into the directory you want to search, and then run:

elfinder -s "div.a ul.b"

For more info please see http://keegan.st/2012/06/03/find-in-files-with-css-selectors/


There are at least 4 tools:

  • pup - Inspired by jq, pup aims to be a fast and flexible way of exploring HTML from the terminal.

  • htmlq - Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.

  • hq - Lightweight command line HTML processor using CSS and XPath selectors.

  • xq - Command-line XML and HTML beautifier and content extractor.

Examples:

$ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html

$ pup --color 'title' < robots.html
<title>
 Robots exclusion standard - Wikipedia
</title>

$ htmlq --text 'title' < robots.html
Robots exclusion standard - Wikipedia

$ hq --xpath '//title' < robots.html
<title>robots.txt - Wikipedia</title>

$ xq --xpath '//title' < robots.html
robots.txt - Wikipedia

Per Nat's answer to "How to parse XML in Bash?":

Command-line tools that can be called from shell scripts include:

  • 4xpath - command-line wrapper around Python's 4Suite package
  • XMLStarlet
  • xpath - command-line wrapper around Perl's XPath library
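For well-formed XML (as opposed to tag-soup HTML), Python's standard xml.etree.ElementTree also accepts a limited XPath subset that covers the query from the question. A small sketch with made-up sample markup; note that the [@class="a"] predicate matches the attribute value exactly, unlike CSS class matching, which also matches class="a extra":

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<html><body>'
    '<div class="a"><ul class="b"><li>match</li></ul></div>'
    '<div class="c"><ul class="b"><li>no match</li></ul></div>'
    '</body></html>'
)

# ElementTree's XPath subset supports descendant steps (//)
# and [@attr="value"] predicates.
hits = doc.findall('.//div[@class="a"]//ul[@class="b"]')
print(len(hits))  # 1
```

Only the first div matches, because the second one's class attribute is "c" rather than "a".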