I'm using:
wget --spider --force-html -r -l5 http://example.com 2>&1 | grep '^--' | awk '{print $3}' > urls.txt
It works well; however, it doesn't seem to extract the 'href=' links on each page.
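For reference, here is a rough, untested sketch of what I mean (the pages/ directory name is just a placeholder): if the pages were saved instead of merely spidered, the href= values could be grepped out of the local copies afterwards.

# drop --spider so the pages are kept on disk, then pull href= values from the saved files
wget -r -l5 -P pages http://example.com
grep -rho 'href="[^"]*"' pages/ | cut -d'"' -f2 > urls.txt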
wget -q http://example.com -O - | \
tr "\t\r\n'" ' "' | \
grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' | \
sed -e 's/^.*"\([^"]\+\)".*$/\1/g' > urls.txt
This second one does grab the href links I'm looking for, but it doesn't spider.
I'm trying to either make the first one extract the href links on each page, or make the second one spider recursively. I'm aware there are better tools for this, but I have to use wget in this case.
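Roughly, the combined behaviour I'm after would look like the sketch below, which simply feeds the URL list produced by the first command back through the extraction pipeline of the second (hrefs.txt is a placeholder name):

# run the second pipeline once per URL found by the spider run
while read -r url; do
    wget -q "$url" -O - | \
    tr "\t\r\n'" ' "' | \
    grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' | \
    sed -e 's/^.*"\([^"]\+\)".*$/\1/g'
done < urls.txt > hrefs.txt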
wget does not offer such an option; please read its man page. You could use lynx for this instead, as its man page describes.
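For example, something along these lines prints just the list of links found on a page (assuming lynx is installed; the -dump and -listonly options are documented in its man page):

lynx -dump -listonly http://example.com > urls.txt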