Using WGET to Get Links on the Page


I'm using:

wget --spider --force-html -r -l5 http://example.com 2>&1 | grep '^--' | awk '{print $3}' > urls.txt

It works well; however, it doesn't seem to extract the href= links on each page.

wget -q http://example.com -O - | \
tr "\t\r\n'" '   "' | \
grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' | \
sed -e 's/^.*"\([^"]\+\)".*$/\1/g' > urls.txt

This second one does grab the href links I'm looking for, but it doesn't spider.

I'm trying to make the first command also extract the href links on each page, or the second one spider recursively. I'm aware there are better tools for this, but I have to use wget in this example.
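One direction I've considered is letting the first command mirror the pages to disk and then running the second command's extraction over every saved file. A rough sketch of that, with the depth, the domain, and the mirror/ directory all being placeholders:

wget -q -r -l5 -np -P mirror/ http://example.com
# extract hrefs from every mirrored HTML file (mirror/ is a placeholder path)
find mirror/ -type f \( -name '*.html' -o -name '*.htm' \) -exec cat {} + | \
tr "\t\r\n'" '   "' | \
grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' | \
sed -e 's/^.*"\([^"]\+\)".*$/\1/g' | sort -u > urls.txt

That needs two passes and the full download, though, which is why I'd rather get one of the original commands to do it directly.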


There is 1 answer below.

Nihad Badalov

wget does not offer such an option. Please read its man page.

You could use lynx for this:

lynx -dump -listonly http://aligajani.com | grep -v facebook.com > file.txt

From its man page:

   -listonly
          for -dump, show only the list of links.
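Note that lynx only lists the links on the single page it is given, so an extra pass is needed if you want any spidering. A rough sketch of that, assuming lynx's numbered reference-list output, with the start URL and the file names as placeholders:

# collect links from the start page (placeholder URL), then follow each link once
lynx -dump -listonly http://example.com | awk '/^ *[0-9]+\./ {print $2}' > level1.txt
while read -r url; do
    lynx -dump -listonly "$url" | awk '/^ *[0-9]+\./ {print $2}'
done < level1.txt | sort -u > urls.txt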
