Getting IPs from a .html file

Question

414 Views Asked by Xenon At 07 June 2025 at 09:30

There is a site that lists SOCKS4 proxies, which I use with proxychains. Instead of entering new IPs manually, I am trying to automate the process. I used wget to save the page as a .html file in my home directory. This is some of the output if I cat the file:

</font></a></td><td colspan=1><font class=spy1>111.230.138.177</font> <font class=spy14>(Shenzhen Tencent Computer Systems Company Limited)</font></td><td colspan=1><font class=spy1>6.531</font></td><td colspan=1><TABLE width='13' height='8' CELLPADDING=0 CELLSPACING=0><TR  BGCOLOR=blue><TD  width=1></TD></TR></TABLE></td><td colspan=1><font class=spy1><acronym title='311 of 436 - last check status=OK'>71% <font class=spy1>(311)</font> <font class=spy5>-</font></acronym></font></td><td colspan=1><font class=spy1><font class=spy14>05-jun-2020</font> 23:06 <font class=spy5>(4 mins ago)</font></font></td></tr><tr class=spy1x onmouseover="this.style.background='#002424'" onmouseout="this.style.background='#19373A'"><td colspan=1><font class=spy14>139.99.104.233<script type="text/javascript">document.write("<font class=spy2>:<\/font>"+(a1j0e5^q7p6)+(m3f6f6^r8c3)+(a1j0e5^q7p6)+(t0b2s9^y5m3)+(w3c3m3^z6j0))</script></font></td><td colspan=1>SOCKS5</td><td colspan=1><a href='/en/anonymous-proxy-list/'><font class=spy1>HIA</font></a></td><td colspan=1><a href='/free-proxy-list/CA/'><font class=spy14>Canada</

As you can see, the IP usually comes right after a spy[0-19]> . I tried to parse out the actual IPs with awk using the following code:

awk '/^spy/{FS=">";  print $2 }' file-name.html

This is problematic because there would be a bunch of other stuff trailing after the IP. Also, I guess the ^ anchor only works at the beginning of a line? Anyway, I was wondering if anyone could give me ideas on how to parse out the IP addresses with awk. I just started learning awk, so sorry for the noob question. Thanks

There are 3 best solutions below

**Gilles Quénot** · Answer 1

Using a proper XML/HTML parser and an XPath expression:

xidel -se '(//td[@colspan=1]/font[@class="spy1"])[1]/text()' file.html

Output:

111.230.138.177

Or, if the IP is not always the first XPath match:

xidel -se '//td[@colspan=1]/font[@class="spy1"]/text()' file.html |
   perl -MRegexp::Common -lne 'print $1 if /($RE{net}{IPv4})/'

**Slawomir Dziuba** · Answer 2

AWK is great for pulling out IP addresses:

gawk -v RS="spy[0-9]*" '{match($0,/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/); ip = substr($0,RSTART,RLENGTH); if (ip) {print ip}}' file.html

Result:

111.230.138.177
139.99.104.233

Explanation.

A regular expression in the record separator (RS) is not part of POSIX awk, so use GAWK for this.

We split the file into records holding (at most) one IP address each by using a regex in the RS variable.
The match function then looks for the second regex anywhere in the record. That regex is 4 groups of 1 to 3 digits, separated by dots (the IP address).
Then the substr function extracts from the whole record ($0) a fragment of length RLENGTH starting at RSTART (the position where the match begins).
The if checks whether the result has a value and, if so, prints it. This protects against empty lines in the output.

This method of extracting IP addresses does not depend on the file being well formed; it does not even have to be HTML.
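
As a rough sketch of the same match/substr idea without a regex RS (so it should also run on awks other than gawk), one can loop over every match in each line instead. The function name extract_ips is made up for illustration and is not from the answer above; the pattern uses [0-9]+ rather than {1,3} because some older awks do not support interval expressions, so it is slightly looser than the original.

```shell
# extract_ips: print every dotted-quad found in the given files (or stdin).
# Hypothetical helper name; not part of the original answer.
extract_ips() {
    awk '{
        line = $0
        # Find a match, print it, then drop everything up to its end
        # and search the remainder of the line again.
        while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}
```

It would be called as extract_ips file.html. Unlike the RS approach, it does not rely on there being only one address between two spy markers.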

**Arnab Nandy** · Answer 3

Arnab Nandy On 06 June 2020 at 06:20

Solutions have already been provided here; I'm adding a different one for future readers, using the egrep utility.

egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.html
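
Note that this pattern also accepts things like 999.1.1.1, and a scraped page may list the same address more than once. A minimal sketch of a filter that could be piped after the command above, assuming one candidate address per input line (valid_ips is a hypothetical name, not something from this answer):

```shell
# valid_ips: read candidate addresses on stdin, keep only those whose
# four octets are all <= 255, and print each address once, sorted.
# Hypothetical helper name, for illustration only.
valid_ips() {
    awk -F. 'NF == 4 && $1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255' | sort -u
}

# Possible usage (grep -E is equivalent to egrep):
# grep -E -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.html | valid_ips
```

The sort is plain text order, which is enough for de-duplication but is not a numeric ordering of the addresses.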
