I'm sure everyone will groan, and tell me to look at the documentation (which I have) but I just don't understand how to achieve the same as the following:
curl -s http://www.maxmind.com/app/locate_my_ip | awk '/align="center">/{getline;print}'
All I have in Python 3 so far is:
import urllib.request
f = urllib.request.urlopen('http://www.maxmind.com/app/locate_my_ip')
for lines in f.readlines():
    print(lines)
f.close()
Seriously, any suggestions would help. Please don't tell me to read http://docs.python.org/release/3.0.1/library/html.parser.html, as I have been learning Python for 1 day and get easily confused. A simple example would be amazing!
This is based off of larsmans's answer, above.
Explanation:
for line in f
iterates over the lines in the file-like object f. Python lets you iterate over the lines of a file the same way you would iterate over the items in a list.
if b'align="center">' in line
looks for the byte sequence align="center"> in the current line. The b prefix marks the literal as a bytes object rather than a string. It appears that urllib.request.urlopen returns the response as binary data rather than as Unicode strings, and an unadorned 'align="center">' would be interpreted as a Unicode string. (That was the source of the TypeError above.)
next(f)
takes the next line of the file, because your original awk script printed the line after 'align="center">' rather than the current line. The decode method (bytes objects, like strings, have methods in Python) takes the binary data and converts it to a printable Unicode string. The rstrip() method strips any trailing whitespace (namely, the newline at the end of each line).
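The pieces explained above can be put together roughly as follows. This is only a sketch, not necessarily the exact code this answer originally showed: the function names lines_after_marker and print_located_lines are made up for illustration, the sample bytes are invented so the example runs without network access, and decode() is assumed to be fine with its default UTF-8 encoding.

```python
import io
import urllib.request


def lines_after_marker(f, marker=b'align="center">'):
    """Yield the decoded text of the line that follows each line containing marker."""
    it = iter(f)  # share one iterator between the for loop and next()
    for line in it:
        if marker in line:  # bytes-in-bytes test; a str marker would raise TypeError
            following = next(it, None)  # None if the marker was on the last line
            if following is not None:
                # bytes -> str, then drop the trailing newline
                yield following.decode().rstrip()


def print_located_lines(url):
    """Hypothetical helper: fetch url and print each line found after the marker."""
    with urllib.request.urlopen(url) as f:
        for text in lines_after_marker(f):
            print(text)


# Offline demonstration on made-up bytes, so no network access is needed.
sample = io.BytesIO(b'<td align="center">\nLondon\n</td>\n')
print(list(lines_after_marker(sample)))
```

To run it against the live page, you would call print_located_lines('http://www.maxmind.com/app/locate_my_ip'). Using next(it, None) instead of a bare next(f) is a small design choice that avoids an error if the marker happens to be on the last line of the response.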