Skip Connection Interruptions (Site & BeautifulSoup)


I'm currently doing this with my script:

It fetches the page body (from the source code) and searches it for a string, and it keeps doing this until the string is found (i.e. until the site updates).

However, if the connection is lost, the script stops.

My 'connection' code looks something like this (it repeats in a while loop every 20 seconds):

import urllib2
from bs4 import BeautifulSoup  # or: from BeautifulSoup import BeautifulSoup

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

url = 'url'  # placeholder for the page I'm watching
openUrl = opener.open(url).read()

soup = BeautifulSoup(openUrl)
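
For context, the surrounding loop looks roughly like this (TARGET_STRING is just a stand-in for the text I'm actually searching for, and get_text() assumes BeautifulSoup 4):

import time

TARGET_STRING = 'some text'  # stand-in for the string I watch for

while True:
    openUrl = opener.open(url).read()
    soup = BeautifulSoup(openUrl)
    if TARGET_STRING in soup.get_text():
        break  # the site updated, stop polling
    time.sleep(20)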

I've used urllib2 & BeautifulSoup.

Can anyone tell me how I could make the script "freeze" if the connection is lost, check whether the internet connection is back, and then continue based on the result? (That is, I want to check whether the script CAN connect, not whether the site is up. If it keeps checking the current way, the script just stops with a bunch of errors.)

Thank you!


There are 2 answers below.

Answer 1:

Rather than "freeze" the script, I would have the script continue to run only if the connection is alive. If it's alive, run your code. If it's not alive, either attempt to reconnect, or halt execution.

while keepRunning:
    if connectionIsAlive():
        run_your_code()      # your scraping / string-searching logic
    else:
        reconnect_maybe()    # e.g. wait a bit and try again

One way to check whether the connection is alive is described here: Checking if a website is up via Python.
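
A minimal sketch of such a check, assuming all you need is to know whether any outbound request succeeds (the test URL and the timeout are arbitrary choices):

import urllib2

def connectionIsAlive(test_url='http://www.google.com', timeout=3):
    # If a well-known URL can be opened, assume the connection is up.
    try:
        urllib2.urlopen(test_url, timeout=timeout)
        return True
    except urllib2.URLError:
        return False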

If your program "stops with a bunch of errors", that is likely because you're not properly handling the situation where you're unable to connect to the site (for various reasons, such as you not having internet or their website being down).

You need to use a try/except block to make sure that you catch any errors that occur because you were unable to open a live connection.

try:
    openUrl = opener.open(url).read()
except urllib2.URLError:
    # something went wrong; decide how to respond (e.g. wait and retry)
    pass
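
One reasonable way to respond is to wait and retry, which also gives you the "freeze until the connection comes back" behaviour from the question. A sketch, assuming BeautifulSoup 4 and with the URL and intervals as placeholders:

import time
import urllib2
from bs4 import BeautifulSoup

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
url = 'url'  # placeholder for the page being watched

while True:
    try:
        page = opener.open(url, timeout=10).read()
    except urllib2.URLError:
        time.sleep(20)   # no connection (or site unreachable): wait and retry
        continue
    soup = BeautifulSoup(page)
    # ... search soup for the target string and break when found ...
    time.sleep(20)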
Answer 2:

Found the solution!

So, I need to check the connection on every loop iteration, before actually doing anything.

So I created this function:

import urllib2

def check_internet(url="http://www.google.ro"):
    # Request a known-good page with caching disabled; if it loads,
    # the connection is up.
    try:
        header = {"Pragma": "no-cache"}
        req = urllib2.Request(url, headers=header)
        urllib2.urlopen(req, timeout=2)
        return True
    except urllib2.URLError:
        return False

And it works; I tested it with my connection both down and up!

For the other newbies wondering:

import time

while True:
    conn = check_internet()  # the site itself or just Google; we only care about connectivity
    try:
        if conn is True:
            pass  # the actual scraping code goes here
        else:
            # No connection: wait, then re-do the while loop.
            time.sleep(30)
    except urllib2.URLError:
        # Something failed while working; wait and try again.
        time.sleep(20)

It works perfectly; the script has been running for about 10 hours now and handles errors gracefully! It also works with my connection off and shows proper messages.

Open to suggestions for optimization!
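
For anyone looking to optimize further: one possible refinement, sketched below reusing the check_internet() helper above, is to back off gradually while the connection is down instead of re-checking at a fixed interval (the intervals are arbitrary):

import time

wait = 20
while True:
    if check_internet():            # helper defined above
        wait = 20                   # connected again: reset the back-off
        # ... do the actual scraping work here ...
        time.sleep(20)              # normal polling interval
    else:
        time.sleep(wait)            # offline: wait, then back off
        wait = min(wait * 2, 300)   # cap the wait at 5 minutes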