As part of a project to scrape data from Craigslist, I'm also scraping images. I've noticed in testing that the connection is sometimes refused. Is there a way around this, or do I need to incorporate error handling in my code? I recall that the Twitter API rate-limits queries, so a sleep timer is incorporated there; I'm curious whether I have the same situation with Craigslist. See the code and error below.
import requests
from bs4 import BeautifulSoup

# loops through each image and stores it in a local folder
for img in soup_test.select('a.thumb'):
    imgcount += 1
    filename = pathname + "/" + motoid + " - " + str(imgcount) + ".jpg"
    with open(filename, 'wb') as f:
        response = requests.get(img['href'])
        f.write(response.content)
ConnectionError: HTTPSConnectionPool(host='images.craigslist.org', port=443): Max retries exceeded with url: /00707_fbsCmug4hfR_600x450.jpg (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
I have two questions about this behavior:
Do Craigslist servers have any rules or protocols, such as blocking the nth request within a certain time frame?
Is there a way to pause the loop after a connection has been refused? Or do I just incorporate error handling so that it doesn't halt my program? Something like the sketch below is what I have in mind.
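Here is a minimal sketch of what I'm considering, assuming the refusals are some form of rate limiting: wrap the request in a try/except and sleep before retrying. MAX_RETRIES, RETRY_DELAY, and fetch_with_retry are placeholder names and values I made up; Craigslist doesn't document any limits that I can find.

import time

import requests

MAX_RETRIES = 3   # guessed values; Craigslist publishes no official limits
RETRY_DELAY = 30  # seconds to pause after a refused connection

def fetch_with_retry(url):
    """Retry a GET a few times, pausing after each refused connection."""
    for attempt in range(MAX_RETRIES):
        try:
            return requests.get(url, timeout=10)
        except requests.exceptions.ConnectionError:
            if attempt < MAX_RETRIES - 1:
                time.sleep(RETRY_DELAY)  # back off before the next attempt
    return None  # give up on this URL after MAX_RETRIES attempts

for img in soup_test.select('a.thumb'):
    imgcount += 1
    filename = pathname + "/" + motoid + " - " + str(imgcount) + ".jpg"
    response = fetch_with_retry(img['href'])
    if response is None:
        continue  # skip this image rather than halting the whole scrape
    with open(filename, 'wb') as f:
        f.write(response.content)

Would this be a reasonable approach, or is there something built into requests for handling this?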