I am trying to scrape images from Flickr using the FlickrAPI. The command line just hangs and nothing happens after the image URLs have been scraped. It looks something like the following:

Nothing happens after this screen; it stays like this for a long time, sometimes in the range of 1200 seconds or more.
For scraping I used the following code:
import argparse
import os
import time

from flickrapi import FlickrAPI

# key, secret (the Flickr API credentials) and download_uri() are defined elsewhere in the script


def get_urls(search='honeybees on flowers', n=10, download=False):
    t = time.time()
    flickr = FlickrAPI(key, secret)
    license = ()  # https://www.flickr.com/services/api/explore/?method=flickr.photos.licenses.getInfo
    photos = flickr.walk(text=search,  # http://www.flickr.com/services/api/flickr.photos.search.html
                         extras='url_o',
                         per_page=500,  # 1-500
                         license=license,
                         sort='relevance')

    if download:
        dir = os.getcwd() + os.sep + 'images' + os.sep + search.replace(' ', '_') + os.sep  # save directory
        if not os.path.exists(dir):
            os.makedirs(dir)

    urls = []
    for i, photo in enumerate(photos):
        if i < n:
            try:
                # construct url https://www.flickr.com/services/api/misc.urls.html
                url = photo.get('url_o')  # original size
                if url is None:
                    url = 'https://farm%s.staticflickr.com/%s/%s_%s_b.jpg' % \
                          (photo.get('farm'), photo.get('server'), photo.get('id'), photo.get('secret'))  # large size

                # download
                if download:
                    download_uri(url, dir)

                urls.append(url)
                print('%g/%g %s' % (i, n, url))
            except:
                print('%g/%g error...' % (i, n))

    # import pandas as pd
    # urls = pd.Series(urls)
    # urls.to_csv(search + "_urls.csv")

    print('Done. (%.1fs)' % (time.time() - t) + ('\nAll images saved to %s' % dir if download else ''))
This function is called as follows:
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--search', type=str, default='honeybees on flowers', help='flickr search term')
    parser.add_argument('--n', type=int, default=10, help='number of images')
    parser.add_argument('--download', action='store_true', help='download images')
    opt = parser.parse_args()

    get_urls(search=opt.search,  # search term
             n=opt.n,  # max number of images
             download=opt.download)  # download images
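So it is run from the command line with something like python flickr_scraper.py --search "honeybees on flowers" --n 10 --download (flickr_scraper.py being whatever the file is named).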
I tried going through the function code multiple times but I can't seem to understand why nothing happens after the scraping is done, as everything else is working fine.
I can't run it, but I think the problem is that it gets information about 500 photos - because you have per_page=500 - and then the for loop runs over all 500 of them, so you have to wait for the end of that loop.

You should use break to exit the loop after n images. Or you could simply iterate over photos[:n] and then you don't have to check i < n. You could also set per_page=n.
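I can't test it with your credentials, but a minimal sketch of the break idea could look like this (KEY, SECRET and get_first_urls are placeholder names, and the download step is left out for brevity):

from flickrapi import FlickrAPI

KEY = 'your_api_key'        # placeholder - put your own Flickr API key here
SECRET = 'your_api_secret'  # placeholder - put your own Flickr API secret here

def get_first_urls(search='honeybees on flowers', n=10):
    flickr = FlickrAPI(KEY, SECRET)
    photos = flickr.walk(text=search, extras='url_o', per_page=n, sort='relevance')

    urls = []
    for i, photo in enumerate(photos, 1):
        # fall back to the constructed large-size URL when url_o is missing
        url = photo.get('url_o') or 'https://farm%s.staticflickr.com/%s/%s_%s_b.jpg' % \
              (photo.get('farm'), photo.get('server'), photo.get('id'), photo.get('secret'))
        urls.append(url)
        print('%g/%g %s' % (i, n, url))
        if i >= n:
            break  # stop walking the results instead of iterating over everything Flickr returns
    return urls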
BTW:

You can use os.path.join() to create the path.

If you use exist_ok=True in makedirs() then you don't have to check if not os.path.exists(dir):.

If you use enumerate(photos, 1) then you get the values 1, 2, 3, ... instead of 0, 1, 2, ...
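A quick illustration of those three points, with a dummy list standing in for the flickr.walk() results:

import os

search = 'honeybees on flowers'

# os.path.join() builds the save directory without concatenating os.sep by hand
dir = os.path.join(os.getcwd(), 'images', search.replace(' ', '_'))

# exist_ok=True makes the separate os.path.exists() check unnecessary
os.makedirs(dir, exist_ok=True)

# enumerate(..., 1) starts counting at 1, so the progress output reads 1/3, 2/3, 3/3
photos = ['a.jpg', 'b.jpg', 'c.jpg']  # dummy data instead of real Flickr results
for i, photo in enumerate(photos, 1):
    print('%g/%g %s' % (i, len(photos), photo))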