I have made a scraper and I would like the variable `page_link = ""` to be filled with each URL that is saved in a JSON, XML, or SQL file.
Could someone point me in the right direction so I can learn how to make this dynamic instead of static?
You don't have to give me the answer, just point me towards where I can learn more about what I should do. I'm still learning.
from bs4 import BeautifulSoup
import requests
print('step 1')
#get url
page_link = "<random website with info>"
print('step 2')
#open page
page_response = requests.get(page_link, timeout=1)
print('step 3')
#parse page
page_content = BeautifulSoup(page_response.content, "html.parser")
print('step 4')
#name of the page
naam = page_content.find_all(class_='<random class>')[0].decode_contents()
print('step 5')
#print it
print(naam)
JSON seems like the right tool for the job. XML and SQL are a bit heavy-handed for the simple functionality you need, and Python has built-in JSON reading/writing via the `json` module (a JSON object is similar enough to a Python `dict` in a lot of respects). Just maintain a list of the sites you want to hit in a JSON file similar to this one (put it in a file called test.json):
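A minimal sketch of what test.json could look like (the key name `sites` and the URLs here are placeholders, not values from the question):

```json
{
    "sites": [
        "http://www.example.com/page1",
        "http://www.example.com/page2"
    ]
}
```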
Then do your scraping for each of these sites:
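A minimal sketch of that loop, assuming the test.json structure above:

```python
import json

# read the list of sites from the JSON file
with open('test.json') as f:
    data = json.load(f)

# visit each site in turn
for site in data['sites']:
    print(site)
    # do scraping
    ...
```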
This outputs (if you remove the `...`):
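With the placeholder URLs from the example test.json above, that would print:

```
http://www.example.com/page1
http://www.example.com/page2
```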
Just put the rest of the logic you want to use to do the scraping (like you have above in the question) under the `# do scraping` comment.
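For example, the scraping logic from the question could be dropped into that loop like this (a sketch assuming the test.json layout above; `<random class>` is the placeholder from the question):

```python
import json

import requests
from bs4 import BeautifulSoup

# read every URL from test.json instead of hard-coding one page_link
with open('test.json') as f:
    data = json.load(f)

for page_link in data['sites']:
    # open page
    page_response = requests.get(page_link, timeout=1)
    # parse page
    page_content = BeautifulSoup(page_response.content, "html.parser")
    # name of the page
    naam = page_content.find_all(class_='<random class>')[0].decode_contents()
    # print it
    print(naam)
```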