My script currently runs correctly and scrapes all the data I need. My goal is to have it run for hours, scraping a single page by using the webdriver to refresh it every minute. However, this only works for the first 15 minutes.
I run this on an AWS EC2 remote instance with:
java -jar selenium-server-standalone-3.141.59.jar -port 4444 -sessionTimeout 57868143 &
python3 /home/ec2-user/scraper/football_live.py;
This starts the Selenium server (which itself keeps running for longer than 15 minutes) and then the script.
Inside my script I have:
data, n_games = football_data(driver)
insert_data(cur, conn, data)
time.sleep(60)
driver.refresh()
This sits inside a while loop that is meant to run for a long period of time.
This is my webdriver code:
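import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
time.sleep(5)
driver = webdriver.Remote("http://localhost:4444/wd/hub", options=chrome_options, desired_capabilities=DesiredCapabilities.CHROME)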
Here is the only thing I have found that is close to what I am trying to do, but it is not all that helpful.
As a last resort, I am also considering running 15-minute loops in the script, if there is no way to extend the lifetime of the webdriver session through Selenium.
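To make the last-resort idea concrete, here is a rough sketch of what I mean by 15-minute loops: quit the driver and create a fresh one before each session reaches the point where it stops working. The function name `run_with_session_rotation`, its parameters, and the 14-minute default are my own placeholders, not anything from Selenium. `make_driver` and `scrape_once` are hypothetical callables that would wrap my existing `webdriver.Remote(...)` setup (plus loading the page) and the `football_data` / `insert_data` calls.

```python
import time


def run_with_session_rotation(make_driver, scrape_once, total_runtime,
                              session_lifetime=14 * 60, refresh_interval=60,
                              sleep=time.sleep, clock=time.monotonic):
    """Scrape for total_runtime seconds, replacing the driver periodically.

    make_driver: hypothetical callable returning a ready driver (page loaded).
    scrape_once: hypothetical callable that scrapes and stores one snapshot.
    session_lifetime: assumed safe age of one session, kept under 15 minutes.
    Returns the number of driver sessions that were created.
    """
    sessions = 0
    deadline = clock() + total_runtime
    while clock() < deadline:
        driver = make_driver()
        sessions += 1
        # Never let one session outlive its lifetime or the overall deadline.
        session_end = min(clock() + session_lifetime, deadline)
        try:
            while True:
                scrape_once(driver)
                sleep(refresh_interval)
                if clock() >= session_end:
                    break  # rotate: the next scrape uses a new session
                driver.refresh()
        finally:
            # Always release the browser, even if scraping raised.
            driver.quit()
    return sessions
```

In my script, `make_driver` would build the Chrome options, call `webdriver.Remote("http://localhost:4444/wd/hub", ...)` and open the page, and `scrape_once` would call `football_data(driver)` followed by `insert_data(cur, conn, data)`. The `sleep` and `clock` parameters only exist so the loop can be exercised without actually waiting. I would still prefer a way to keep one session alive, since this only works around the 15-minute limit without explaining it.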