I am trying to scrape Google News with the gnews package. However, I don't know how to scrape older articles, for example articles from 2010.
```python
from gnews import GNews
from newspaper import Article
import pandas as pd
import datetime

google_news = GNews(language='es', country='Argentina', period='7d')
argentina_news = google_news.get_news('protesta clarin')
print(len(argentina_news))
```
This code works perfectly for recent articles, but I need older ones. In the GNews README (https://github.com/ranahaani/GNews#todo) I saw something like the following:
```python
google_news = GNews(language='es', country='Argentina', period='7d',
                    start_date='01-01-2015', end_date='01-01-2016', max_results=10,
                    exclude_websites=['yahoo.com', 'cnn.com'], proxy=proxy)
```
But when I pass `start_date` I get:

```
TypeError: __init__() got an unexpected keyword argument 'start_date'
```
Can anyone help me get articles for specific dates? Thank you very much!
The example code is incorrect for `gnews==0.2.7`, which is the latest version you can install off PyPI via `pip` (or whatever). The documentation describes the unreleased mainline code, which you can get directly from their git source. I confirmed this by inspecting the `GNews.__init__` method: it has no keyword arguments for `start_date` or `end_date`.

If you want the `start_date` and `end_date` functionality, note that it was only added rather recently, so you will need to install the module from their git source, for example with `pip install git+https://github.com/ranahaani/GNews.git`. After that, you can use the start/end functionality.
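Rather than reading the source, you can check which constructor you have installed with the standard-library `inspect.signature`. This is a minimal sketch; `accepts_kwarg` and `gnews_supports_date_range` are hypothetical helper names, not part of gnews.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts any keyword name without a TypeError.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def gnews_supports_date_range():
    """Check the installed gnews package; returns None if it is not installed."""
    try:
        from gnews import GNews
    except ImportError:
        return None
    return (accepts_kwarg(GNews.__init__, "start_date")
            and accepts_kwarg(GNews.__init__, "end_date"))
```

With the 0.2.7 release from PyPI this should return `False` (which is what produces the `TypeError` in the question); after installing from git it should return `True`.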
I get this as a result:
Also note:

- `period` is ignored if you set `start_date` and `end_date`.
- Passing a tuple such as `(2015, 1, 15)` doesn't seem to work; to be safe, pass a `datetime` object.
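Putting those notes together, here is a minimal sketch of a date-bounded search for the question's query. It assumes gnews was installed from git and that its constructor accepts `start_date` and `end_date` as `datetime` objects, as the README example suggests; `date_window` and `fetch_news` are hypothetical helpers, and `period` is deliberately omitted since it is ignored when both dates are set.

```python
import datetime


def date_window(start, end):
    """Hypothetical helper: validate a start/end pair and return them as the
    keyword arguments the git version of GNews is assumed to accept."""
    if not isinstance(start, datetime.datetime) or not isinstance(end, datetime.datetime):
        # Tuples like (2015, 1, 15) reportedly don't work, so reject them early.
        raise TypeError("start and end must be datetime.datetime objects")
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {"start_date": start, "end_date": end}


def fetch_news(query, start, end, max_results=10):
    """Search Google News between two dates (assumes gnews installed from git)."""
    from gnews import GNews  # imported lazily so this module loads without gnews

    google_news = GNews(language="es", country="Argentina",
                        max_results=max_results, **date_window(start, end))
    return google_news.get_news(query)


# Usage (makes a network request, so it is left commented out):
# articles = fetch_news("protesta clarin",
#                       datetime.datetime(2010, 1, 1),
#                       datetime.datetime(2010, 12, 31))
# print(len(articles))
```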