I'm trying to build a media tracker in Python that, each day, returns all Google News articles containing a specific phrase, "Center for Community Alternatives". If there are no new articles containing this exact phrase on a given day, no new links should be added to the data frame. The problem is that even on days when no articles contain my phrase, my code adds articles with similar phrases to the data frame. How can I append only links that contain my exact phrase?
Below is example code for 03/01/22:
from GoogleNews import GoogleNews
from newspaper import Article
import pandas as pd
googlenews=GoogleNews(start='03/01/2022',end='03/01/2022')
googlenews.search('"' + "Center for Community Alternatives" + '"')
googlenews.getpage(1)
result=googlenews.result()
df=pd.DataFrame(result)
df
When you search "Center for Community Alternatives" (with quotes around it) in Google News for this specific date, the page says: No results found for "center for community alternatives". Even so, the code scrapes the links that appear below that message, which are listed under: Results for center for community alternatives (without quotes).
The API you're using does not support exact match.
See the library source in https://github.com/Iceloof/GoogleNews/blob/master/GoogleNews/__init__.py.
As an alternative, you could probably just filter your data frame using an exact match.
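A minimal sketch of that filter, under a few assumptions: the column names `title`, `desc`, and `link` are what I'd expect the result data frame to contain, so check `df.columns` on your own data first. `filter_exact_phrase` is a hypothetical helper name, and the sample rows are made up so the snippet runs without hitting Google. The match is case-insensitive, and `re.escape` keeps the phrase from being interpreted as a regex.

```python
import re

import pandas as pd

PHRASE = "Center for Community Alternatives"


def filter_exact_phrase(df, phrase, columns=("title", "desc")):
    """Keep only rows where at least one of `columns` contains `phrase`.

    Hypothetical helper; the default column names are an assumption
    about what the scraped results look like.
    """
    pattern = re.escape(phrase)  # treat the phrase as literal text
    mask = pd.Series(False, index=df.index)
    for col in columns:
        if col in df.columns:  # skip columns that are not present
            text = df[col].fillna("").astype(str)
            mask |= text.str.contains(pattern, case=False, regex=True)
    return df[mask]


# Made-up rows standing in for googlenews.result()
sample = pd.DataFrame(
    {
        "title": [
            "Center for Community Alternatives releases report",
            "Community center offers alternatives to detention",
            "New report on parole reform",
        ],
        "desc": [
            "The group published new findings.",
            "A center for the community opens downtown.",
            "said the center for community alternatives in a statement",
        ],
        "link": [
            "https://example.com/a",
            "https://example.com/b",
            "https://example.com/c",
        ],
    }
)

filtered = filter_exact_phrase(sample, PHRASE)
print(filtered["link"].tolist())
```

In your code you would call `filter_exact_phrase(df, "Center for Community Alternatives")` right after building `df` and append only the rows it returns. One caveat: the title and description are short snippets, so an article that mentions the phrase only in its body would be dropped by this check.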