How to automate getting links from many Google searches using Python


I am trying to make a program that searches Google for each U.S. county website and returns the link to the home page of each county's official site.

Note: for context, I imported my list of all counties from Wikipedia.
Note 2: when I say something doesn't work, I mean it may work for some results but not all.

I have tried:

  1. using a web scraper on Wikipedia and on federal and state government websites, hoping they list the links, but they don't.

  2. using Python to search Google for each county and running a web scraper on each results page, but I couldn't find a way to automate the scraper from Python.

  3. using Google's Custom Search API, but I ran into the query limit. I connected a payment method to try to buy more queries, but I could not find a way to actually pay for more. I also don't know how to properly handle the search results, so I just assumed the government website was the first result; this caused problems for some lesser-known counties, as Google would rank the county's Wikipedia page first (even though I explicitly searched for the government website).
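For the ranking problem in attempt 3, one option is to scan all of the returned items for a government-looking domain instead of taking the first result blindly. A minimal sketch, assuming `result` is the response dict returned by the Custom Search API (`pick_gov_link` is a hypothetical helper name, and the `.gov`/`.us` suffix check is a heuristic that won't cover every county):

```python
from urllib.parse import urlparse

def pick_gov_link(result):
    """Return the first result link whose host ends in .gov or .us,
    falling back to the first link if none match."""
    items = result.get("items", [])
    for item in items:
        host = urlparse(item["link"]).netloc
        if host.endswith(".gov") or host.endswith(".us"):
            return item["link"]
    return items[0]["link"] if items else None
```

This keeps the existing query logic intact; only the line that currently appends `result["items"][0]['link']` would change to append `pick_gov_link(result)` instead.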

My current code using Google's Custom Search API:

import pandas as pd
from googleapiclient.discovery import build

my_api_key = "#######"
my_cse_id = "#######"

#runs one query against the Custom Search JSON API and returns the raw response dict
def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res

#this is a csv file that stores all the state and county names
counties=pd.read_csv(r'C:file_location')
print(len(counties))

website_links=[]

#goes through all counties in the csv file and searches for each county's website on Google
for i in range(len(counties)):
    county_name = counties['County or equivalent'][i]
    state_name = counties['State or equivalent'][i]
    result = google_search(f"{county_name} {state_name} county website", my_api_key, my_cse_id)
    print(result["items"][0]['link'], i)
    #takes only the first link of each result; I wish I could do this better
    website_links.append(result["items"][0]['link'])

print(website_links)

I have been looking into Google's BigQuery and batch search, but I am not sure whether these are what I'm looking for, and I have no idea how to use them (even after reading the documentation).

If anybody has a solution to any of these problems, tips, or a list of links to all county websites, please let me know.
