I've managed to wrangle this code together to turn the JSON response from a Google Custom Search Engine query into a .csv file. It works. The issue is that Google only returns ten results per page.
import csv
import json
import requests
url = ('https://www.googleapis.com/customsearch/v1?key=APIKEY&cx=SEARCHENGINEID&q=QUERY&sort=date:r:20231001:20231102')
response = requests.get(url)
response_dict = response.json()
file = open("Testing.csv", "w", newline='')
header = ['Site', 'Title', 'URL', 'Author', 'Date']
writer = csv.writer(file)
writer.writerow(header)
out_rows = 0
for item in response_dict.get('items', []):
    # The author, date, and site name all live in the first 'metatags' dict, when present
    metatags = item.get('pagemap', {}).get('metatags', [{}])[0]
    title = item.get('title', '')
    url = item.get('link', '')
    author = metatags.get('author', '')
    date = metatags.get('article:published_time', '')
    site = metatags.get('og:site_name', '')
    writer.writerow([site, title, url, author, date])
    out_rows += 1
file.close()
print(f"OK! Published file with {out_rows} articles.")
with open("Testing.json", "w") as json_file:
    json.dump(response_dict, json_file, indent=2)
You can get to the next page of results by adding &start=11, &start=21, and so on to the URL of the API request. What I would like to do, but don't know how to structure, is have another loop do that for me: if a page comes back with the full ten results, run the request again with the next start value and append those results to the same sheet, stopping once a page returns fewer than ten. What would be the easiest way to do this? A rough attempt is sketched below.
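Here is the rough shape I have in mind, rewritten to pass the query string through requests' params argument so the start value is easy to vary. This is an untested sketch: APIKEY, SEARCHENGINEID, and QUERY are placeholders standing in for real values, as in the code above, and I'm assuming that a page with fewer than ten items means there is nothing more to fetch.

import csv

import requests

API_KEY = 'APIKEY'          # placeholders, as in the code above
ENGINE_ID = 'SEARCHENGINEID'
QUERY = 'QUERY'

base_url = 'https://www.googleapis.com/customsearch/v1'

with open("Testing.csv", "w", newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Site', 'Title', 'URL', 'Author', 'Date'])

    out_rows = 0
    start = 1               # index of the first result on the current page
    while True:
        params = {
            'key': API_KEY,
            'cx': ENGINE_ID,
            'q': QUERY,
            'sort': 'date:r:20231001:20231102',
            'start': start,
        }
        response = requests.get(base_url, params=params)
        response.raise_for_status()
        items = response.json().get('items', [])

        for item in items:
            metatags = item.get('pagemap', {}).get('metatags', [{}])[0]
            writer.writerow([
                metatags.get('og:site_name', ''),
                item.get('title', ''),
                item.get('link', ''),
                metatags.get('author', ''),
                metatags.get('article:published_time', ''),
            ])
            out_rows += 1

        # A short page means there is no next page to request.
        if len(items) < 10:
            break
        start += 10         # subsequent pages begin at 11, 21, 31, ...
        if start > 91:
            # The JSON API is documented to serve at most 100 results in
            # total, so requesting start=101 would just error out.
            break

print(f"OK! Published file with {out_rows} articles.")

A while loop seems to fit better here than checking out_rows, since out_rows keeps growing across pages; counting the items on the current page is what actually tells you whether a next page exists.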