I am using the App Store Scraper to get podcast reviews from the Apple Store. One thing students and I realised is that, naturally, popular international podcasts get reviews from several countries, and when we want to catch them all, we need to loop through the country codes. As even people in Sweden or Poland may comment on a BBC podcast in English, we did not want to exclude any countries but use the whole set, which I have (as a starting point) hard-coded as follows:
# Select country codes
# full list of countries where Apple podcasts are available has been shared on Gitlab
countries=["DZ", "AO", "AI",
"AR", "AM", "AU",
"AT", "AZ", "BH",
"BB", "BY", "BE",
"BZ", "BM", "BO",
"BW", "BR", "VG",
"BN", "BG", "CA",
"KY", "CL", "CN",
"CO", "CR", "HR",
"CY", "CZ", "DK",
"DM", "EC", "EG",
"SV", "EE", "FI",
"FR", "DE", "GH",
"GB", "GR", "GD",
"GT", "GY", "HN",
"HK", "HU", "IS",
"IN", "ID", "IE",
"IL", "IT", "JM",
"JP", "JO", "KE",
"KW", "LV", "LB",
"LT", "LU", "MO",
"MG", "MY", "ML",
"MT", "MU", "MX",
"MS", "NP", "NL",
"NZ", "NI", "NE",
"NG", "NO", "OM",
"PK", "PA", "PY",
"PE", "PH", "PL",
"PT", "QA", "MK",
"RO", "RU", "SA",
"SN", "SG", "SK",
"SI", "ZA", "KR",
"ES", "LK", "SR",
"SE", "CH", "TW",
"TZ", "TH", "TN",
"TR", "UG", "UA",
"AE", "US", "UY",
"UZ", "VE", "VN",
"YE"]
Then I loop through this list to get the reviews, which works OK, but the process is very slow! Whenever a podcast does not have any reviews, the app store scraper (according to the notifications I get) tries the request 20 times before moving on to the next item, so the loop takes ages. How can I make the process faster, e.g. by forcing the script to move on if the first request is unsuccessful? This is what I have so far:
import os
from pprint import pprint
from app_store_scraper import Podcast

# Set podcast details
app_id = 1614435903
app_name = '28ish-days-later'
# important: country codes will be selected from the list above
# Set output path (directory must already be defined as the base folder)
path_out = "podcast_reviews"
filename_csv = f'{app_name}_reviews_table.csv'
file_csv = os.path.join(directory, path_out, filename_csv)
# Optional: pass how_many=n to sysk.review() to limit output
# otherwise all reviews are fetched
for c in countries:
    # Create class object
    sysk = Podcast(country=c, app_name=app_name, app_id=app_id)
    sysk.review()
    print(f"No. of reviews found for country {c}:")
    #pprint(sysk.reviews)
    pprint(sysk.reviews_count)
    # NOTE: the review count seen on the landing page differs from the actual number of reviews fetched.
    # This is simply because only some users who rated the app also leave reviews.
The notification I get in the output when no reviews are found is this:
ERROR:Base:Something went wrong: HTTPSConnectionPool(host='amp-api.podcasts.apple.com', port=443): Max retries exceeded with url: /v1/catalog/dz/podcasts/1614435903/reviews?l=en-GB&offset=0&limit=20 (Caused by ResponseError('too many 404 error responses'))
No. of reviews found for country DZ:
0
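The "Max retries exceeded" message suggests that the retrying happens inside the HTTP layer, before any error reaches the calling script. The following plain-Python sketch illustrates that behaviour; it is not the scraper's actual code, and `fetch_with_retries`, `always_404` and the limit of 20 are hypothetical stand-ins chosen to mirror the notification above.

```python
def fetch_with_retries(fetch, max_retries=20):
    """Call fetch() until it succeeds or max_retries attempts are used up."""
    last_error = None
    for _ in range(max_retries):
        try:
            return fetch()
        except LookupError as err:  # stand-in for a 404 response
            last_error = err
    # Only now does the caller get to see a failure.
    raise RuntimeError(f"Max retries exceeded ({max_retries} attempts)") from last_error


attempts = []


def always_404():
    """Hypothetical request that never finds any reviews."""
    attempts.append(1)
    raise LookupError("404")


# The outer try/except catches the final error, but cannot shorten the retries.
try:
    fetch_with_retries(always_404)
except RuntimeError as err:
    outcome = str(err)

print(len(attempts), outcome)
```

In this sketch the outer handler runs only after every attempt has been made, so wrapping the call does not make the loop any faster.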
My first attempt was to include a try and except, but that does not stop the script from attempting the max retries before raising the error, so I got rid of it. Perhaps it is possible to give the script a "how_many=1" limitation for all country codes and write only the ones that retrieve a result to a new list before starting the loop. I will post this as an answer if it works.
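A minimal sketch of such a pre-filtering step, under stated assumptions: the landing-page URL pattern below and the idea that a storefront without the podcast answers with a non-200 status are assumptions, not confirmed behaviour, and a 200 response only shows that the page exists, not that it has reviews. `countries_with_page`, `default_status` and the `get_status` parameter are hypothetical names; `get_status` is injectable so the filter can be tried without network access.

```python
def default_status(url):
    """Return the HTTP status code for url, using a single plain request."""
    import requests  # imported lazily so the filter itself has no hard dependency
    return requests.get(url, timeout=10).status_code


def countries_with_page(countries, app_name, app_id, get_status=default_status):
    """Keep only the country codes whose podcast page answers with status 200."""
    available = []
    for c in countries:
        # Assumed URL pattern for the podcast landing page in storefront c
        url = f"https://podcasts.apple.com/{c.lower()}/podcast/{app_name}/id{app_id}"
        if get_status(url) == 200:
            available.append(c)
    return available
```

Calling `countries_with_page(countries, app_name, app_id)` would then return a shorter list to loop over with the scraper, since each country costs one request rather than a full round of retries.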
Based on the discussion with @buran above, here is my solution for checking the existence of reviews via requests first before feeding a much shorter country code list into the app store scraper:
This gives me the following output in a short amount of time: