I am trying to extract the Google Scholar public profiles of certain professors.
I have a list of professors' names and am using the scholarly package
to scrape their public profile information. However, I am stuck on an error: I can only retrieve information for the first name in professor_list,
not for the subsequent ones.
    for name in professor_list:
        search_query = scholarly.search_author(name)
        scholarly.pprint(next(search_query))
Output:
    {'affiliation': 'Deakin University',
     'citedby': 2528,
     'email_domain': '@deakin.edu.au',
     'filled': False,
     'interests': ['Lynn Batten'],
     'name': 'Lynn Batten',
     'scholar_id': 'Tmg0T9sAAAAJ',
     'source': 'SEARCH_AUTHOR_SNIPPETS',
     'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=Tmg0T9sAAAAJ'}
    ---------------------------------------------------------------------------
    StopIteration                             Traceback (most recent call last)
    <ipython-input-242-5b96571c0972> in <module>
          1 for name in professor_list:
          2     search_query = scholarly.search_author(name)
    ----> 3     scholarly.pprint(next(search_query))

    StopIteration:
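Although scholarly.pprint(next(search_query)) should work, next() raises StopIteration when the generator returned by scholarly.search_author() has no results for a name. You can pass a default value of None to the built-in next() function in case nothing is found, e.g. next(search_query, None). See the explanation of StopIteration by Martijn Pieters for more information.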
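Here is a minimal sketch of the default-value approach. To keep it runnable without network access, fake_search_author is a hypothetical stand-in for scholarly.search_author() (it is not part of the scholarly package), and "No Such Person" is an invented name used only to trigger the empty-result case:

```python
def fake_search_author(name):
    """Hypothetical stand-in for scholarly.search_author(): a generator of profiles."""
    known = {"Lynn Batten": {"name": "Lynn Batten", "affiliation": "Deakin University"}}
    if name in known:
        yield known[name]
    # For unknown names the generator yields nothing, like an empty search.


def first_profile(search, name):
    """Return the first search result for name, or None if the search is empty."""
    # The second argument to next() is returned instead of raising StopIteration.
    return next(search(name), None)


professor_list = ["Lynn Batten", "No Such Person"]
profiles = {name: first_profile(fake_search_author, name) for name in professor_list}

for name, profile in profiles.items():
    if profile is None:
        print(f"No profile found for {name}")
    else:
        print(profile)
```

With the real package you would presumably pass scholarly.search_author as the search argument and check for None before calling scholarly.pprint().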
Alternatively, you can iterate over the scholarly.search_author() results with a nested for loop; the loop body is simply skipped when a search returns nothing.
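A rough sketch of the nested-loop approach follows. As an assumption for illustration, fake_search_author is a hypothetical stand-in for scholarly.search_author() so the example runs offline, and "No Such Person" is an invented name that produces no results:

```python
def fake_search_author(name):
    """Hypothetical stand-in for scholarly.search_author(): yields zero or more profiles."""
    known = {
        "Lynn Batten": [{"name": "Lynn Batten", "affiliation": "Deakin University"}],
    }
    yield from known.get(name, [])


def collect_matches(search, names):
    """Map each name to the list of profiles its search produced (possibly empty)."""
    matches = {}
    for name in names:
        matches[name] = []
        # A for loop consumes the generator and handles StopIteration itself,
        # so an empty search just leaves the list empty.
        for author in search(name):
            matches[name].append(author)
    return matches


professor_list = ["Lynn Batten", "No Such Person"]
matches = collect_matches(fake_search_author, professor_list)

for name, authors in matches.items():
    print(name, "->", len(authors), "result(s)")
```

Unlike next(), this collects every match for a name, which can help when several authors share the same name.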
Another alternative is to use the Google Scholar Profiles API from SerpApi. It's a paid API with a free plan that handles scaling and bypasses blocks from search engines via dedicated proxies and CAPTCHA-solving services. Check out the playground.