I was thinking of trying out BigQuery and GithubArchive, but I'm not sure how to compose a query that would let me search for a term in code or project and order the results by number of commits descending.
Thanks for any tips
I was thinking of trying out BigQuery and GithubArchive, but I'm not sure how to compose a query that would let me search for a term in code or project and order the results by number of commits descending.
Thanks for any tips
SELECT COUNT(1) c, repository_url, repository_description
FROM [githubarchive:github.timeline]
WHERE type = 'PushEvent'
AND REGEXP_MATCH(repository_description, r'(?i)SQL')
GROUP BY 2, 3
ORDER BY c DESC
LIMIT 10
BigQuery supports Regular Expression so you can greatly improve / narrow down your search result having flexibility of using search pattern vs. seach term
Below references can help you further:
The GithubArchive data loaded into BigQuery doesn't have copy of the source code, so search term in code wouldn't be possible. But if you wanted to search for a term in repository description, and then pick top repositories by number of commits, here is an example how to do it (the term is "SQL" in this example):
This results in