Web scraping by tag on Stack Overflow


I would like to scrape this site (stackoverflow.com). Is there an API or some other tool that can be used with Python to get all the posts and comments with a specific tag?

For example, how do I get all the posts and comments from 10/01/2019 to 01/20/2019 with the python tag?

Best answer (by Rounak)

Take a close look at the Stack Exchange API documentation: https://api.stackexchange.com/docs/

You can get all questions from a start date to an end date with a particular tag by making use of the questions method. You need to pass the specific tag into the tagged parameter.

Here is the URL format for that:
https://api.stackexchange.com/2.2/questions?fromdate={start_date}&todate={end_date}&order=desc&sort=activity&tagged={tag}&site=stackoverflow

For example, the link below returns questions tagged python from 1 July 2019 to 5 July 2019:
https://api.stackexchange.com/2.2/questions?fromdate=1561939200&todate=1562284800&order=desc&sort=activity&tagged=python&site=stackoverflow
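
The same URL can be assembled in Python instead of by hand. This is a minimal sketch using only the standard library; the helper name questions_url and the constant API_ROOT are made up for illustration, and the parameters are the ones from the URL format above.

```python
from urllib.parse import urlencode

# Base of the API version used in the URLs above.
API_ROOT = "https://api.stackexchange.com/2.2"


def questions_url(tag, start_epoch, end_epoch, site="stackoverflow"):
    """Build the /questions URL for one tag and a date range (epoch seconds)."""
    query = urlencode({
        "fromdate": start_epoch,
        "todate": end_epoch,
        "order": "desc",
        "sort": "activity",
        "tagged": tag,
        "site": site,
    })
    return f"{API_ROOT}/questions?{query}"


# Reproduces the example link for 1 July 2019 to 5 July 2019.
print(questions_url("python", 1561939200, 1562284800))
```

Using urlencode rather than string concatenation also takes care of escaping, which matters for tags such as c#.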

The dates in the above URL are Unix epoch timestamps (seconds since 1 January 1970, UTC). For more information, see the dates section of the API documentation.
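
To produce such timestamps from calendar dates, something like the following should work. It is a small sketch that assumes you want midnight UTC on each date; to_epoch is a hypothetical helper name, not part of the API.

```python
from datetime import datetime, timezone


def to_epoch(year, month, day):
    """Return the Unix timestamp (in seconds) for midnight UTC on a date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


print(to_epoch(2019, 7, 1))  # 1561939200, the fromdate in the example URL
print(to_epoch(2019, 7, 5))  # 1562284800, the todate in the example URL
```

Passing tzinfo explicitly keeps the result independent of the local time zone of the machine running the script.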

Now that you have each question_id, you can use the questions/{ids}/answers method to get all answers to that question from a start date to an end date.

Here is the URL format for that:
https://api.stackexchange.com/2.2/questions/{question_id}/answers?fromdate={start_date}&todate={end_date}&order=desc&sort=activity&site=stackoverflow

For example, the link below returns all answers from 1 January 2019 to 1 July 2019 to the question with question_id 37181281:
https://api.stackexchange.com/2.2/questions/37181281/answers?fromdate=1546300800&todate=1561939200&order=desc&sort=activity&site=stackoverflow
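
The answers URL follows the same pattern, so it can be built the same way. Again a sketch only; answers_url is an illustrative name, and the query parameters are copied from the URL format above.

```python
from urllib.parse import urlencode


def answers_url(question_id, start_epoch, end_epoch, site="stackoverflow"):
    """Build the /questions/{id}/answers URL for a date range (epoch seconds)."""
    query = urlencode({
        "fromdate": start_epoch,
        "todate": end_epoch,
        "order": "desc",
        "sort": "activity",
        "site": site,
    })
    return (
        "https://api.stackexchange.com/2.2/"
        f"questions/{question_id}/answers?{query}"
    )


# Reproduces the example link for question 37181281.
print(answers_url(37181281, 1546300800, 1561939200))
```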

Now you basically have all the posts (questions and answers) from a start date to an end date with a particular tag.
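
One practical caveat: the API returns results one page at a time, so collecting every post means following the page and pagesize parameters until the has_more field in the response is false. The sketch below shows one way to do that with the standard library. The names fetch_json and fetch_all and the max_pages safety cap are assumptions made for this example; the fetch argument exists only so the loop can be exercised without a network call.

```python
import gzip
import json
import time
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def fetch_json(url):
    """Download one API page and decode it (responses are usually gzipped)."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)


def fetch_all(path, params, fetch=fetch_json, max_pages=10):
    """Collect "items" from successive pages until has_more is false.

    max_pages is an arbitrary cap chosen for this sketch so that a broad
    query cannot loop for a very long time.
    """
    items = []
    for page in range(1, max_pages + 1):
        query = dict(params, page=page, pagesize=100)
        data = fetch(f"{API_ROOT}/{path}?{urlencode(query)}")
        items.extend(data.get("items", []))
        if not data.get("has_more"):
            break
        # Wait if the response carries a "backoff" value (in seconds).
        time.sleep(data.get("backoff", 0))
    return items
```

A call such as fetch_all("questions", {"tagged": "python", "site": "stackoverflow", "fromdate": 1561939200, "todate": 1562284800}) would then gather the question pages. Anonymous requests are subject to a daily quota, so for larger date ranges it is worth reading the throttling section of the API documentation first.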

Since you have the question_id and answer_id of each post, you can use the questions/{ids}/comments and answers/{ids}/comments methods to get the comments on these posts.
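
The {ids} part of these methods accepts several ids separated by semicolons (up to 100 per request, according to the API documentation), which saves many round trips. Here is a hedged sketch of how the comment URLs could be built in batches; comment_urls is a hypothetical helper, and the choice of sort=creation is an assumption for this example.

```python
API_ROOT = "https://api.stackexchange.com/2.2"


def comment_urls(kind, ids, batch_size=100, site="stackoverflow"):
    """Return comment URLs for "questions" or "answers", batching the ids."""
    if kind not in ("questions", "answers"):
        raise ValueError("kind must be 'questions' or 'answers'")
    ids = list(ids)
    urls = []
    for start in range(0, len(ids), batch_size):
        # Ids within one request are joined with semicolons.
        batch = ";".join(str(i) for i in ids[start:start + batch_size])
        urls.append(
            f"{API_ROOT}/{kind}/{batch}/comments"
            f"?order=desc&sort=creation&site={site}"
        )
    return urls


# One URL covering the comments on the example question.
print(comment_urls("questions", [37181281])[0])
```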