I want to scrape data from the public website below using Go and gocolly/colly:
https://eds.ospi.k12.wa.us/BusDepreciation/default.aspx?pageName=busSearch
On this page, I want to select each of the "School District" options in the dropdown, one by one, and scrape the data for every option. So far I can scrape only the HTML of the page, and I cannot find a way to select the dropdown options to get the data for the different districts.
My Go code:

    package main

    import (
        "fmt"

        "github.com/gocolly/colly/v2"
    )

    func main() {
        // Instantiate default collector
        c := colly.NewCollector()

        c.OnHTML("tbody tr", func(e *colly.HTMLElement) {
            fmt.Printf("BODY----%+v\n", e)
        })

        c.Visit("https://eds.ospi.k12.wa.us/BusDepreciation/default.aspx?pageName=busSearch")
    }

I would appreciate it if anyone could point me to the relevant documentation. If this is not possible with gocolly/colly, please suggest another option in Go or Python for selecting the dropdown options.
I would also like to know whether Selenium is a reasonable alternative for scraping a large amount of data, as in this scenario. If so, how can it be done in Go or Python? Or should we use Scrapy?
I was able to achieve what you're struggling with through the following code:
The relevant changes can be summarized in the following list:
- In the OnRequest callback, I set the User-Agent header. This helps when scraping and crawling websites that apply some restrictions.
- In the OnHTML callback, I selected the node based on the id "Content_ctl00_organizationDropDowns_lstDistrict". Then I used the DOM field to get the underlying DOM object. With its Children method you can get all of the child nodes, which are the options you're concerned with.
- Finally, the text of each option is read from the Data field of the nodes.

Undoubtedly, the code can still be improved, but it should be a good starting point to scrape what you need. Let me know if this solves your issue!