I have a lot of experience with Scrapy, but for various reasons I have to use colly in this project. I'm trying to scrape data from a website, but it returns: "To regain access, please make sure that cookies and JavaScript are enabled before reloading the page."
Part of my code is as follows:
func crawl(search savedSearch) {
    c := colly.NewCollector()
    extensions.RandomUserAgent(c)
    /* for debugging, to see what the result is
    c.OnHTML("*", func(e *colly.HTMLElement) {
        fmt.Println(e.Text)
        os.Exit(1)
    })*/
    c.OnHTML(".result-list__listing", func(e *colly.HTMLElement) {
        listingId, _ := strconv.Atoi(e.Attr("data-id"))
        if !listingExist(search.id, listingId) {
            fmt.Println("Listing found " + strconv.Itoa(listingId))
            saveListing(search.id, listingId)
            notifyUser(search.user, listingId)
        } else {
            fmt.Println("item is already crawled")
        }
    })
The docs mention "Automatic cookie and session handling", so the problem is probably JavaScript. How can I overcome this? As a first attempt, how can I enable JS in colly?
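Colly is the best choice for plain HTML pages, but it does not execute JavaScript, so there is no setting that enables JS in it. If you need to scrape JS-driven pages, you will need a different strategy: driving a real (headless) browser. Browsers expose a common remote-control protocol (for example, the Chrome DevTools Protocol), and client libraries for it exist in many languages, including Go (chromedp is one of them).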
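Before switching tools, it may help to confirm that the collector really is receiving the block page rather than the listing markup. Below is a minimal, standard-library-only sketch of such a check. The names looksLikeJSChallenge and blockMarkers are hypothetical, and the single marker phrase is simply taken from the message quoted in the question; a real site may word its block page differently.

```go
package main

import (
	"fmt"
	"strings"
)

// blockMarkers lists lowercase phrases that suggest the server returned a
// "please enable JavaScript" page instead of real content.
// Assumption: the only phrase here comes from the message in the question.
var blockMarkers = []string{
	"cookies and javascript are enabled",
}

// looksLikeJSChallenge reports whether body contains any known marker.
// The comparison is case-insensitive.
func looksLikeJSChallenge(body string) bool {
	lower := strings.ToLower(body)
	for _, marker := range blockMarkers {
		if strings.Contains(lower, marker) {
			return true
		}
	}
	return false
}

func main() {
	blocked := "To regain access, please make sure that cookies and JavaScript are enabled before reloading the page."
	normal := `<li class="result-list__listing" data-id="123"></li>`

	fmt.Println(looksLikeJSChallenge(blocked)) // true
	fmt.Println(looksLikeJSChallenge(normal))  // false
}
```

In a colly setup, a check like this could be called from an OnResponse callback with string(r.Body). If it returns true, the .result-list__listing selector will never match and the page has to be fetched through a browser instead.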