I'm learning web scraping using gocolly. When I try to find the tag using the selector name body, it successfully finds it. However, when I try to find the body tag by the XPath /html/body, it fails to find it.
I have used OnHTML() with a simple callback function:
collector.OnHTML("/html/body", func(element *colly.HTMLElement) {
	fmt.Println("Found Body")
})
Any idea why this is happening?
Also, when looking at tutorials, I noticed that the selector passed into the function OnHTML() is sometimes wrapped by ""(double quotes) and sometimes by ``(back-ticks). Is there a difference between the two?
Also, how do I search for an element by ID? When I try to find the ID #layout-container under the body, Colly does not find it:
collector.OnHTML("#layout-container", func(element *colly.HTMLElement) {
	fmt.Println("Found Layout Container")
})
Thanks in advance!
OnHTML() takes a goquery (CSS-style) selector, not an XPath expression, so "/html/body" never matches there. From an HTML perspective, the /html part is already implied when using OnHTML. You would use /html/body, as shown in colly_test.go, with OnXML() ("Function will be executed on every XML element matched by the xpath Query parameter"). The test using OnHTML shows only "body".