I am using Colly to scrape an e-commerce website, looping over many products.
Here is a snippet of my code that gets a sub-title:
c.OnXML("/html/body/div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1/1234", func(e *colly.XMLElement) {
	fmt.Println(e.Text)
})
However, not all products have a sub-title, so the XPath above does not work in all cases.
When I reach a product that does not have a sub-title, my code crashes with the following error:
panic: expression must evaluate to a node-set
Here is my code so far:
c := colly.NewCollector()
c.OnError(func(_ *colly.Response, err error) {
	log.Println("Something went wrong:", err)
})
// Sub-title
c.OnXML("/html/body/div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1/1234", func(e *colly.XMLElement) {
	fmt.Println(e.Text)
})
c.OnRequest(func(r *colly.Request) {
	fmt.Println("Visiting", r.URL)
})
c.Visit("https://www.lazada.vn/-i1701980654-s7563711492.html")
Here is what I want (pseudocode):
c.OnXML("/html/b.....v/h1/1234", func(e *colly.XMLElement) {
	if no error {
		fmt.Println("NO ERROR")
	} else {
		fmt.Println("GOT ERROR")
	}
})
I think I have figured out what went wrong in your code. Let me start from the end. The error originates from the panic statement at line 473 of the parse.go file. The xpath package has a method called parseNodeTest that switches on the type of the current token, p.r.typ. In your case the value of p.r.typ is itemNumber (28), so the switch falls into the default branch, which raises the panic. The methods invoked before parseNodeTest (you can see them in the call stack in your IDE) set typ to this value for the literal 1234, and that is what makes the XPath query invalid. To make it work, you have to remove the 1234 and put a valid node test in its place. Let me know if this solves your issue, thanks!