I'm parsing a website with the colly framework and something is going wrong. I have a very basic function, getWeeks(), that should grab and return a slice of strings, yet I'm getting an empty slice instead.
func getWeeks(c *colly.Collector) []string {
	var wks []string
	c.OnHTML("div.ltbluediv", func(div *colly.HTMLElement) {
		weekName := div.DOM.Find("span").Text() // a string Week 1, Week 2 etc
		wks = append(wks, weekName) // weekName has actual value is not empty
		// If `wks` printed here it shows correctly how the slice gets populated on each iteration
	})
	return wks // returns []
}

func main() {
	c := colly.NewCollector(
	)
	w := getWeeks(c)
	fmt.Println(w) // []
	c.OnRequest(func(r *colly.Request) {
		r.Headers.Set("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64)")
	})
	c.Visit("target url")
}
tl;dr: The slice header is updated inside the OnHTML callback, but the value you print in main is the old slice header. You should work with *[]string instead.

First of all, the callback you pass to c.OnHTML will actually run only after you call c.Visit, so printing w right after getWeeks would show an empty slice in any case.

However, it would be an empty slice even if you printed it after c.Visit. Why?

A slice in Go is implemented as a data structure called a slice header (more info: 1, 2).
When you assign the return value of getWeeks, you're essentially copying the slice header, including its fields Data, Len and Cap. You can see it in this playground by printing the address of the slices with the %p verb (using some other struct instead of go-colly to make the example self-contained). It prints two different memory addresses.
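
The header copy can be reproduced with a minimal sketch along these lines. Collector here is a hypothetical stand-in for colly.Collector (it only stores callbacks and runs them on Visit), and getWeeks is changed to also return the address of its local slice variable, purely so the two headers can be compared:

```go
package main

import "fmt"

// Collector is a hypothetical stand-in for colly.Collector: it stores
// callbacks and runs them only when Visit is called.
type Collector struct {
	callbacks []func(text string)
}

// OnHTML registers a callback; nothing runs yet.
func (c *Collector) OnHTML(f func(text string)) {
	c.callbacks = append(c.callbacks, f)
}

// Visit simulates a page with three matching elements.
func (c *Collector) Visit() {
	for _, text := range []string{"Week 1", "Week 2", "Week 3"} {
		for _, f := range c.callbacks {
			f(text)
		}
	}
}

// getWeeks mirrors the function from the question. The second return
// value (an assumption added for this demo) is the address of the local
// slice variable that the callback keeps appending to.
func getWeeks(c *Collector) ([]string, *[]string) {
	var wks []string
	c.OnHTML(func(text string) {
		wks = append(wks, text)
	})
	return wks, &wks
}

func main() {
	c := &Collector{}
	w, inner := getWeeks(c)
	c.Visit()

	fmt.Printf("%p %p\n", &w, inner) // two different addresses
	fmt.Println(len(w), len(*inner)) // 0 3
}
```

The callback updates the header stored in wks, while w in main keeps the copy made at return time, so its length stays 0.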
Now, if you keep fishing around on Stack Overflow about slice and append behavior, you may find out that if the slice has sufficient capacity (1, 2, 3) the backing array is not reallocated.

However, even if you do make sure the backing array is the same by initializing wks with sufficient capacity, the value of w is still a copy of the original slice header, and therefore still has length 0. This is demonstrated in this playground.

You could adjust the length of w by reslicing it (playground).

But this means that you need to know beforehand a reasonable capacity that doesn't cause reallocation, and the final length to reslice to.
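
A small sketch of the capacity-and-reslice behavior, under simplified assumptions: register is a hypothetical stand-in for getWeeks, and the returned visit function plays the role of c.Visit by running the "callback" three times. The capacity of 10 and the final length of 3 are values chosen for the demo.

```go
package main

import "fmt"

// register is a hypothetical stand-in for getWeeks: it returns the slice
// right away, plus a function that plays the role of c.Visit.
func register() (w []string, visit func()) {
	wks := make([]string, 0, 10) // enough capacity: append won't reallocate
	visit = func() {
		for _, s := range []string{"Week 1", "Week 2", "Week 3"} {
			wks = append(wks, s)
		}
	}
	return wks, visit
}

func main() {
	w, visit := register()
	visit()

	// w shares the backing array with wks, but its header still says length 0.
	fmt.Println(len(w), cap(w)) // 0 10

	// Reslicing up to the known final length exposes the appended elements.
	fmt.Println(w[:3]) // [Week 1 Week 2 Week 3]
}
```

This only works because both the capacity and the final length are known in advance, which is rarely the case when scraping.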
Instead, return a pointer to a slice:
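
A minimal sketch of the pointer-returning variant, again using a hypothetical Collector stand-in rather than the real colly API so that the example runs on its own:

```go
package main

import "fmt"

// Collector is a hypothetical stand-in for colly.Collector.
type Collector struct{ callbacks []func(string) }

// OnHTML registers a callback; it runs later, during Visit.
func (c *Collector) OnHTML(f func(string)) { c.callbacks = append(c.callbacks, f) }

// Visit simulates a page with three matching elements.
func (c *Collector) Visit() {
	for _, text := range []string{"Week 1", "Week 2", "Week 3"} {
		for _, f := range c.callbacks {
			f(text)
		}
	}
}

// getWeeks returns a pointer to the slice the callback appends to, so the
// caller observes every later update of the slice header.
func getWeeks(c *Collector) *[]string {
	wks := &[]string{}
	c.OnHTML(func(text string) {
		*wks = append(*wks, text)
	})
	return wks
}

func main() {
	c := &Collector{}
	w := getWeeks(c)
	c.Visit()       // callbacks run here
	fmt.Println(*w) // [Week 1 Week 2 Week 3]
}
```

Note that the slice is dereferenced only after Visit has run; printing *w before that would still show an empty slice.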
Or pass a pointer into getWeeks. Fixed playground: https://go.dev/play/p/yhq8YYnkFsv
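
The pointer-parameter variant could look roughly like this. It is a sketch with a hypothetical Collector stand-in for colly.Collector, not necessarily the code in the linked playground:

```go
package main

import "fmt"

// Collector is a hypothetical stand-in for colly.Collector.
type Collector struct{ callbacks []func(string) }

// OnHTML registers a callback; it runs later, during Visit.
func (c *Collector) OnHTML(f func(string)) { c.callbacks = append(c.callbacks, f) }

// Visit simulates a page with three matching elements.
func (c *Collector) Visit() {
	for _, text := range []string{"Week 1", "Week 2", "Week 3"} {
		for _, f := range c.callbacks {
			f(text)
		}
	}
}

// getWeeks appends through the caller's pointer, so the caller's own
// slice header is the one being updated.
func getWeeks(c *Collector, wks *[]string) {
	c.OnHTML(func(text string) {
		*wks = append(*wks, text)
	})
}

func main() {
	var w []string
	c := &Collector{}
	getWeeks(c, &w)
	c.Visit()
	fmt.Println(w) // [Week 1 Week 2 Week 3]
}
```

Here the caller owns the slice, which can be convenient when several registration functions should fill the same slice.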