I'm parsing a website with the colly framework and something is going wrong. I have a very basic function, getWeeks(),
that should grab and return some strings, yet I'm getting an empty slice instead.

func getWeeks(c *colly.Collector) []string {
	var wks []string
	c.OnHTML("div.ltbluediv", func(div *colly.HTMLElement) {
		weekName := div.DOM.Find("span").Text() // a string: "Week 1", "Week 2", etc.
		wks = append(wks, weekName) // weekName has an actual value, it is not empty
		// If `wks` is printed here, it shows the slice being populated correctly on each iteration
	})
	return wks // returns []
}

func main() {
	c := colly.NewCollector()
	w := getWeeks(c)
	fmt.Println(w) // []
	c.OnRequest(func(r *colly.Request) {
		r.Headers.Set("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64)")
	})
	c.Visit("target url")
}

tl;dr: The slice header is updated inside the `OnHTML` callback, but the value you print in `main` is the old slice header. You should work with `*[]string` instead.

First of all, the callback you pass to `c.OnHTML` will actually run only after you call `c.Visit`, so printing `w` right after `getWeeks` would show an empty slice in any case.

However, it would still be an empty slice even if you printed it after `c.Visit`. Why?

A slice in Go is implemented as a data structure called a slice header (more info: 1, 2).
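
To make the symptom reproducible without go-colly, here is a minimal sketch. `fakeCollector`, `OnItem` and its variadic `Visit` are hypothetical stand-ins invented for this example (they are not part of the colly API); they only mimic the "register now, run later" behavior of `OnHTML` and `Visit`.

```go
package main

import "fmt"

// fakeCollector is a hypothetical stand-in for *colly.Collector: it only
// stores callbacks and runs them later, when Visit is called.
type fakeCollector struct {
	callbacks []func(item string)
}

// OnItem registers a callback without running it (similar in spirit to OnHTML).
func (c *fakeCollector) OnItem(f func(item string)) {
	c.callbacks = append(c.callbacks, f)
}

// Visit runs every registered callback once per item.
func (c *fakeCollector) Visit(items ...string) {
	for _, it := range items {
		for _, f := range c.callbacks {
			f(it)
		}
	}
}

// getWeeks mirrors the function from the question: it returns wks before
// any callback has had a chance to run.
func getWeeks(c *fakeCollector) []string {
	var wks []string
	c.OnItem(func(item string) {
		wks = append(wks, item)
		fmt.Println("inside callback:", wks)
	})
	return wks // still nil here: no callback has run yet
}

func main() {
	c := &fakeCollector{}
	w := getWeeks(c)
	fmt.Println("before Visit:", len(w))
	c.Visit("Week 1", "Week 2")
	// w is a copy of the old, empty slice header, so its length is unchanged.
	fmt.Println("after Visit:", len(w))
}
```

The callback sees `wks` grow, while `w` in `main` keeps length 0 both before and after `Visit`.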

When you assign the return value of `getWeeks`, you're essentially copying the slice header, including its fields `Data`, `Len` and `Cap`. You can see this in this playground by printing the addresses of the slices with the `%p` verb (using some other struct instead of go-colly to make the example self-contained); it prints two different memory addresses.
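
The header copy can also be seen with plain slices, with no collector involved. The following is only a sketch and not the code from the linked playground; `headerCopyLens` is a helper name made up for this example.

```go
package main

import "fmt"

// headerCopyLens (hypothetical helper) copies a slice header, appends to the
// original, and reports the lengths of the original and of the copy.
func headerCopyLens() (origLen, copyLen int) {
	a := []string{"Week 1"}
	b := a                  // copies the header (Data, Len, Cap), not the elements
	a = append(a, "Week 2") // updates a's header only; b is untouched
	return len(a), len(b)
}

func main() {
	a := []string{"Week 1"}
	b := a
	// &a and &b are the addresses of two distinct slice headers,
	// so the two printed values differ (the exact values vary per run).
	fmt.Printf("%p %p\n", &a, &b)

	fmt.Println(headerCopyLens())
}
```

After the `append`, the original has length 2 while the copy still reports length 1, which is the same thing that happens to `w` in the question.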

Now if you keep fishing around on Stack Overflow about slice and `append` behavior, you may find out that if the slice has sufficient capacity (1, 2, 3) the backing array is not reallocated.

However, even if you do make sure the backing array is the same by initializing `wks` with sufficient capacity, the value of `w` is still a copy of the original slice header, and therefore still has length 0. This is demonstrated in this playground.

You could adjust the length of `w` by reslicing it (playground). But this means that you need to know beforehand a reasonable capacity that doesn't cause reallocation, and the final length to reslice to.

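
A rough sketch of the preallocate-and-reslice idea, again with plain slices rather than the original playground code. The capacity of 4 and the helper name `resliceDemo` are arbitrary assumptions for illustration.

```go
package main

import "fmt"

// resliceDemo (hypothetical helper) returns the length of the copied header
// after the original was appended to, plus the copy resliced to length 2.
func resliceDemo() (int, []string) {
	wks := make([]string, 0, 4) // capacity 4 is an arbitrary guess for this sketch
	w := wks                    // copy of the header: len 0, cap 4, same backing array

	// Stands in for what the callback does later. The capacity is sufficient,
	// so append writes into the shared backing array without reallocating.
	wks = append(wks, "Week 1", "Week 2")

	// w still has length 0, but reslicing exposes the shared elements.
	return len(w), w[:2]
}

func main() {
	n, resliced := resliceDemo()
	fmt.Println(n, resliced)
}
```

It works only because both the capacity and the final length (2) are known up front, which is exactly the limitation described above.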
Instead, return a pointer to a slice:
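
Returning a pointer might look roughly like this. It is a sketch rather than the exact code from the fixed playground, and `fakeCollector`, `OnItem` and the variadic `Visit` are hypothetical stand-ins for the colly collector, not colly API.

```go
package main

import "fmt"

// fakeCollector is a hypothetical stand-in for *colly.Collector.
type fakeCollector struct {
	callbacks []func(item string)
}

// OnItem registers a callback to run later (similar in spirit to OnHTML).
func (c *fakeCollector) OnItem(f func(item string)) {
	c.callbacks = append(c.callbacks, f)
}

// Visit runs every registered callback once per item.
func (c *fakeCollector) Visit(items ...string) {
	for _, it := range items {
		for _, f := range c.callbacks {
			f(it)
		}
	}
}

// getWeeks returns a pointer to the slice, so the caller and the callback
// share one slice header instead of two copies.
func getWeeks(c *fakeCollector) *[]string {
	wks := &[]string{}
	c.OnItem(func(item string) {
		*wks = append(*wks, item) // updates the shared header through the pointer
	})
	return wks
}

func main() {
	c := &fakeCollector{}
	w := getWeeks(c)
	c.Visit("Week 1", "Week 2")
	fmt.Println(*w) // dereference only after Visit has run the callbacks
}
```

With real colly code the same shape should apply: dereference `w` only after `c.Visit` has returned.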
Or pass a pointer into `getWeeks`.

Fixed playground: https://go.dev/play/p/yhq8YYnkFsv
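
The second option, passing a pointer into `getWeeks`, could be sketched as follows. As before, `fakeCollector`, `OnItem` and the variadic `Visit` are made-up stand-ins for the colly collector, and this is not necessarily identical to the code in the playground.

```go
package main

import "fmt"

// fakeCollector is a hypothetical stand-in for *colly.Collector.
type fakeCollector struct {
	callbacks []func(item string)
}

// OnItem registers a callback to run later (similar in spirit to OnHTML).
func (c *fakeCollector) OnItem(f func(item string)) {
	c.callbacks = append(c.callbacks, f)
}

// Visit runs every registered callback once per item.
func (c *fakeCollector) Visit(items ...string) {
	for _, it := range items {
		for _, f := range c.callbacks {
			f(it)
		}
	}
}

// getWeeks appends into the slice owned by the caller, via a pointer.
func getWeeks(c *fakeCollector, wks *[]string) {
	c.OnItem(func(item string) {
		*wks = append(*wks, item)
	})
}

func main() {
	c := &fakeCollector{}
	var w []string
	getWeeks(c, &w)
	c.Visit("Week 1", "Week 2")
	fmt.Println(w) // w itself is updated, since the callback wrote through &w
}
```

Here the caller owns the slice, which can be convenient if several callbacks need to fill the same slice.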