Different performances in Go slices resize

2.8k Views Asked by At

I'm spending some time experimenting with Go's internals and I ended up writing my own implementation of a stack using slices. As correctly pointed out by a reddit user in this post and as outlined by another user in this SO answer Go already tries to optimise slices resize.

Turns out, though, that I rather have performance gains using my own implementation of slice growing rather than sticking with the default one.

This is the structure I use for holding the stack:

type Stack struct {
    slice []interface{}
    blockSize int
}

const s_DefaultAllocBlockSize = 20;

This is my own implementation of the Push method

func (s *Stack) Push(elem interface{}) {
    if len(s.slice) + 1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice) + s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

This is a plain implementation

func (s *Stack) Push(elem interface{}) {
    s.slice = append(s.slice, elem)
}

Running the benchmarks I've implemented using Go's testing package my own implementation performs this way:

Benchmark_PushDefaultStack  20000000            87.7 ns/op        24 B/op          1 allocs/op

While relying on the plain append the results are the following

Benchmark_PushDefaultStack  10000000           209 ns/op          90 B/op          1 allocs/op

The machine I run tests on is an early 2011 Mac Book Pro, 2.3 GHz Intel Core i5 with 8GB of RAM 1333MHz DDR3

EDIT The actual question is: is my implementation really faster than the default append behavior? Or am I not taking something into account?

2

There are 2 best solutions below

2
On BEST ANSWER

Reading your code, tests, benchmarks, and results it's easy to see that they are flawed. A full code review is beyond the scope of StackOverflow.

One specific bug.

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

Should be

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, len(s.slice), len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

copying slices

The function copy copies slice elements from a source src to a destination dst and returns the number of elements copied. The number of elements copied is the minimum of len(src) and len(dst).

You copied 0, you should have copied len(s.slice).

As expected, your Push algorithm is inordinately slow:

append:

Benchmark_PushDefaultStack-4     2000000           941 ns/op          49 B/op          1 allocs/op

alediaferia:

Benchmark_PushDefaultStack-4      100000       1246315 ns/op       42355 B/op          1 allocs/op

This how append works: append complexity.

There are other things wrong too. Your benchmark results are often not valid.

3
On

I believe your example is faster because you have a fairly small data set and are allocating with an initial capacity of 0. In your version of append you preempt a large amount of allocations by growing the block size more dramatically early (by 20) circumventing the (in this case) expensive reallocs that take you through all those trivially small capacities 0,1,2,4,8,16,32,64 ect

If your data sets were a lot larger this would likely be marginalized by the cost of large copies. I've seen a lot of misuse of slice in Go. The clear performance win is had by making your slice with a reasonable default capacity.