How many Goroutines is too many Goroutines?


Suppose I have a workload that branches out like a tree.

I have to process n A items.
Processing each A item requires processing of m B items.
This goes on for another level or two.

And I have the following functions:

func handler() {
    var aList []A

    var wg sync.WaitGroup
    for _, a := range aList {
        wg.Add(1)
        go func(a A) { // pass a as a parameter to avoid capturing the loop variable (pre-Go 1.22)
            defer wg.Done()
            processA(a)
        }(a) // the () is required to actually invoke the closure
    }
    wg.Wait()
}


func processA(a A) error {
    var wg sync.WaitGroup
    for _, b := range a.BList {
        wg.Add(1)
        go func(b B) {
            defer wg.Done()
            processB(b)
        }(b)
    }
    wg.Wait()
    return nil
}


func processB(b B) error {
    var wg sync.WaitGroup
    for _, c := range b.CList {
        wg.Add(1)
        go func(c C) {
            defer wg.Done()
            processC(c)
        }(c)
    }
    wg.Wait()
    return nil
}

Now the nature of all of these tasks is that they're BSP (bulk synchronous parallel).
By that I mean that they need no communication amongst themselves.
And more importantly, if I had an infinite number of cores, there would be NO waiting in any thread/goroutine.

Now let's come back to Earth: I'm running this on a Lambda function that will have 2, 3, or 4 cores to offer.
My workload is still such that no memory limits will be hit.

Should I change my code to limit the number of goroutines, if what I want is speedup?
Is my code slowed down by too much context switching?

Or is there no such thing as "too many goroutines"?

There are 2 answers below.

Answer from coxley:

The answer is unsatisfying: It depends.

You need to incorporate profiling into your development cycle to understand the best way forward — CPU and trace profiles work well for this. We can't easily predict your workload.

> And more importantly, if I had an infinite amount of Cores, there would be NO waiting in any thread/goroutine.

If you're saying the work is entirely CPU-bound, having a number of goroutines equal to CPUs will likely perform best. Then your outer-logic sends work to each worker with a channel.

Trace profiles will show how efficiently work is being scheduled for each CPU. CPU profiling could highlight where the hot-spots are.

There's a talk by Dave Cheney where he shows three profiling techniques, including one where too many goroutines slowed down a CPU-bound program. It's worth a watch: https://www.youtube.com/watch?v=nok0aYiGiYA

Answer from Adrian:

There isn't technically any such thing as "too many goroutines" until you run out of memory, but I don't think that's quite what you're asking. There's not one quick answer to this in practice, but there are some useful pieces of information:

  • Goroutines are light, but not free. There is still context switching, though the cost is far lower than with OS threads.
  • Goroutines also consume memory; again, not much, but each starts with a small stack (about 2 KB in current Go releases) that grows as needed, so this can help guide how many running goroutines you want.
  • Context switching won't occur at random. The scheduler tries to avoid starvation, but it won't thrash between goroutines just because there are a lot of them; it will let some sleep so others can finish.
  • The runtime already limits itself to one OS thread per core by default (GOMAXPROCS), and schedules goroutines onto those threads. From this you can infer that a healthy number of goroutines can exceed the number of cores.
  • Your goroutines are themselves what spawn more goroutines, which makes creation self-limiting: the spawning goroutine has to get enough CPU to keep spawning, which generally lets the "leaf" functions (those that don't spawn more goroutines) finish.

In general, I would strongly recommend:

  • Only worrying about it if you measure a performance problem.
  • When you do measure a performance problem, using tracing and profiling to locate the source of the problem.
  • Only worrying about optimizing the identified source of a measured performance problem.