How is rebalance_domains() synchronized below NUMA level?

What I am trying to figure out is this: assuming there are no idle CPUs, which CPU is responsible for balancing at each domain? In other words, for which CPU will should_we_balance() return true?

For a (very) long time, I'd just assumed that the first CPU of the first group is the one responsible for balancing any given domain. This way, we can avoid races in which two CPUs compete with each other over stealing from the busiest group.

However, this understanding produces several inconsistencies (which I will not list, for brevity), so I now suspect that the "first group" of a sched_domain from the perspective of one CPU is not necessarily the "first group" from the perspective of another. For example, suppose we have a two-level topology with

{0, 1, 2, 3}     Node
{0, 1}  {2, 3}   SMP
0   1    2   3   CPUs

If all CPUs are busy, then the CPUs responsible for balancing the top-level domain are 0 and 2, not 0 alone, because from the perspective of 2 and 3, {2,3} is the first group of the top-level domain.
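The hypothesis above can be written down as a small user-space model. This is only a sketch of the assumption being asked about, not kernel code: `local_group_first_cpu()` and `may_balance_top_level()` are hypothetical names, the topology is hard-coded to the four-CPU example, and every CPU is assumed busy.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS    4
#define GROUP_SIZE 2   /* CPUs per SMP group in the example topology */

/*
 * Hypothetical helper: first CPU of the top-level group that contains
 * `cpu`. It encodes the assumption that each CPU's "first group" is
 * the group the CPU itself belongs to.
 */
int local_group_first_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return (cpu / GROUP_SIZE) * GROUP_SIZE;
}

/*
 * Hypothetical stand-in for the top-level decision when no CPU is
 * idle: only the first CPU of the local group is allowed to balance.
 */
bool may_balance_top_level(int cpu)
{
    return local_group_first_cpu(cpu) == cpu;
}
```

Under this model, `may_balance_top_level()` is true for CPUs 0 and 2 and false for CPUs 1 and 3, which is exactly the situation described above.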

Is that correct? It sure clears the inconsistencies from the other understanding, but it introduces others!

  • The comments in the kernel mention several times (e.g., here) that sched_domains are per-CPU. So I assume that any changes to sd->last_balance are only visible to the local CPU. But then why is an RCU lock needed when traversing the domain topology in rebalance_domains()? This is not the main problem, though; the next point is.
  • As far as I gather from the commit which introduced SD_SERIALIZE (link), the logic behind serializing only at NUMA level is that balancing takes too much time only at that level, whereas at the lower levels, it is claimed, balancing is so fast that the probability of a race is negligible. I find that hard to believe! If 0 and 2 cannot see each other's modifications to sd->last_balance, it is more than likely that their balance intervals will end up synchronized. How will that not pose problems?
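To make the concern in the second point concrete, here is a toy model of two CPUs that each keep a private `last_balance` and use the same interval. It is a sketch under stated assumptions, not the kernel's logic: `struct toy_domain`, `balance_due()` and `count_collisions()` are hypothetical names, time is a plain tick counter, and the due check merely mimics a wraparound-safe "now is at or past last_balance + interval" comparison.

```c
#include <stdbool.h>

/* Hypothetical private copy of the two fields discussed above. */
struct toy_domain {
    unsigned long last_balance;  /* tick of the last balance attempt */
    unsigned long interval;      /* ticks between balance attempts   */
};

/* Wraparound-safe test for "now >= last_balance + interval". */
bool balance_due(const struct toy_domain *sd, unsigned long now)
{
    return (long)(now - (sd->last_balance + sd->interval)) >= 0;
}

/*
 * Step both private copies through ticks [0, ticks) and count the
 * ticks on which both are due at the same time. Neither copy ever
 * sees the other's last_balance.
 */
unsigned long count_collisions(struct toy_domain a, struct toy_domain b,
                               unsigned long ticks)
{
    unsigned long collisions = 0;

    for (unsigned long now = 0; now < ticks; now++) {
        bool a_due = balance_due(&a, now);
        bool b_due = balance_due(&b, now);

        if (a_due)
            a.last_balance = now;
        if (b_due)
            b.last_balance = now;
        if (a_due && b_due)
            collisions++;
    }
    return collisions;
}
```

In this model, two copies that start in phase with the same interval collide on every due tick, while copies that start out of phase never collide. Nothing in the model pulls them apart or pushes them together, which is the behavior the question is asking about.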

In short, I want to understand which CPU(s) are responsible for balancing when all CPUs are busy, and how frequently they balance. Do they affect each other's frequencies (i.e., sd->last_balance), or are they independent?
