Why must a general barrier be used to guarantee transitivity across CPUs?


I recently read the section on transitivity in memory-barriers, and the author emphasizes that only a general barrier can guarantee transitivity. However, I can't fully understand why. For example:

CPU 1                      CPU 2                      CPU 3
=======================    =======================    =======================
{ X = 0, Y = 0 }
STORE X=1                  LOAD X                     STORE Y=1
                           <read barrier>             <general barrier>
                           LOAD Y                     LOAD X

Suppose X is in CPU 3's cache in the Modified state, and Y is in CPU 2's cache, also in the Modified state.

Suppose also that CPU 1 shares its store buffer with CPU 2, and that we add a write barrier before the read barrier on CPU 2 (so that together they become a general barrier).

1) CPU 1 puts the new value of X (X = 1) in the store buffer.

2) CPU 2 reads the value of X from the shared store buffer.

3) CPU 2 marks X in the store buffer (write barrier) and processes its invalidate queue to make sure there are no pending invalidate messages from CPU 3 (read barrier).

4) CPU 2 wants to change the cache line of X from Invalid to Modified, so it sends an invalidate message to CPU 3.

5) CPU 3 receives the invalidate message for X, puts it in its invalidate queue, and acknowledges it to CPU 2.

6) CPU 2 receives the acknowledgement, then writes X = 1 to memory or cache and loads Y == 0.

...

7) When CPU 3 executes its general barrier, it finds the invalidate message for X in its invalidate queue; after that, its load of X must return 1.

That all makes sense to me. However, I read another example, from figure 14.3 of perfbook, shown below:

void thread0(void) {
    A = 1;
    smp_wmb();
    B = 1;
}
void thread1(void) {
    while (B == 0)
        continue;
    barrier();
    C = 1;
}
void thread2(void) {
    while (C == 0)
        continue;
    barrier();
    assert(A == 1);
}

The assertion can fire in some cases. In the answer to Quick Quiz 14.2, the author says that changing every barrier() to smp_mb() fixes it.
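To make the fixed version concrete, here is a minimal user-space sketch of it in C11 with pthreads. It is not the kernel code from the book: the helper names full_barrier() and run_chain() are hypothetical, relaxed atomic accesses stand in for the plain loads and stores, a seq_cst fence is assumed to play the role of the general barrier, and a release fence is assumed to stand in for the write barrier in thread0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables from the example, all initially 0. */
static atomic_int A, B, C;
static int observed_a;   /* value of A seen by thread2; read only after join */

/* Assumption: a seq_cst fence is the user-space analogue of a general barrier. */
static void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    /* Assumption: a release fence stands in for the write barrier. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&B, 1, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&B, memory_order_relaxed) == 0)
        continue;
    full_barrier();      /* was barrier() in the original example */
    atomic_store_explicit(&C, 1, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&C, memory_order_relaxed) == 0)
        continue;
    full_barrier();      /* was barrier() in the original example */
    observed_a = atomic_load_explicit(&A, memory_order_relaxed);
    return NULL;
}

/* Runs the three threads once and returns the value thread2 read from A. */
int run_chain(void)
{
    void *(*fn[3])(void *) = { thread0, thread1, thread2 };
    pthread_t t[3];

    atomic_store(&A, 0);
    atomic_store(&B, 0);
    atomic_store(&C, 0);
    observed_a = -1;

    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&t[i], NULL, fn[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return observed_a;
}
```

Under the C11 rules, the fence in thread0 synchronizes with the fence in thread1, which in turn synchronizes with the fence in thread2, so run_chain() should return 1. If the two full_barrier() calls were replaced by compiler-only barriers, the language would no longer give that guarantee.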

So my question is: why do we need to change the barrier() in thread1 to smp_mb()? Suppose thread0 and thread1 run on CPU 0 and CPU 1, and those two CPUs share a store buffer. After thread1 executes the store C = 1, their store buffer will look like this:

[A(wb), B, C]

Because thread2 (running on CPU 2) also uses smp_mb() instead of barrier(), it is guaranteed that A must be 1 if thread2 sees C == 1.

Everything I described above assumes the MESI cache-coherence protocol. Maybe the author means that there are other protocols under which the barrier() in thread1 must be replaced with smp_mb() to guarantee transitivity?

Can anybody give me an example, please?

Maybe it is a mistake to think about transitivity in terms of one specific protocol. What we must remember is that rmb() or wmb() can't guarantee transitivity, because there are so many different protocols and architectures.
