Pipeline and branch instructions


Let's suppose that 20 percent of the instructions in a program are branch instructions. The static branch prediction assumes that the jumps are not taken.

I have to find the execution time in two cases: when 30 percent of the branches are taken and when 70 percent of the branches are taken. I also have to find the speedup of one case compared to the other and express it as a percentage.

The thing is, how do I find the execution time here? I usually find the execution time when the pipeline is split into different stages and the time for each stage is given...

Edit: This is NOT homework. I found this in my computer architecture textbook and it's not a familiar kind of exercise.


There is 1 answer below.


This question sounds like homework but the matter is worth some discussion.

We assume a static branch predictor that always predicts NOT TAKEN. This was the type of branch predictor used in early SPARC and MIPS implementations. Such a branch predictor always fetches the next sequential instruction in the program.

Let me also assume a simplified 4-stage pipeline made of Fetch (F), Decode (D), Execute (E) and Write Back (W). Consider the following simplified assembly program:

        ...
0xF1:   JUMP <condition>, 0xF4
0xF2:   ADD r1, r2, r3
0xF3:   ADD r3, r4, r1
0xF4:   ADD r1, r2, r3

When a branch is correctly predicted the pipeline behaves normally. The question is what happens to the pipeline when a branch is mis-predicted, which in our case means the condition of the JUMP instruction (0xF1) turns out to be true and the branch must be taken.

0xF1:  F  D  E  W
0xF2:     F  D  X
0xF3:        F  X  
0xF4:           F

cycle  1  2  3  4

In the Execute stage of the JUMP instruction we evaluate the condition and detect that the branch has to be taken. Due to the branch predictor policy, however, we have already fetched instructions 0xF2 and 0xF3 and decoded 0xF2. The pipeline is flushed and on the next clock cycle the branch target is correctly fetched. As you can see from the diagram, we wasted 2 clock cycles fetching and decoding instructions that will never be executed. These 2 clock cycles are known as the branch penalty, and you must take them into account when calculating the program's execution time.
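To make this concrete, here is a minimal Python sketch of the calculation for the numbers in the question. It assumes a 1-cycle-per-instruction baseline, the 2-cycle misprediction penalty from the pipeline above, and an arbitrary instruction count (the count cancels out in the speedup):

# Assumptions: 1 CPI baseline, 2-cycle penalty per mispredicted branch,
# always-NOT-TAKEN static prediction (so every taken branch is mispredicted).
BRANCH_FRACTION = 0.20   # 20% of the instructions are branches
PENALTY = 2              # cycles wasted per taken (mispredicted) branch
N = 1_000_000            # arbitrary instruction count, cancels in the speedup

def cycles(taken_fraction):
    # Each instruction takes 1 cycle, plus the penalty for every taken branch.
    return N * (1 + BRANCH_FRACTION * taken_fraction * PENALTY)

c30 = cycles(0.30)       # CPI = 1 + 0.2 * 0.3 * 2 = 1.12
c70 = cycles(0.70)       # CPI = 1 + 0.2 * 0.7 * 2 = 1.28
speedup = c70 / c30      # ~1.143
print(f"Speedup of the 30% case over the 70% case: {(speedup - 1) * 100:.1f}%")

With these assumptions the 30%-taken program runs about 14.3% faster than the 70%-taken one. If your textbook also gives stage times, multiply the cycle count by the clock period to turn cycles into execution time.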

The world of branch predictors is much more complex in reality. More elaborate static branch predictors exist that, for instance, predict backward jumps (typically loop branches) as TAKEN and forward jumps as NOT TAKEN.

To reduce the branch penalty, processors often employ a Branch Target Buffer (BTB), a small cache that stores the targets of recently executed JUMP instructions. Without a BTB, to predict a branch as TAKEN we have to wait until the Decode stage, where the instruction is identified as a JUMP and the target address is decoded; in the meantime we have already fetched an instruction that will then be flushed. With a BTB, on the other hand, we can do branch prediction in the Fetch stage: if the Program Counter hits in the BTB we know two things:

  1. The fetched instruction is a branch
  2. We have its target address

So we can predict the branch already in the Fetch stage, and if it is predicted as TAKEN we can fetch its target without any penalty. Modern processors also adopt dynamic branch predictors, which use more complex policies as well as additional buffers to reduce mis-predictions.
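As a rough illustration of the BTB idea (a toy sketch, not how real hardware is organized; all names below are made up), you can think of it as a small map from branch addresses to target addresses that is consulted during Fetch:

# Toy model of a Branch Target Buffer (BTB) consulted in the Fetch stage.
# Hypothetical names; a real BTB is a small set-associative hardware table.
btb = {}  # maps the PC of a branch -> its target address

def next_fetch_pc(pc, predict_taken):
    # A hit in the BTB tells us the instruction at pc is a branch AND gives us
    # its target, so a TAKEN prediction can redirect fetch with no penalty.
    if pc in btb and predict_taken(pc):
        return btb[pc]
    return pc + 1  # otherwise fall through to the next sequential instruction

def on_branch_resolved(pc, taken, target):
    # When Execute resolves the branch, remember its target for next time.
    if taken:
        btb[pc] = target

The important point is that the BTB moves the "is this a branch, and where does it go?" information from Decode back to Fetch, which is what removes the penalty for correctly predicted taken branches.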