Instruction Level Parallelism (ILP) Methods

I'm trying to learn about the methods used for instruction level parallelism and the differences between them. My question is: given an instruction set that was originally designed for a processor without instruction level parallelism, which of these methods can be used to achieve instruction level parallelism on a new processor, and why/how? The new processor must execute the same instruction set and run the same program binaries as the original one, but with better performance. The options are:

1) Out-of-order execution (Tomasulo's algorithm)

2) Pipelining

3) Superscalar

4) VLIW

BEST ANSWER

I would say out-of-order (OOO) execution is the first thing that will greatly increase ILP. OOO execution is a pure hardware technique that is completely independent of the compiler: an OOO processor carries out the same computations as a CPU without OOO and produces the same results, in less time, with no change to the instruction stream at all.
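
For instance, here is a rough sketch (assuming a simple RISC-like instruction set) of what the hardware does on its own:

ld  r1, 0(r2)    # long-latency load into r1
add r2, r1, r3   # must wait for r1 (RAW dependence on the load)
add r4, r5, r6   # independent: OOO hardware executes it while the load is still in flight

The binary is unchanged; the reordering happens entirely in hardware at run time.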

Pipelining is an old and well-known technique to increase ILP, but it has its limits: adding stages increases hardware complexity and eventually gives diminishing returns.

VLIW and superscalar both aim at issuing multiple instructions per cycle, but they are different styles of parallelism. VLIW relies on the compiler to pack more than one operation into a single Very Long Instruction Word that can be executed in parallel, so it requires special hardware and a special compiler and is not compatible with binaries built for a conventional control-flow architecture.
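
As a rough illustration (a hypothetical 2-wide VLIW encoding, not any real machine's format), the compiler packs the two independent operations into one long word and pads the dependent one with a nop:

{ ld  r1, 0(r2)  |  add r4, r5, r6 }   # two independent operations in one long word
{ add r2, r1, r3 |  nop            }   # the dependent add goes in a later word

A plain superscalar, by contrast, leaves the encoding alone and lets the hardware decide how many instructions to issue per cycle.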

ANSWER

Start with pipelining. It is the oldest and most widespread approach to achieving ILP: it overlaps the fetch, decode, execute, ... stages of multiple instructions. It is so common that any real CPU that uses OOO, in-order, superscalar, VLIW, ... to achieve ILP will also be pipelined.
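
For illustration, assuming a classic five-stage pipeline (fetch, decode, execute, memory, write-back), three instructions overlap roughly like this:

cycle:           1    2    3    4    5    6    7
instruction 1:   IF   ID   EX   MEM  WB
instruction 2:        IF   ID   EX   MEM  WB
instruction 3:             IF   ID   EX   MEM  WB

At best one instruction still completes per cycle, but several are in flight at once, which is where the overlap pays off.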

Yes, OOO will achieve ILP. The first and third instructions below can execute in parallel, while the second must wait for the first to complete (RAW hazard on r1). The CPU's scheduler has to find the third instruction and issue it out of order dynamically.

ld  r1, 0(r2)    # long-latency load into r1
add r2, r1, r3   # RAW hazard: must wait for the load to produce r1
add r4, r3, r5   # independent: can execute in parallel with the load

You didn't mention in-order, but it can achieve ILP as well. The first and second instructions can execute in parallel, but the third will have to wait for the first to complete since it also has a RAW hazard on r1.

ld  r1, 0(r2)    # long-latency load into r1
add r4, r3, r5   # independent: can issue alongside the load
add r2, r1, r3   # RAW hazard: must wait for the load to produce r1

Superscalar and VLIW exist only for ILP. VLIW uses static compile-time scheduling to achieve ILP. Superscalar uses execution-time scheduling by the CPU AND compile-time scheduling by the compiler to achieve ILP.
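
As a sketch (assuming a hypothetical 2-wide in-order superscalar), the compiler's contribution is exactly the reordering shown above: with the independent add placed next to the load, the hardware can issue both in the same cycle:

cycle 1:   ld  r1, 0(r2)   +   add r4, r3, r5    # dual-issued, no dependence between them
cycle 2+:  add r2, r1, r3                        # issues once the load has produced r1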