How can we expect a program to complete in order?

107 Views Asked by At

Out of order execution exists on most modern microprocessors. How can we expect a program to complete in order as it was written by a programmer?

1

There are 1 best solutions below

0
On

Right, of course you can't just run instructions in arbitrary order, on whatever register values happen to be present in architectural registers. An instruction can only execute when its correct inputs are ready. Finding when that's true for as many "future" instructions as possible is key to keeping execution units fed with work, finding instruction level parallelism (ILP).

Dependency tracking between instructions is set up during allocate/rename as instructions are copied (in program order) from the front-end to the back end. During alloc/rename, the RAT (Register Allocation Table) maps architectural registers to physical registers, allowing multiple independent dep chains using the same architectural register to be executing at once.


Maybe you're thinking about this backwards. An ISA (such as x86-64 or ARMv8) defines a set of rules for how machine instructions execute, often a serial model where one instruction fully finishes executing before the next one starts. (ISAs with some explicit parallelism do exist, e.g. VLIW, or The Mill's delayed-visibility loads that become visible some fixed number of instructions later, helping hide load latency on in-order CPUs. There are still well-defined rules.)

To run software for an ISA like that, hardware has to implement all the rules that are guaranteed on paper. The cardinal rule of out-of-order execution is to preserve the illusion of executing instructions in program order for a single thread1. This is a necessary part of being a valid / non-buggy x86 or ARM CPU.

(See Observing x86 register dependencies - any architectural state that isn't renamed generally needs to serialize the pipeline on modification, so make sure any instructions that need the old value have already finished.)

This is very much like the C "as if" optimization rule: you can do whatever you want as long as the observable result is still the same, but for CPU architects instead of compilers.

As Modern Microprocessors A 90-Minute Guide! puts it: If the processor is going to execute instructions out of order, it will need to keep in mind the dependencies between those instructions. (You should definitely go read that, if you haven't already; it's a very good baseline of understanding for any more advanced stuff, and it covers a lot of ground itself. Highly recommended.)

That's why OoO exec needs so many transistors for a large scheduler and reorder buffer (ROB) to keep track of the reordering, and of dependencies between instructions. (In computer-architecture terminology, RAW hazards: read after write, where one instruction writes a register and a later instruction reads that register. Register renaming hides WAR and WAW hazards.)

Footnote 1: OoO exec does not try to maintain order of memory accesses observed by other threads. Even in-order CPUs can do memory reordering, so software always needs to take care for inter-thread communication, using fence instructions and/or acquire loads / release stores or whatever, depending on the ISA.

See also