Out-of-order execution exists on most modern microprocessors. How can we expect a program to complete in order, as it was written by a programmer?
How can we expect a program to complete in order?
Asked by European Academy
Right, of course you can't just run instructions in arbitrary order, on whatever register values happen to be present in architectural registers. An instruction can only execute when its correct inputs are ready. Finding when that's true for as many "future" instructions as possible is the key to keeping execution units fed with work, i.e. to finding instruction-level parallelism (ILP).
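To make "execute when inputs are ready" concrete, here's a minimal Python sketch of a toy dataflow scheduler. Everything in it is made up for illustration (the `Insn` tuple, the register names, the latencies), and it assumes every register is written at most once so that only true dependencies exist. It is nothing like a real scheduler, but it shows instructions starting out of program order while the final register values still match in-order execution.

```python
from collections import namedtuple

# Hypothetical instruction format: dst = op(src1, src2), finishing `latency` cycles after it starts.
Insn = namedtuple("Insn", "dst op src1 src2 latency")
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def run_in_order(program, regs):
    """Reference model: one instruction fully finishes before the next starts."""
    regs = dict(regs)
    for insn in program:
        regs[insn.dst] = OPS[insn.op](regs[insn.src1], regs[insn.src2])
    return regs

def run_dataflow(program, regs):
    """Start each instruction as soon as both inputs are ready.

    Assumes each register is written at most once and is not in the initial
    register set, so only RAW dependencies exist.
    Returns (final registers, order in which instructions started).
    """
    values = dict(regs)                   # registers whose values are ready
    waiting = list(range(len(program)))   # indices not yet started, oldest first
    in_flight = []                        # (finish_cycle, index, result)
    start_order = []
    cycle = 0
    while waiting or in_flight:
        # Write back every result that completes by this cycle.
        for item in [x for x in in_flight if x[0] <= cycle]:
            in_flight.remove(item)
            values[program[item[1]].dst] = item[2]
        # Start every waiting instruction whose inputs are now ready.
        for idx in list(waiting):
            insn = program[idx]
            if insn.src1 in values and insn.src2 in values:
                result = OPS[insn.op](values[insn.src1], values[insn.src2])
                in_flight.append((cycle + insn.latency, idx, result))
                waiting.remove(idx)
                start_order.append(idx)
        cycle += 1
    return values, start_order

program = [
    Insn("t0", "mul", "a", "b", 3),    # 0: slow multiply
    Insn("t1", "add", "t0", "c", 1),   # 1: needs t0, so it has to wait
    Insn("t2", "add", "a", "c", 1),    # 2: independent, can start right away
    Insn("t3", "add", "t2", "t1", 1),  # 3: needs both chains
]
initial = {"a": 2, "b": 3, "c": 4}

final_regs, start_order = run_dataflow(program, initial)
print(start_order)                                   # instruction 2 starts before instruction 1
print(final_regs == run_in_order(program, initial))  # same architectural result
```

Instruction 2 gets to run while the multiply is still in flight, which is exactly the ILP being hunted for, and nothing that depends on `t0` ever sees a stale value.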
Dependency tracking between instructions is set up during allocate/rename, as instructions are copied (in program order) from the front-end to the back-end. During alloc/rename, the RAT (Register Alias Table) maps architectural registers to physical registers, allowing multiple independent dep chains that use the same architectural register to be in flight at once.
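A rough sketch of that renaming step in Python, under big simplifying assumptions: an unlimited pool of physical registers that are never freed, a made-up `(dst, src1, src2)` instruction format, and x86 register names used purely as labels. The `rat` dict stands in for the RAT.

```python
def rename(program, arch_regs):
    """Rename (dst, src1, src2) instructions in program order.

    `rat` maps each architectural register to the physical register that
    holds its newest value. Simplification: physical registers are never
    freed or reused.
    """
    rat = {reg: f"p{n}" for n, reg in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed = []
    for dst, src1, src2 in program:
        # Sources read the current mapping, i.e. the most recent older writer.
        p_src1, p_src2 = rat[src1], rat[src2]
        # Every write is given a fresh physical register.
        rat[dst] = f"p{next_free}"
        next_free += 1
        renamed.append((rat[dst], p_src1, p_src2))
    return renamed

arch = ["eax", "ebx", "ecx", "edx", "esi", "edi"]
program = [
    ("eax", "ebx", "ecx"),  # chain 1 writes eax
    ("edx", "eax", "eax"),  # chain 1 reads eax
    ("eax", "esi", "edi"),  # chain 2 reuses eax for unrelated work
    ("ebx", "eax", "eax"),  # chain 2 reads the new eax
]
renamed = rename(program, arch)
for insn in renamed:
    print(insn)
```

After renaming, the two writes to `eax` land in different physical registers (`p6` and `p8` here), so the second chain shares no register with the first and doesn't have to wait for it.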
Maybe you're thinking about this backwards. An ISA (such as x86-64 or ARMv8) defines a set of rules for how machine instructions execute, often a serial model where one instruction fully finishes executing before the next one starts. (ISAs with some explicit parallelism do exist, e.g. VLIW, or The Mill's delayed-visibility loads that become visible some fixed number of instructions later, helping hide load latency on in-order CPUs. There are still well-defined rules.)
To run software for an ISA like that, hardware has to implement all the rules that are guaranteed on paper. The cardinal rule of out-of-order execution is to preserve the illusion of executing instructions in program order for a single thread (footnote 1). This is a necessary part of being a valid / non-buggy x86 or ARM CPU.
(See "Observing x86 register dependencies": any architectural state that isn't renamed generally needs to serialize the pipeline on modification, to make sure any instructions that need the old value have already finished.)
This is very much like the C "as if" optimization rule: you can do whatever you want as long as the observable result is still the same, but for CPU architects instead of compilers.
As "Modern Microprocessors - A 90-Minute Guide!" puts it: "If the processor is going to execute instructions out of order, it will need to keep in mind the dependencies between those instructions." (You should definitely go read that, if you haven't already; it's a very good baseline of understanding for any more advanced stuff, and it covers a lot of ground itself. Highly recommended.)
That's why OoO exec needs so many transistors for a large scheduler and reorder buffer (ROB) to keep track of the reordering, and of dependencies between instructions. (In computer-architecture terminology, these true dependencies are RAW hazards: read after write, where one instruction writes a register and a later instruction reads that register. Register renaming hides WAR and WAW hazards.)
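If the three hazard names are unfamiliar, this small Python sketch classifies them for a pair of instructions in program order. The `(dst, sources)` tuple format and the register names are invented for the example.

```python
def hazards(older, younger):
    """Return the register hazards between two instructions in program order.

    Each instruction is a hypothetical (dst, sources) pair.
    """
    older_dst, older_srcs = older
    younger_dst, younger_srcs = younger
    found = set()
    if older_dst in younger_srcs:
        found.add("RAW")  # true dependency: younger reads what older writes
    if younger_dst in older_srcs:
        found.add("WAR")  # younger overwrites a register older still has to read
    if younger_dst == older_dst:
        found.add("WAW")  # both write the same register
    return found

print(hazards(("r1", ("r2", "r3")), ("r4", ("r1", "r5"))))  # RAW only
print(hazards(("r1", ("r2", "r3")), ("r2", ("r4", "r5"))))  # WAR only
print(hazards(("r1", ("r2", "r3")), ("r1", ("r4", "r5"))))  # WAW only
```

Only the RAW case reflects real data flow. The other two are just name collisions on a register, which is why giving each write a fresh physical register makes them go away.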
Footnote 1: OoO exec does not try to maintain the order of memory accesses as observed by other threads. Even in-order CPUs can do memory reordering, so software always needs to take care with inter-thread communication, using fence instructions and/or acquire loads / release stores or whatever, depending on the ISA.
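To illustrate what "observed by other threads" means, here's a toy Python enumeration of the classic store-buffering litmus test. It's only a sketch under a crude assumption: reordering is modelled by letting a thread's load take effect before its own earlier store. Real memory models are specified far more carefully than this.

```python
from itertools import permutations

def outcomes(allow_store_load_reordering):
    """Enumerate the possible (r0, r1) results of the store-buffering test.

    Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
    Both x and y start at 0.
    """
    # Events are (thread, kind, variable), listed in program order per thread.
    t0 = [(0, "store", "x"), (0, "load", "y")]
    t1 = [(1, "store", "y"), (1, "load", "x")]
    results = set()
    for order in permutations(t0 + t1):
        # Without reordering, each thread's store must come before its load.
        in_program_order = all(
            order.index(thread[0]) < order.index(thread[1]) for thread in (t0, t1)
        )
        if not (in_program_order or allow_store_load_reordering):
            continue
        memory = {"x": 0, "y": 0}
        regs = {}
        for thread_id, kind, var in order:
            if kind == "store":
                memory[var] = 1
            else:
                regs[thread_id] = memory[var]
        results.add((regs[0], regs[1]))
    return results

in_order_results = outcomes(False)
reordered_results = outcomes(True)
print(sorted(in_order_results))                      # (0, 0) never appears
print(sorted(reordered_results - in_order_results))  # the extra outcome reordering allows
```

Each thread still sees its own operations in program order, so the single-thread illusion holds. It's only the other thread that can observe `r0 == 0 and r1 == 0`, and ruling that out is what fences are for.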