I don't understand why peephole optimization is needed. Isn't the compiler already smart enough to optimise the code? Can you please give me some examples where peephole optimization is needed?
Why is peephole optimization done on assembly code but not on IR code?
310 Views Asked by Tauro At
Peepholes are often target-specific.
They may only make sense in terms of target registers (RTL), not IR.
For example, on x86, `xor eax, eax` instead of `mov eax, 0`. (See "What is the best way to set a register to zero in x86 assembly: xor, mov or and?") There'd be no reason to do this in IR, and doing it any earlier than the last moment (final code generation) would obfuscate the fact that the value is zero for other optimizations. Doing that for any machine except x86 would be an anti-optimization (creating a false dependency). OTOH, you don't want to leave it too late, or else you might not be able to reorder it ahead of something that sets FLAGS: `xor` itself writes FLAGS, so it has to go before a `cmp` whose result a later `setcc` reads, not between the two.
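To make the timing point concrete, here is a minimal sketch of such a peephole in Python. The instruction tuples, the opcode sets, and the helper names are all hypothetical (no real compiler represents code this way); the sketch only shows why the rewrite needs to know whether FLAGS are live at that point.

```python
# Toy peephole pass over a list of (opcode, dst, src) tuples.
# The instruction format and all names here are made up for illustration.

FLAG_WRITERS = {"cmp", "test", "add", "sub", "xor"}  # a subset of FLAGS-writing ops
FLAG_READERS = {"setcc", "jcc", "cmovcc"}            # generic condition-code users


def flags_live_after(insns, i):
    """True if an instruction after index i reads FLAGS before they are rewritten."""
    for op, _dst, _src in insns[i + 1:]:
        if op in FLAG_READERS:
            return True
        if op in FLAG_WRITERS:
            return False
    return False  # assumption: FLAGS are dead at the end of the block


def zero_idiom_peephole(insns):
    """Rewrite `mov reg, 0` to `xor reg, reg` only where clobbering FLAGS is safe."""
    out = []
    for i, (op, dst, src) in enumerate(insns):
        if op == "mov" and src == 0 and not flags_live_after(insns, i):
            out.append(("xor", dst, dst))
        else:
            out.append((op, dst, src))
    return out


# Zeroing placed before the compare: the rewrite is safe.
early = [("mov", "eax", 0), ("cmp", "edi", "esi"), ("setcc", "al", None)]
# Zeroing placed between compare and setcc: xor would destroy the cmp result.
late = [("cmp", "edi", "esi"), ("mov", "eax", 0), ("setcc", "al", None)]
```

This pass does not move instructions around; it just declines the rewrite when FLAGS are live. That is the cost of running it too late: in the `late` sequence the `mov` has to stay unless something earlier hoisted the zeroing above the `cmp`.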
Or as another example, x86 can multiply by 3, 5, or 9 using LEA, taking advantage of the 2-bit shift count and add in two-register addressing modes. It might be useful for an optimizer to know that this is an efficient building block and aim to refactor things into a multiply by 9, but actually converting a multiply by 10 into `(x * 5) * 2` is not how you'd want to do it for targets where `(x<<3) + (x<<1)` is more efficient (`x*10 = x*8 + x*2`).

See `imul` vs. 2x `lea` and how modern CPUs with fast `imul` make it only worth it to spend at most 2 instructions replacing a multiply, or only 1 if the bottleneck is throughput not latency. Unless you can fold an addition into it like LEA can...
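As a rough illustration of why this decision belongs in target-specific code, the sketch below lowers `x * const` differently per target. The target names (`"x86-lea"`, `"shift-add"`), the pseudo-instruction tuples, and the `lower_mul` function are assumptions invented for this example, not any real compiler's API.

```python
# Hypothetical per-target lowering of "r = x * const".
# Target names and pseudo-instructions are made up for illustration.

LEA_FACTORS = {3: 2, 5: 4, 9: 8}  # multiplier -> index scale in [x + x*scale]


def lower_mul(const, target):
    """Return a list of pseudo-instructions computing r = x * const."""
    if target == "x86-lea":
        for factor, scale in LEA_FACTORS.items():
            if const == factor:
                # One LEA: x + x*scale
                return [("lea", "r", f"[x + x*{scale}]")]
            if const == 2 * factor:
                # LEA, then double the result: (x * factor) * 2
                return [("lea", "r", f"[x + x*{scale}]"), ("add", "r", "r")]
    if target == "shift-add" and const == 10:
        # x*10 = x*8 + x*2 on a target where shifts and adds are the cheap option
        return [("shl", "t1", "x", 3), ("shl", "t2", "x", 1), ("add", "r", "t1", "t2")]
    # Fallback: a plain multiply instruction
    return [("mul", "r", "x", const)]


# Both decompositions compute the same value; only their cost differs per target.
assert all((x + x * 4) * 2 == (x << 3) + (x << 1) == x * 10 for x in range(1000))
```

The x86 path here never spends more than two instructions on a multiply, in line with the rule of thumb above; a real cost model would also weigh latency against throughput before giving up on a plain multiply.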