Do you know of any way to add 32-bit signed words with saturation using MMX/SSE assembler instructions? I can find 8- and 16-bit versions, but no 32-bit ones.
Add 32-bit words with saturation
2.9k Views Asked by LooPer At
Michiel
Saturated unsigned subtraction is easy, because for `a -= b` we can do

    asm (
        "pmaxud %1, %0\n\t"   // a = max(a, b)
        "psubd  %1, %0"       // a -= b
        : "+x" (a)
        : "xm" (b)
    );

with SSE4.1 (`pmaxud` is not available in earlier SSE versions).
I was looking for unsigned addition, but possibly the only way is to transform it into a saturated unsigned subtraction, perform that, and transform back. The same goes for the signed variants.
EDIT: with unsigned addition, you get min (a, ~b) + b this way, which of course works. With signed addition and subtraction, you have two saturation boundaries, which makes things complicated.
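To make the two identities concrete, here is a minimal scalar sketch in plain C that models a single 32-bit lane. The function names `sat_sub_u32` and `sat_add_u32` are made up for illustration; the comments name the packed instruction each step would presumably map to (`pminud`, like `pmaxud`, is an SSE4.1 instruction).

```c
#include <stdint.h>

/* Per-lane model of pmaxud + psubd: max(a, b) - b,
   which is a - b clamped at 0. */
static uint32_t sat_sub_u32(uint32_t a, uint32_t b)
{
    uint32_t m = (a > b) ? a : b;   /* pmaxud */
    return m - b;                   /* psubd  */
}

/* Per-lane model of min(a, ~b) + b,
   which is a + b clamped at UINT32_MAX. */
static uint32_t sat_add_u32(uint32_t a, uint32_t b)
{
    uint32_t room = ~b;                   /* UINT32_MAX - b: largest a that fits */
    uint32_t m = (a < room) ? a : room;   /* pminud */
    return m + b;                         /* paddd  */
}
```

Because `~b` equals `UINT32_MAX - b`, clamping `a` to it before the add guarantees the sum cannot wrap, which is why a single boundary is enough in the unsigned case.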
You can emulate saturated signed adds by performing the following steps:
For unsigned values it is even simpler; see this Stack Overflow posting.
In SSE2, the above maps to a sequence of parallel compares and AND/ANDN operations. No single operation is available in hardware, unfortunately.
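The steps themselves are not reproduced above, so the following is only one possible sketch of a signed saturating add in SSE2 intrinsics, not necessarily the sequence this answer had in mind. It assumes the usual overflow test (both operands have the same sign and the wrapped sum has the opposite one) and uses parallel compares plus AND/ANDN to select the result. The names `sat_add_epi32` and `sat_add_i32x4` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Adds four signed 32-bit lanes with saturation (SSE2 only). */
static __m128i sat_add_epi32(__m128i a, __m128i b)
{
    const __m128i zero    = _mm_setzero_si128();
    const __m128i int_max = _mm_set1_epi32(0x7FFFFFFF);

    __m128i sum = _mm_add_epi32(a, b);          /* wrapping add (paddd) */

    /* Sign bit is set where a and b share a sign that differs from sum's. */
    __m128i ovf_bits = _mm_and_si128(_mm_xor_si128(a, sum),
                                     _mm_xor_si128(b, sum));
    /* Parallel compare: all-ones in lanes that overflowed. */
    __m128i ovf_mask = _mm_cmpgt_epi32(zero, ovf_bits);

    /* Saturation value: INT32_MAX if a >= 0, INT32_MIN if a < 0. */
    __m128i a_neg = _mm_cmpgt_epi32(zero, a);
    __m128i sat   = _mm_xor_si128(a_neg, int_max);

    /* Overflowed lanes take sat, all other lanes keep sum (AND/ANDN/OR). */
    return _mm_or_si128(_mm_and_si128(ovf_mask, sat),
                        _mm_andnot_si128(ovf_mask, sum));
}

/* Convenience wrapper over plain arrays, using unaligned loads and stores. */
static void sat_add_i32x4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, sat_add_epi32(va, vb));
}
```

When two operands of the same sign overflow, the correct boundary is determined by that shared sign, so the sign of `a` alone is enough to pick between the two saturation values.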