I'm writing some SSE/AVX code and need to divide packed signed 32-bit (2's complement) integers by a power of 2 using a right shift. When the values are positive the shift works fine, but it produces wrong results for negative values, because the sign bit gets shifted too.
Is there any SIMD operation that lets me shift while preserving the sign bit? Thanks
Using SIMD to right shift 32 bit packed negative number
878 Views Asked by Isso At
There is 1 best solution below
SSE2/AVX2 has a choice of arithmetic (see footnote 1) vs. logical right shifts for 16 and 32-bit element sizes. (For 64-bit elements, only logical is available until AVX512.)
Use _mm_srai_epi32 (psrad) instead of _mm_srli_epi32 (psrld).
See Intel's intrinsics guide, and other links in the SSE tag wiki: https://stackoverflow.com/tags/sse/info. (Filter it to exclude AVX512 if you want, because it's pretty cluttered these days with all the masked versions for all 3 sizes...)
Or just look at the asm instruction-set reference, which includes intrinsics for instructions that have them. Searching for "arithmetic" in http://felixcloutier.com/x86/index.html finds the shifts you want.
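As a minimal sketch of the difference between the two shifts, assuming an x86 target with SSE2 (the helper names shift_right_arith1 and shift_right_logical1 are hypothetical, made up for this illustration):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Arithmetic right shift by 1 on four packed int32 (psrad):
   copies of the sign bit are shifted in, so negative lanes stay negative. */
void shift_right_arith1(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi32(v, 1));
}

/* Logical right shift by 1 on four packed int32 (psrld):
   zeros are shifted in, so negative lanes become large positive values. */
void shift_right_logical1(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srli_epi32(v, 1));
}
```

With the input {8, -8, 7, -7}, the arithmetic version gives {4, -4, 3, -4}, while the logical version turns -8 into 0x7FFFFFFC, which is the wrong result described in the question.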
Note the a = arithmetic vs. l = logical, instead of the usual intrinsics naming scheme of epu32 for unsigned. The asm mnemonics are simple and consistent (e.g. Packed Shift Right Arithmetic Dword = psrad).
Arithmetic right shifts are also available for AVX2 variable-shifts (vpsravd), and for the one-variable-for-all-elements version of the immediate shifts.
Footnote 1:
Arithmetic right shifts shift in copies of the sign bit, instead of zero.
This correctly implements 2's complement signed division by powers of 2 with rounding towards negative infinity, unlike the truncation toward zero you get from C signed division. Look at the asm output for
int foo(int a){return a/4;} to see how compilers implement signed division semantics in terms of shifts.
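To illustrate the rounding difference, here is a rough sketch for division by 4, again assuming SSE2. The function names div4_floor and div4_trunc are hypothetical, and the bias trick in the second function is one common way to get C-style truncation from shifts, similar in spirit to what compilers typically emit for a/4 rather than a copy of any particular compiler's output:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Plain arithmetic shift: divides by 4, rounding toward negative infinity. */
void div4_floor(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi32(v, 2));
}

/* Matches C's a / 4 (truncation toward zero): add a bias of 3 to the
   negative lanes only, then do the same arithmetic shift. */
void div4_trunc(const int32_t in[4], int32_t out[4])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i sign = _mm_srai_epi32(v, 31);     /* all-ones in negative lanes, else 0 */
    __m128i bias = _mm_srli_epi32(sign, 30);  /* 3 in negative lanes, else 0 */
    __m128i r    = _mm_srai_epi32(_mm_add_epi32(v, bias), 2);
    _mm_storeu_si128((__m128i *)out, r);
}
```

For the input {7, -7, -8, -1}, div4_floor gives {1, -2, -2, -1} and div4_trunc gives {1, -1, -2, 0}; the two only differ on negative values that are not exact multiples of 4.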