I'm writing some SSE/AVX code and need to divide packed signed 32-bit integers by powers of 2 using a right shift. The shift works fine when the values are positive, but it produces wrong results for negative values, because the shift moves the sign bit and fills in zeros.
Is there a SIMD operation that shifts while preserving the sign bit? Thanks
Using SIMD to right shift 32 bit packed negative number
Asked by Isso
SSE2/AVX2 has a choice of arithmetic¹ vs. logical right shifts for 16- and 32-bit element sizes. (For 64-bit elements, only logical is available until AVX-512.)
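If you need a 64-bit arithmetic shift without AVX-512, one common workaround is to do a logical shift and then OR in the missing sign bits. The following is a minimal sketch assuming only SSE2 is available; the function name srai_epi64_sse2 is hypothetical. With AVX-512F + AVX-512VL, _mm_srai_epi64 (vpsraq) does this in one instruction.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: arithmetic right shift of two packed int64 values
   by n (0 <= n <= 63), built from SSE2 operations only. */
__m128i srai_epi64_sse2(__m128i x, int n)
{
    /* psrad by 31 gives the sign of every dword; copy the sign of each
       high dword (indices 1 and 3) across its whole 64-bit lane. */
    __m128i sign = _mm_shuffle_epi32(_mm_srai_epi32(x, 31), _MM_SHUFFLE(3, 3, 1, 1));

    /* psrlq shifts in zeros at the top of each lane ... */
    __m128i logical = _mm_srl_epi64(x, _mm_cvtsi32_si128(n));

    /* ... so OR in the top n bits of the sign mask instead. */
    __m128i fill = _mm_sll_epi64(sign, _mm_cvtsi32_si128(64 - n));

    return _mm_or_si128(logical, fill);
}
```

For n = 0 the fill shift count is 64, which psllq treats as "shift everything out", so the input comes back unchanged.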
Use _mm_srai_epi32 (psrad) instead of _mm_srli_epi32 (psrld). See Intel's intrinsics guide, and the other links in the SSE tag wiki https://stackoverflow.com/tags/sse/info. (Filter it to exclude AVX-512 if you want, because it's pretty cluttered these days with all the masked versions for all 3 sizes.)
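To make the difference concrete, here is a minimal SSE2 sketch that shifts the same four values both ways; the function names shift_demo and sra_runtime are hypothetical. It also shows _mm_sra_epi32, the form of psrad that takes one run-time count (in an XMM register) for all elements.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_srli_epi32, _mm_srai_epi32, _mm_sra_epi32 */

/* Hypothetical demo: right-shift four packed int32 values by 2 with both shifts. */
void shift_demo(const int32_t in[4], int32_t logical[4], int32_t arith[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* psrld: zero-fill, so negative inputs turn into large positive values */
    _mm_storeu_si128((__m128i *)logical, _mm_srli_epi32(v, 2));

    /* psrad: sign-fill, so negative inputs stay negative */
    _mm_storeu_si128((__m128i *)arith, _mm_srai_epi32(v, 2));
}

/* Hypothetical helper: arithmetic shift by a count only known at run time.
   _mm_cvtsi32_si128 puts k in the low element, which psrad uses as the count. */
__m128i sra_runtime(__m128i v, int k)
{
    return _mm_sra_epi32(v, _mm_cvtsi32_si128(k));
}
```

With inputs {-8, -7, 8, 7}, the arithmetic version gives {-2, -2, 2, 1}, while the logical version turns both negative inputs into 1073741822 (0x3FFFFFFE).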
Or just look at the asm instruction-set reference, which includes intrinsics for instructions that have them. Searching for "arithmetic" in http://felixcloutier.com/x86/index.html finds the shifts you want.
Note the a = arithmetic vs. l = logical naming, instead of the usual intrinsics naming scheme of epu32 for unsigned. The asm mnemonics are simple and consistent (e.g. Packed Shift Right Arithmetic Dword = psrad). Arithmetic right shifts are also available for the AVX2 variable-count shifts (vpsravd), and for the one-variable-for-all-elements version of the immediate shifts (e.g. _mm_sra_epi32, psrad with an XMM count operand).

Footnote 1: Arithmetic right shifts shift in copies of the sign bit instead of zeros. This correctly implements 2's complement signed division by powers of 2 with rounding toward negative infinity, unlike the truncation toward zero you get from C signed division. Look at the asm output for int foo(int a){return a/4;} to see how compilers implement signed division semantics in terms of shifts.
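If you need C's truncating semantics rather than floor, one common pattern (similar to what compilers emit for a/4) is to add a bias of 2^k - 1 to negative elements before the arithmetic shift. A minimal SSE2 sketch, assuming 1 <= k <= 31; the name div_pow2_trunc is hypothetical:

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: divide four packed int32 by 2^k, rounding toward zero
   like C's '/' operator. Valid for 1 <= k <= 31. */
__m128i div_pow2_trunc(__m128i x, int k)
{
    /* All-ones for negative elements, zero otherwise */
    __m128i sign = _mm_srai_epi32(x, 31);

    /* Logical shift of the mask leaves 2^k - 1 for negatives, 0 otherwise */
    __m128i bias = _mm_srl_epi32(sign, _mm_cvtsi32_si128(32 - k));

    /* Biased negatives round up toward zero after the arithmetic shift */
    return _mm_sra_epi32(_mm_add_epi32(x, bias), _mm_cvtsi32_si128(k));
}
```

For example, -7 / 4: the bias is 3, so the shift sees -4 and yields -1 (matching C), whereas a plain psrad would give floor(-1.75) = -2.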