Are uint2 operations faster than ulong in OpenCL on AMD GCN cards?
Which of these two "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b;
Asked by user1200759
AMD GCN has no native 64-bit integer support in its vector ALU, so the second statement is translated into two 32-bit adds: a V_ADD_U32 followed by a V_ADDC_U32, which adds in the carry flag produced by the first V_ADD_U32.
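To make that lowering concrete, here is a minimal host-side sketch in plain C (not OpenCL C, and not actual compiler output) that models the 64-bit add as two 32-bit adds chained by a carry. The function name `add64_via_32bit_halves` is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: models a 64-bit add built from two 32-bit adds,
 * in the spirit of the V_ADD_U32 / V_ADDC_U32 pair described above. */
uint64_t add64_via_32bit_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo    = a_lo + b_lo;         /* first add: low halves (wraps mod 2^32) */
    uint32_t carry = (lo < a_lo);         /* carry-out of the low add: 0 or 1 */
    uint32_t hi    = a_hi + b_hi + carry; /* second add: high halves plus carry-in */

    return ((uint64_t)hi << 32) | lo;
}
```

Note that the second add cannot start until the first has produced its carry, which is why the two instructions form a dependency chain.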
So, to answer your question: both take the same number of instructions. However, the two adds in the first statement are independent of each other and can be computed in parallel (instruction-level parallelism), so it could be faster IF your kernel is occupancy-bound (i.e. it uses lots of registers).
If your statements can be executed by the scalar unit (i.e. they do not depend on the thread index), then the game changes: the second one will be just one instruction (vs. two), since the scalar unit has native 64-bit integer support.
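Keep in mind, however, that your first statement is not equivalent to the second: the uint2 add treats its two components independently, so the carry out of the low 32 bits is lost instead of being added to the high 32 bits.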
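The difference is easy to see with a small host-side model in plain C. This is only a sketch: `uint2_model`, `add_uint2` and `pack_uint2` are hypothetical stand-ins for OpenCL's `uint2` type and its component-wise `+`, with `.x` assumed to hold the low 32 bits and `.y` the high 32 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint2: x = low word, y = high word. */
typedef struct { uint32_t x, y; } uint2_model;

/* Component-wise add, as uint2 "+" behaves: no carry passes from x to y. */
uint2_model add_uint2(uint2_model a, uint2_model b)
{
    uint2_model c = { a.x + b.x, a.y + b.y };
    return c;
}

/* Reinterpret the pair as one 64-bit value for comparison with a ulong add. */
uint64_t pack_uint2(uint2_model v)
{
    return ((uint64_t)v.y << 32) | v.x;
}
```

For example, adding 1 to 0x00000000FFFFFFFF as a ulong gives 0x0000000100000000, whereas adding {1, 0} to {0xFFFFFFFF, 0} in this model gives {0, 0}, because the low component wraps and nothing is carried into the high component.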