Usually one compute unit runs only one work group, but AMD's documentation says that more than one wavefront can run on the same compute unit. How can I do that? Is there an OpenCL function for it, or do I need to use assembly instructions? I want to do this because my work group size is 20 and I want to run 2 work groups per compute unit, so that each group can use 32 KiB of LDS (a CU has 64 KiB in total and each wavefront can use up to 32 KiB, so I want to run two wavefronts to use the full amount of LDS).
How to run two work groups per one compute unit on AMD GCN cards
165 Views Asked by user1200759 At