In the thread "x64 allows less threads per block than Win32?" there was a question about running out of registers. I was under the impression that Nvidia has dropped support for x86 in CUDA 7.5 and beyond. This may be a foolish question, but does that mean that all pointers are going to require two registers going forward? And that potentially fewer threads per block will be the way things work going forward?
Yes. All pointers in x64 mode will require 2 (32-bit) registers for storage.
There should be no impact on the number of blocks that can be launched. Regarding threads, yes, there is potentially an impact on threads per block, since the product of threads per block and registers per thread must not exceed the machine limit. But as I stated in my answer to the question you linked, that limitation can usually be worked around using one of the several methods mentioned there. Many kernels will not be affected, because they are not "up against the limit". For kernels that are "up against the limit", there are well-established techniques to mitigate the effect and let you run the desired number of threads per block, up to 1024.
Ultimately this means the issue is one of performance optimization rather than capability, and performance optimization will always be a concern.
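To make the register arithmetic above concrete, here is a minimal host-side sketch of the constraint "threads per block times registers per thread must fit in the machine limit". The helper name `max_threads_for_registers` is hypothetical, and the default of 65536 registers per block is an assumption; on a real device you would read the limit from `cudaDeviceProp::regsPerBlock` and the kernel's register count from `cudaFuncAttributes::numRegs`. The sketch also ignores the hardware's register allocation granularity, so treat the result as an upper bound rather than an exact answer.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: largest block size, rounded down to whole warps, whose
// total register demand fits within the per-block register limit.
// The default limits are assumptions; query the real values from the device.
int max_threads_for_registers(int regs_per_thread,
                              int regs_per_block_limit = 65536,
                              int max_threads_per_block = 1024,
                              int warp_size = 32) {
    assert(regs_per_thread > 0);
    // Threads that fit if registers were the only constraint.
    int fit = regs_per_block_limit / regs_per_thread;
    // Blocks are scheduled in whole warps, so round down to a warp multiple.
    fit = (fit / warp_size) * warp_size;
    // The hardware cap on threads per block still applies.
    return std::min(fit, max_threads_per_block);
}
```

Under these assumed limits, a kernel using 64 registers per thread still fits 1024 threads per block, while one using 80 registers per thread is limited to 800. That is the sense in which only kernels already "up against the limit" feel the extra registers consumed by 64-bit pointers.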