In the thread "x64 allows less threads per block than Win32?" there was a question about running out of registers. I was under the impression that Nvidia has dropped support for x86 in CUDA 7.5 and beyond. This may be a foolish question, but does that mean that all pointers are going to require two registers going forward? And that potentially fewer threads per block will be the way things work going forward?
Yes. All pointers in x64 mode will require 2 (32-bit) registers for storage.
Certainly there should be no impact on the number of blocks that can be launched. Regarding threads, yes, there is potentially an impact on threads per block (since the product of threads per block launched and registers per thread must not exceed the machine limit), but as I stated in my answer to the question you linked, the limitation on threads can usually be worked around using one of the several methods mentioned there. Many kernels will not be impacted, because they are not "up against the limit". For those kernels that are "up against the limit", there are well-established techniques to mitigate the effect and allow you to run the desired number of threads per block, up to 1024.
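To make the register arithmetic concrete, here is a minimal host-side sketch of the constraint described above. The function name `max_threads_per_block` and the example limit of 65536 registers per block are assumptions for illustration, not values from the answer; on real hardware the limit can be read from `cudaDeviceProp::regsPerBlock`. The sketch also ignores the register allocation granularity that actual devices apply.

```cpp
#include <cassert>

// Hypothetical helper: largest threads-per-block that satisfies
//   threads_per_block * regs_per_thread <= regs_per_block_limit,
// rounded down to a whole number of warps and capped at the hardware
// thread limit (1024, as stated in the answer).
// Simplification: real GPUs allocate registers in coarser units, so the
// true bound can be somewhat lower than what this returns.
int max_threads_per_block(int regs_per_thread,
                          int regs_per_block_limit,
                          int hw_thread_limit = 1024,
                          int warp_size = 32) {
    assert(regs_per_thread > 0);
    int by_regs = regs_per_block_limit / regs_per_thread;
    by_regs -= by_regs % warp_size;  // keep only whole warps
    return by_regs < hw_thread_limit ? by_regs : hw_thread_limit;
}
```

With an assumed limit of 65536 registers per block, a kernel using 64 registers per thread can still launch 1024 threads, while one that grows to 72 registers per thread (for instance because its pointers now occupy two registers each) is held to 896. Capping register use, for example with nvcc's `-maxrregcount` option or the `__launch_bounds__` qualifier, lowers `regs_per_thread` and so raises this bound; whether those are the specific methods meant in the linked answer is not stated here.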
Ultimately this means the issue presented is not one of capability so much as one of performance optimization, and that concern will always be present.