I am trying to understand where a Stream might help me with processing multiple Regions of Interest on a video frame. If using NPP functions that support a stream, is this a case where one would launch as many streams as there are ROIs? Possibly even creating a CPU thread for each Stream? Or is the benefit in using one stream to process all the ROIs and possibly using this single stream from multiple threads in the CPU?
Advantage of using a CUDA Stream
8k Views Asked by AeroClassics At
There are 1 best solutions below
In CUDA, streams generally improve GPU utilization in two ways. First, memory copies between host and device can overlap with kernel execution when the copy and the kernel are issued in different streams. Second, kernels running in different streams can execute concurrently if the GPU has enough free resources.
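As a rough illustration of both effects, here is a minimal sketch that gives each ROI its own stream. The kernel invert_roi and the function run_roi_streams are made-up stand-ins for the real per-ROI work (for example a stream-aware NPP call), and the sketch assumes each ROI is stored as a contiguous block of 8-bit pixels. The host buffer is allocated with cudaMallocHost because asynchronous copies need page-locked memory to overlap with kernels.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-ROI kernel: inverts 8-bit pixels in place.
__global__ void invert_roi(unsigned char* px, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) px[i] = 255 - px[i];
}

// Processes num_rois contiguous ROIs, one stream each.
// Returns the number of wrong output pixels, or -1 on a CUDA error.
int run_roi_streams(int num_rois, int roi_pixels) {
    const size_t total = static_cast<size_t>(num_rois) * roi_pixels;
    unsigned char* h_buf = nullptr;  // page-locked host memory
    unsigned char* d_buf = nullptr;
    if (cudaMallocHost(&h_buf, total) != cudaSuccess) return -1;
    if (cudaMalloc(&d_buf, total) != cudaSuccess) { cudaFreeHost(h_buf); return -1; }
    for (size_t i = 0; i < total; ++i) h_buf[i] = static_cast<unsigned char>(i % 256);

    std::vector<cudaStream_t> streams(num_rois);
    for (int r = 0; r < num_rois; ++r) cudaStreamCreate(&streams[r]);

    // Queue copy-in, kernel, copy-out for each ROI in its own stream.
    // Work in different streams may overlap; work within one stream stays ordered.
    const int threads = 256;
    const int blocks = (roi_pixels + threads - 1) / threads;
    for (int r = 0; r < num_rois; ++r) {
        unsigned char* h = h_buf + static_cast<size_t>(r) * roi_pixels;
        unsigned char* d = d_buf + static_cast<size_t>(r) * roi_pixels;
        cudaMemcpyAsync(d, h, roi_pixels, cudaMemcpyHostToDevice, streams[r]);
        invert_roi<<<blocks, threads, 0, streams[r]>>>(d, roi_pixels);
        cudaMemcpyAsync(h, d, roi_pixels, cudaMemcpyDeviceToHost, streams[r]);
    }

    // Wait for every stream, then release it.
    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int r = 0; r < num_rois; ++r) {
        ok = (cudaStreamSynchronize(streams[r]) == cudaSuccess) && ok;
        cudaStreamDestroy(streams[r]);
    }

    // Check the result on the host.
    int mismatches = 0;
    for (size_t i = 0; i < total; ++i) {
        unsigned char expected = static_cast<unsigned char>(255 - (i % 256));
        if (h_buf[i] != expected) ++mismatches;
    }
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return ok ? mismatches : -1;
}

int main() {
    int result = run_roi_streams(4, 256 * 256);
    std::printf("result: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Whether the streams actually overlap depends on the device and on how much of the GPU each kernel occupies, so it is worth confirming with a profiler rather than assuming it.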
Whether creating a CPU thread for each ROI helps depends on how CPU utilization compares with GPU utilization. If the CPU does a lot of processing and the GPU sits idle waiting for it, creating more threads helps.
There are further constraints on overlapping operations in streams (see the documentation for your CUDA version). A memory copy overlaps with kernel execution only if it is issued asynchronously and its host-side source or destination is page-locked. Also, commands that a host thread issues in the legacy default stream implicitly synchronize with the other streams. (Since CUDA 7, each host thread can have its own default stream when the code is compiled with the nvcc option --default-stream per-thread, in which case processing ROIs in different threads helps here too.)
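For the multi-threaded variant, a minimal sketch could look like the following. It uses the explicit cudaStreamPerThread handle (available since CUDA 7) so that it does not depend on compiler flags. The kernel scale_roi and the functions process_roi and run_roi_threads are made up for illustration, and the ROIs are assumed to be contiguous float blocks already resident on the device.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-ROI kernel: multiplies every element by factor.
__global__ void scale_roi(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Runs in its own host thread and uses that thread's own stream.
void process_roi(float* d_roi, int n, cudaError_t* status) {
    scale_roi<<<(n + 255) / 256, 256, 0, cudaStreamPerThread>>>(d_roi, n, 2.0f);
    cudaError_t launch = cudaGetLastError();
    cudaError_t sync = cudaStreamSynchronize(cudaStreamPerThread);
    *status = (launch != cudaSuccess) ? launch : sync;
}

// Returns the number of wrong output values, or -1 on a CUDA error.
int run_roi_threads(int num_rois, int roi_elems) {
    const size_t total = static_cast<size_t>(num_rois) * roi_elems;
    std::vector<float> host(total, 1.0f);
    float* d_buf = nullptr;
    if (cudaMalloc(&d_buf, total * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(d_buf, host.data(), total * sizeof(float), cudaMemcpyHostToDevice);

    // One host thread per ROI; each launches into its per-thread stream.
    std::vector<cudaError_t> status(num_rois, cudaSuccess);
    std::vector<std::thread> workers;
    for (int r = 0; r < num_rois; ++r) {
        workers.emplace_back(process_roi,
                             d_buf + static_cast<size_t>(r) * roi_elems,
                             roi_elems, &status[r]);
    }
    for (std::thread& w : workers) w.join();

    cudaMemcpy(host.data(), d_buf, total * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_buf);

    for (cudaError_t s : status) {
        if (s != cudaSuccess) return -1;
    }
    int mismatches = 0;
    for (float v : host) {
        if (v != 2.0f) ++mismatches;
    }
    return mismatches;
}

int main() {
    int result = run_roi_threads(4, 1 << 16);
    std::printf("result: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Depending on the host compiler, linking may need thread support enabled (for example by passing -pthread through nvcc's -Xcompiler option on Linux). Spawning a thread per ROI is only worthwhile when the CPU-side work per ROI is significant; otherwise a single thread issuing into several streams is simpler.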
Hence, provided these conditions are met, processing the ROIs in different streams should improve the performance of your algorithm up to a certain limit, which depends on the resource consumption of the kernels, the ratio of memory copies to computation, and so on.