Arrays are stored as xyzxyz..., and I want to get the maximum and minimum along one direction (x, y, or z). Here is the test program:
#include <cuda_runtime.h>
#include <cuda_runtime_api.h> // cudaMalloc, cudaMemcpy, etc.
#include <cublas_v2.h>
#include <helper_functions.h> // shared functions common to CUDA Samples
#include <helper_cuda.h> // CUDA error checking
#include <stdio.h> // printf
#include <iostream>
template <typename T>
void print_arr(T *arr, int L)
{
    for (int i = 0; i < L; i++)
    {
        std::cout << arr[i] << " ";
    }
    std::cout << std::endl;
}

int main()
{
    float hV[10] = {3, 0, 7, 1, 2, 8, 6, 7, 6, 4};
    print_arr(hV, 10);
    float *dV;
    cudaMalloc(&dV, sizeof(float) * 10);
    cudaMemcpy(dV, hV, sizeof(float) * 10, cudaMemcpyHostToDevice);
    cublasHandle_t cublasHandle = NULL;
    checkCudaErrors(cublasCreate(&cublasHandle));
    int hResult[2] = {0};
    checkCudaErrors(cublasIsamax(cublasHandle, 10, dV, 3, hResult + 0));
    checkCudaErrors(cublasIsamin(cublasHandle, 10, dV, 3, hResult + 1));
    print_arr(hResult, 2);
    return 0;
}
expected result:
3 0 7 1 2 8 6 7 6 4
3 2
result:
3 0 7 1 2 8 6 7 6 4
3 5
Is there a problem with this result, or have I misunderstood something?
Link to cublasIsamin.
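cublasIsamin finds the index of the element with the minimum magnitude, and that index is 1-based. The index is not computed over the original array; it takes the incx parameter into account, so it counts only the elements that are actually visited. Furthermore, it will search over n elements (the first parameter) regardless of other parameters such as incx.

You have an array like this:

component:  x  y  z  x  y  z  x  y  z  x
value:      3  0  7  1  2  8  6  7  6  4
x index:    1        2        3        4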
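Therefore the maximum x value (6) is at index 3 and the minimum x value (1) is at index 2, searching over a total of n=4 (not 10) elements. With respect to the x values, we must begin searching dV at offset 0 with an increment of 3, for a maximum of n=4 elements. With n=10 and incx=3, the search instead runs out to offset 27, past the end of the 10-element allocation; the reported minimum index 5 corresponds to offset 12, which is outside the array.

Taking all this into account, the correct calls are:

checkCudaErrors(cublasIsamax(cublasHandle, 4, dV, 3, hResult + 0));
checkCudaErrors(cublasIsamin(cublasHandle, 4, dV, 3, hResult + 1));

And the expected result is:

3 0 7 1 2 8 6 7 6 4
3 2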
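To make the roles of n and incx concrete, here is a minimal CPU-side sketch of the strided search. The names ref_iamax, ref_iamin, and component_count are hypothetical helpers written for this illustration, not part of cuBLAS; they assume the usual BLAS conventions (magnitude comparison, 1-based result, first match on ties, 0 for invalid n or incx).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: number of elements belonging to one component of an
// interleaved array with `total` floats, starting at `offset`, `stride` apart.
static int component_count(int total, int offset, int stride)
{
    return (total - offset + stride - 1) / stride;
}

// Hypothetical CPU reference for a strided "index of max magnitude" search.
// Visits exactly n elements: x[0], x[incx], ..., x[(n-1)*incx].
// Returns a 1-based index into the visited elements, not into the raw array.
static int ref_iamax(int n, const float *x, int incx)
{
    assert(x != nullptr);
    if (n < 1 || incx < 1) return 0;  // assumed BLAS-style result for invalid input
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (std::fabs(x[i * incx]) > std::fabs(x[best * incx])) best = i;
    return best + 1;
}

// Same traversal, but keeps the element with the smallest magnitude.
static int ref_iamin(int n, const float *x, int incx)
{
    assert(x != nullptr);
    if (n < 1 || incx < 1) return 0;  // assumed BLAS-style result for invalid input
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (std::fabs(x[i * incx]) < std::fabs(x[best * incx])) best = i;
    return best + 1;
}
```

For the array in the question, ref_iamax(4, hV, 3) gives 3 and ref_iamin(4, hV, 3) gives 2. The y and z components can be searched the same way by offsetting the pointer and using n=3: hV + 1 gives max index 3 and min index 1, and hV + 2 gives max index 2 and min index 3. The same pointer offset should carry over to the cuBLAS calls (dV + 1 or dV + 2 with n=3 and incx=3).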